2024-09-22 10:50:57,251 INFO [train.py:1266] (3/4) Training started
2024-09-22 10:50:57,251 INFO [train.py:1276] (3/4) Device: cuda:3
2024-09-22 10:50:57,254 INFO [train.py:1307] (3/4) Using dtype=torch.float16
2024-09-22 10:50:57,254 INFO [train.py:1308] (3/4) Use AMP=True
2024-09-22 10:50:57,254 INFO [train.py:1310] (3/4) {'best_train_loss': inf, 'best_valid_loss': inf, 'best_train_epoch': -1, 'best_valid_epoch': -1, 'batch_idx_train': 0, 'log_interval': 50, 'reset_interval': 200, 'valid_interval': 3000, 'feature_dim': 80, 'subsampling_factor': 4, 'ignore_id': -1, 'label_smoothing': 0.1, 'warm_step': 2000, 'env_info': {'k2-version': '1.24.4', 'k2-build-type': 'Release', 'k2-with-cuda': True, 'k2-git-sha1': '44a9d5682af9fd3ef77074777e15278ec6d390eb', 'k2-git-date': 'Wed Sep 27 11:22:55 2023', 'lhotse-version': '1.17.0.dev+git.ccfc5b2c.dirty', 'torch-version': '1.10.0+cu102', 'torch-cuda-available': True, 'torch-cuda-version': '10.2', 'python-version': '3.8', 'icefall-git-branch': 'cr-ctc', 'icefall-git-sha1': 'a6eead6c-clean', 'icefall-git-date': 'Mon Sep 9 10:10:08 2024', 'icefall-path': '/star-zw/workspace/zipformer/icefall_cr_ctc', 'k2-path': '/star-zw/workspace/k2/k2/k2/python/k2/__init__.py', 'lhotse-path': '/star-zw/workspace/lhotse/lhotse/lhotse/__init__.py', 'hostname': 'de-74279-k2-train-7-0905180047-6d6678bc6f-8cwvw', 'IP address': '10.30.5.48'}, 'world_size': 4, 'master_port': 12347, 'tensorboard': True, 'num_epochs': 50, 'start_epoch': 1, 'start_batch': 0, 'exp_dir': PosixPath('zipformer/exp-cr-loss-scale-0.2-time-mask-ratio-2.5-scaled-masked-1-4'), 'bpe_model': 'data/lang_bpe_500/bpe.model', 'base_lr': 0.045, 'lr_batches': 7500, 'lr_epochs': 3.5, 'ref_duration': 600, 'context_size': 2, 'prune_range': 5, 'lm_scale': 0.25, 'am_scale': 0.0, 'simple_loss_scale': 0.5, 'ctc_loss_scale': 1.0, 'cr_loss_scale': 0.2, 'time_mask_ratio': 2.5, 'cr_loss_masked_scale': 1.0, 'attention_decoder_loss_scale': 0.8, 'seed': 42, 'print_diagnostics': False, 'inf_check': False, 'save_every_n': 4000, 'keep_last_k': 30, 'average_period': 200, 'use_fp16': True, 'use_bf16': False, 'num_encoder_layers': '2,2,3,4,3,2', 'downsampling_factor': '1,2,4,8,4,2', 'feedforward_dim': '512,768,1024,1536,1024,768', 'num_heads': '4,4,4,8,4,4', 'encoder_dim': '192,256,384,512,384,256', 'query_head_dim': '32', 'value_head_dim': '12', 'pos_head_dim': '4', 'pos_dim': 48, 'encoder_unmasked_dim': '192,192,256,256,256,192', 'cnn_module_kernel': '31,31,15,15,15,31', 'decoder_dim': 512, 'joiner_dim': 512, 'attention_decoder_dim': 512, 'attention_decoder_num_layers': 6, 'attention_decoder_attention_dim': 512, 'attention_decoder_num_heads': 8, 'attention_decoder_feedforward_dim': 2048, 'causal': False, 'chunk_size': '16,32,64,-1', 'left_context_frames': '64,128,256,-1', 'use_transducer': False, 'use_ctc': True, 'use_attention_decoder': False, 'use_cr_ctc': True, 'full_libri': True, 'mini_libri': False, 'manifest_dir': PosixPath('data/fbank'), 'max_duration': 700, 'bucketing_sampler': True, 'num_buckets': 30, 'concatenate_cuts': False, 'duration_factor': 1.0, 'gap': 1.0, 'on_the_fly_feats': False, 'shuffle': True, 'drop_last': True, 'return_cuts': True, 'num_workers': 2, 'enable_spec_aug': False, 'spec_aug_time_warp_factor': 80, 'enable_musan': True, 'input_strategy': 'PrecomputedFeatures', 'blank_id': 0, 'sos_id': 1, 'eos_id': 1, 'vocab_size': 500, 'dtype': torch.float16, 'use_autocast': True}
2024-09-22 10:50:57,255 INFO [train.py:1312] (3/4) About to create model
2024-09-22 10:50:57,900 INFO [train.py:1316] (3/4) Number of model parameters: 64250603
2024-09-22 10:50:57,900 INFO [train.py:752] (3/4) num_frame_masks: 25, max_frames_mask_fraction: 0.375
2024-09-22 10:51:02,836 INFO [train.py:1338] (3/4) Using DDP
2024-09-22 10:51:03,357 INFO [asr_datamodule.py:436] (3/4) About to get the shuffled train-clean-100, train-clean-360 and train-other-500 cuts
2024-09-22 10:51:03,601 INFO [asr_datamodule.py:232] (3/4) Enable MUSAN
2024-09-22 10:51:03,601 INFO [asr_datamodule.py:233] (3/4) About to get Musan cuts
2024-09-22 10:51:05,232 INFO [asr_datamodule.py:279] (3/4) Disable SpecAugment
2024-09-22 10:51:05,232 INFO [asr_datamodule.py:281] (3/4) About to create train dataset
2024-09-22 10:51:05,233 INFO [asr_datamodule.py:308] (3/4) Using DynamicBucketingSampler.
2024-09-22 10:51:28,642 INFO [asr_datamodule.py:325] (3/4) About to create train dataloader
2024-09-22 10:51:28,644 INFO [asr_datamodule.py:453] (3/4) About to get dev-clean cuts
2024-09-22 10:51:28,645 INFO [asr_datamodule.py:460] (3/4) About to get dev-other cuts
2024-09-22 10:51:28,646 INFO [asr_datamodule.py:356] (3/4) About to create dev dataset
2024-09-22 10:51:28,864 INFO [asr_datamodule.py:373] (3/4) About to create dev dataloader
2024-09-22 10:51:28,865 INFO [train.py:1545] (3/4) Sanity check -- see if any of the batches in epoch 1 would cause OOM.
2024-09-22 10:55:04,877 INFO [train.py:1576] (3/4) Maximum memory allocated so far is 18633MB
2024-09-22 10:55:06,894 INFO [train.py:1576] (3/4) Maximum memory allocated so far is 18633MB
2024-09-22 10:55:09,108 INFO [train.py:1576] (3/4) Maximum memory allocated so far is 18988MB
2024-09-22 10:55:10,930 INFO [train.py:1576] (3/4) Maximum memory allocated so far is 18988MB
2024-09-22 10:55:13,009 INFO [train.py:1576] (3/4) Maximum memory allocated so far is 18988MB
2024-09-22 10:55:15,404 INFO [train.py:1576] (3/4) Maximum memory allocated so far is 18988MB
2024-09-22 10:56:01,486 INFO [train.py:1198] (3/4) Epoch 1, batch 0, loss[loss=4.927, ctc_loss=4.796, cr_loss=0.6527, over 17045.00 frames. ], tot_loss[loss=4.927, ctc_loss=4.796, cr_loss=0.6527, over 17045.00 frames. ], batch size: 39, lr: 2.25e-02, grad_scale: 2.0
2024-09-22 10:56:01,487 INFO [train.py:1221] (3/4) Computing validation loss
2024-09-22 10:56:18,231 INFO [train.py:1230] (3/4) Epoch 1, validation: loss=4.756, ctc_loss=4.756, cr_loss=2.853e-15, over 944034.00 frames.
2024-09-22 10:56:18,231 INFO [train.py:1231] (3/4) Maximum memory allocated so far is 20366MB
2024-09-22 10:56:22,078 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.min_positive, batch_count=0.0, ans=0.05
2024-09-22 10:56:38,794 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=14.08 vs. limit=5.023333333333333
2024-09-22 10:56:39,777 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.680e+03 4.888e+03 5.131e+03 6.578e+03 8.849e+03, threshold=2.053e+04, percent-clipped=0.0
2024-09-22 10:56:40,087 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=46.666666666666664, ans=0.2995333333333333
2024-09-22 10:56:51,283 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.const_attention_rate, batch_count=46.666666666666664, ans=0.247375
2024-09-22 10:57:00,473 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.227e+03 2.818e+03 4.846e+03 6.578e+03 1.124e+04, threshold=1.938e+04, percent-clipped=0.0
2024-09-22 10:57:15,775 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=23.07 vs. limit=7.5525
2024-09-22 10:57:17,710 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=14.43 vs. limit=7.5525
2024-09-22 10:57:22,597 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=140.0, ans=0.4934375
2024-09-22 10:57:25,056 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=7.27 vs. limit=4.056
2024-09-22 10:57:36,838 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 8.493e+02 2.434e+03 3.497e+03 4.961e+03 1.124e+04, threshold=1.399e+04, percent-clipped=0.0
2024-09-22 10:57:48,194 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=186.66666666666666, ans=0.49125
2024-09-22 10:57:51,515 INFO [train.py:1198] (3/4) Epoch 1, batch 50, loss[loss=1.27, ctc_loss=1.203, cr_loss=0.3319, over 17189.00 frames. ], tot_loss[loss=2.33, ctc_loss=2.269, cr_loss=0.3066, over 764177.61 frames. ], batch size: 41, lr: 2.48e-02, grad_scale: 0.5
2024-09-22 10:57:52,739 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=140.94 vs. limit=7.5875
2024-09-22 10:57:53,711 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.min_abs, batch_count=233.33333333333334, ans=0.20350000000000001
2024-09-22 10:58:12,916 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=136.90 vs. limit=7.605
2024-09-22 10:58:14,563 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=18.85 vs. limit=7.605
2024-09-22 10:58:15,968 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=280.0, ans=0.486875
2024-09-22 10:58:27,253 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=326.6666666666667, ans=0.29673333333333335
2024-09-22 10:58:27,916 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=25.72 vs. limit=7.745
2024-09-22 10:58:29,571 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.60 vs. limit=7.745
2024-09-22 10:58:30,762 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=326.6666666666667, ans=0.8885666666666667
2024-09-22 10:58:46,468 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=36.62 vs. limit=7.78
2024-09-22 10:58:47,479 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=373.3333333333333, ans=0.0916
2024-09-22 10:58:49,216 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=373.3333333333333, ans=0.4825
2024-09-22 10:58:57,100 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=28.26 vs. limit=7.64
2024-09-22 10:59:20,875 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=420.0, ans=0.0486875
2024-09-22 10:59:24,438 INFO [train.py:1198] (3/4) Epoch 1, batch 100, loss[loss=1.143, ctc_loss=1.112, cr_loss=0.153, over 17204.00 frames. ], tot_loss[loss=1.721, ctc_loss=1.672, cr_loss=0.2428, over 1343083.18 frames. ], batch size: 41, lr: 2.70e-02, grad_scale: 1.0
2024-09-22 10:59:28,076 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.417e+02 5.797e+02 1.227e+03 2.964e+03 1.124e+04, threshold=2.454e+03, percent-clipped=0.0
2024-09-22 10:59:45,466 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=18.69 vs. limit=5.256666666666667
2024-09-22 10:59:49,347 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.68 vs. limit=3.077
2024-09-22 11:00:01,523 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=294.96 vs. limit=7.6925
2024-09-22 11:00:04,508 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=560.0, ans=0.47375
2024-09-22 11:00:09,902 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=560.0, ans=0.47375
2024-09-22 11:00:18,256 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=35.93 vs. limit=7.71
2024-09-22 11:00:20,261 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=86.15 vs. limit=7.71
2024-09-22 11:01:04,607 INFO [train.py:1198] (3/4) Epoch 1, batch 150, loss[loss=1.197, ctc_loss=1.172, cr_loss=0.1263, over 17061.00 frames. ], tot_loss[loss=1.508, ctc_loss=1.469, cr_loss=0.1955, over 1786213.38 frames. ], batch size: 46, lr: 2.93e-02, grad_scale: 1.0
2024-09-22 11:01:07,363 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=40.35 vs. limit=7.7625
2024-09-22 11:01:12,377 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=700.0, ans=0.17375000000000002
2024-09-22 11:01:14,652 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.18 vs. limit=8.025
2024-09-22 11:01:22,086 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=36.31 vs. limit=7.78
2024-09-22 11:01:24,003 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=10.66 vs. limit=4.298666666666667
2024-09-22 11:01:25,840 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=10.45 vs. limit=5.1866666666666665
2024-09-22 11:01:27,310 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=100.33 vs. limit=7.78
2024-09-22 11:01:36,243 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=746.6666666666666, ans=0.46499999999999997
2024-09-22 11:01:45,921 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=23.90 vs. limit=7.7975
2024-09-22 11:01:51,512 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=37.69 vs. limit=7.7975
2024-09-22 11:01:51,920 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten.whitening_limit, batch_count=793.3333333333334, ans=7.7975
2024-09-22 11:01:53,007 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=793.3333333333334, ans=0.4628125
2024-09-22 11:02:15,622 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=50.72 vs. limit=7.815
2024-09-22 11:02:24,394 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=886.6666666666666, ans=0.8689666666666667
2024-09-22 11:02:31,033 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=13.77 vs. limit=8.165
2024-09-22 11:02:39,841 INFO [train.py:1198] (3/4) Epoch 1, batch 200, loss[loss=1.168, ctc_loss=1.148, cr_loss=0.09681, over 17288.00 frames. ], tot_loss[loss=1.398, ctc_loss=1.364, cr_loss=0.1676, over 2139963.33 frames. ], batch size: 46, lr: 3.15e-02, grad_scale: 2.0
2024-09-22 11:02:42,846 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=203.89 vs. limit=5.466666666666667
2024-09-22 11:02:43,594 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.311e+02 2.245e+02 3.033e+02 4.033e+02 1.104e+03, threshold=6.066e+02, percent-clipped=0.0
2024-09-22 11:02:50,029 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=14.76 vs. limit=7.85
2024-09-22 11:02:54,143 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=11.91 vs. limit=7.85
2024-09-22 11:02:59,378 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.20 vs. limit=8.235
2024-09-22 11:02:59,564 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=15.59 vs. limit=8.235
2024-09-22 11:03:02,829 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=5.26 vs. limit=4.392
2024-09-22 11:03:09,674 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=980.0, ans=0.4540625
2024-09-22 11:03:25,309 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten.whitening_limit, batch_count=1026.6666666666667, ans=7.885
2024-09-22 11:03:30,114 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=1026.6666666666667, ans=0.451875
2024-09-22 11:03:34,402 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=95.77 vs. limit=7.9025
2024-09-22 11:03:41,928 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=22.92 vs. limit=7.9025
2024-09-22 11:03:48,969 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=47.84 vs. limit=7.9025
2024-09-22 11:03:49,430 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=11.61 vs. limit=7.9025
2024-09-22 11:03:54,302 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1120.0, ans=0.4475
2024-09-22 11:04:12,401 INFO [train.py:1198] (3/4) Epoch 1, batch 250, loss[loss=1.087, ctc_loss=1.061, cr_loss=0.1278, over 17260.00 frames. ], tot_loss[loss=1.334, ctc_loss=1.304, cr_loss=0.1521, over 2401496.01 frames. ], batch size: 42, lr: 3.38e-02, grad_scale: 2.0
2024-09-22 11:04:12,697 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=1166.6666666666667, ans=0.4453125
2024-09-22 11:04:22,345 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=14.23 vs. limit=8.375
2024-09-22 11:04:26,242 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=22.84 vs. limit=7.9375
2024-09-22 11:04:29,180 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1213.3333333333333, ans=0.443125
2024-09-22 11:04:33,461 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=93.60 vs. limit=7.955
2024-09-22 11:04:59,469 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=14.91 vs. limit=7.9725
2024-09-22 11:05:12,269 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=85.76 vs. limit=7.99
2024-09-22 11:05:26,297 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=39.36 vs. limit=7.99
2024-09-22 11:05:38,935 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=28.30 vs. limit=5.676666666666667
2024-09-22 11:05:42,445 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=9.164e+00
2024-09-22 11:05:44,418 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1353.3333333333333, ans=0.4365625
2024-09-22 11:05:47,709 INFO [train.py:1198] (3/4) Epoch 1, batch 300, loss[loss=1.195, ctc_loss=1.163, cr_loss=0.1593, over 17008.00 frames. ], tot_loss[loss=1.299, ctc_loss=1.269, cr_loss=0.149, over 2612931.23 frames. ], batch size: 52, lr: 3.60e-02, grad_scale: 4.0
2024-09-22 11:05:51,295 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.104e+02 1.872e+02 2.621e+02 3.440e+02 6.626e+02, threshold=5.242e+02, percent-clipped=4.0
2024-09-22 11:05:53,832 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=13.84 vs. limit=5.7
2024-09-22 11:05:55,130 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=1400.0, ans=0.0685
2024-09-22 11:06:32,842 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.85 vs. limit=3.224
2024-09-22 11:06:34,544 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=60.10 vs. limit=8.06
2024-09-22 11:06:43,707 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=16.16 vs. limit=8.620000000000001
2024-09-22 11:06:45,751 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=14.31 vs. limit=8.620000000000001
2024-09-22 11:06:46,899 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.max_positive, batch_count=1540.0, ans=0.7654
2024-09-22 11:06:56,126 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=1540.0, ans=0.4278125
2024-09-22 11:07:09,430 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=13.04 vs. limit=8.69
2024-09-22 11:07:14,659 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=7.76 vs. limit=5.793333333333333
2024-09-22 11:07:19,812 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1586.6666666666667, ans=0.28413333333333335
2024-09-22 11:07:24,342 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=9.31 vs. limit=8.725
2024-09-22 11:07:25,017 INFO [train.py:1198] (3/4) Epoch 1, batch 350, loss[loss=1.268, ctc_loss=1.218, cr_loss=0.2528, over 16996.00 frames. ], tot_loss[loss=1.268, ctc_loss=1.236, cr_loss=0.1593, over 2774346.88 frames. ], batch size: 53, lr: 3.83e-02, grad_scale: 4.0
2024-09-22 11:07:45,608 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=14.91 vs. limit=8.76
2024-09-22 11:07:54,244 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1680.0, ans=0.42125
2024-09-22 11:07:59,156 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=8.36 vs. limit=5.42
2024-09-22 11:08:03,460 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=4.288e+01
2024-09-22 11:08:11,004 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=22.91 vs. limit=5.863333333333333
2024-09-22 11:08:16,893 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=21.28 vs. limit=8.1475
2024-09-22 11:08:18,509 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=6.12 vs. limit=5.431666666666667
2024-09-22 11:08:19,798 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1726.6666666666667, ans=0.28273333333333334
2024-09-22 11:08:27,799 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.10 vs. limit=5.443333333333333
2024-09-22 11:08:44,207 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=14.08 vs. limit=8.865
2024-09-22 11:09:00,247 INFO [train.py:1198] (3/4) Epoch 1, batch 400, loss[loss=1.103, ctc_loss=1.05, cr_loss=0.2617, over 17030.00 frames. ], tot_loss[loss=1.243, ctc_loss=1.207, cr_loss=0.1786, over 2903115.62 frames. ], batch size: 44, lr: 4.05e-02, grad_scale: 8.0
2024-09-22 11:09:03,793 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.305e+02 2.539e+02 3.204e+02 4.584e+02 1.114e+03, threshold=6.407e+02, percent-clipped=17.0
2024-09-22 11:09:04,155 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=1866.6666666666667, ans=0.13
2024-09-22 11:09:04,264 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=1866.6666666666667, ans=0.057999999999999996
2024-09-22 11:09:08,029 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=1866.6666666666667, ans=0.04416666666666667
2024-09-22 11:09:24,766 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=10.65 vs. limit=8.2175
2024-09-22 11:09:25,121 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.24 vs. limit=8.2175
2024-09-22 11:09:42,756 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=18.85 vs. limit=8.235
2024-09-22 11:09:46,124 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1960.0, ans=0.255
2024-09-22 11:09:51,734 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1960.0, ans=0.408125
2024-09-22 11:09:57,109 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=2006.6666666666667, ans=0.4059375
2024-09-22 11:09:59,140 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2006.6666666666667, ans=0.4059375
2024-09-22 11:10:05,539 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=119.47 vs. limit=8.2525
2024-09-22 11:10:11,720 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=2053.3333333333335, ans=0.7705333333333333
2024-09-22 11:10:19,832 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=13.20 vs. limit=8.27
2024-09-22 11:10:29,560 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=2100.0, ans=0.4015625
2024-09-22 11:10:29,649 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=7.396e+00
2024-09-22 11:10:31,079 INFO [train.py:1198] (3/4) Epoch 1, batch 450, loss[loss=1.121, ctc_loss=1.058, cr_loss=0.3134, over 17149.00 frames. ], tot_loss[loss=1.214, ctc_loss=1.173, cr_loss=0.2031, over 3006716.12 frames. ], batch size: 48, lr: 4.28e-02, grad_scale: 4.0
2024-09-22 11:10:53,527 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=2146.6666666666665, ans=6.341666666666667
2024-09-22 11:11:11,799 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=2193.3333333333335, ans=0.08629166666666667
2024-09-22 11:11:17,635 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=14.40 vs. limit=9.145
2024-09-22 11:11:30,101 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=2240.0, ans=0.8216
2024-09-22 11:11:33,643 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2240.0, ans=0.2776
2024-09-22 11:12:01,349 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=2286.6666666666665, ans=0.27713333333333334
2024-09-22 11:12:01,643 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=2286.6666666666665, ans=0.11425
2024-09-22 11:12:10,286 INFO [train.py:1198] (3/4) Epoch 1, batch 500, loss[loss=1.09, ctc_loss=1.022, cr_loss=0.3372, over 17219.00 frames. ], tot_loss[loss=1.177, ctc_loss=1.131, cr_loss=0.2314, over 3089177.03 frames. ], batch size: 50, lr: 4.49e-02, grad_scale: 8.0
2024-09-22 11:12:13,046 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=5.73 vs. limit=4.933333333333334
2024-09-22 11:12:14,617 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=8.91 vs. limit=9.25
2024-09-22 11:12:15,705 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.330e+02 2.678e+02 3.368e+02 4.496e+02 8.489e+02, threshold=6.737e+02, percent-clipped=3.0
2024-09-22 11:12:37,987 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=9.81 vs. limit=9.285
2024-09-22 11:12:42,947 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=2380.0, ans=0.2025
2024-09-22 11:12:53,874 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2426.6666666666665, ans=0.38625
2024-09-22 11:12:55,670 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2426.6666666666665, ans=0.38625
2024-09-22 11:13:01,195 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2426.6666666666665, ans=0.38625
2024-09-22 11:13:08,998 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.whiten.whitening_limit, batch_count=2473.3333333333335, ans=4.989333333333334
2024-09-22 11:13:20,730 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=12.41 vs. limit=9.355
2024-09-22 11:13:37,936 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2520.0, ans=0.2748
2024-09-22 11:13:40,381 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=9.60 vs. limit=8.445
2024-09-22 11:13:42,940 INFO [train.py:1198] (3/4) Epoch 1, batch 550, loss[loss=1.036, ctc_loss=0.9685, cr_loss=0.3375, over 11877.00 frames. ], tot_loss[loss=1.14, ctc_loss=1.088, cr_loss=0.2619, over 3136657.82 frames. ], batch size: 124, lr: 4.49e-02, grad_scale: 8.0
2024-09-22 11:13:54,265 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=7.03 vs. limit=6.283333333333333
2024-09-22 11:14:13,398 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=2613.3333333333335, ans=0.035
2024-09-22 11:14:27,657 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=2660.0, ans=0.8069000000000001
2024-09-22 11:14:57,584 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=2753.3333333333335, ans=0.37093750000000003
2024-09-22 11:15:01,776 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=9.69 vs. limit=8.5325
2024-09-22 11:15:11,742 INFO [train.py:1198] (3/4) Epoch 1, batch 600, loss[loss=0.9934, ctc_loss=0.9046, cr_loss=0.4443, over 16549.00 frames. ], tot_loss[loss=1.101, ctc_loss=1.042, cr_loss=0.2941, over 3181686.08 frames. ], batch size: 66, lr: 4.49e-02, grad_scale: 8.0
2024-09-22 11:15:12,001 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=2800.0, ans=0.24200000000000002
2024-09-22 11:15:17,227 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.480e+02 2.420e+02 3.324e+02 4.180e+02 8.567e+02, threshold=6.647e+02, percent-clipped=1.0
2024-09-22 11:15:28,496 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2846.6666666666665, ans=0.3665625
2024-09-22 11:15:42,369 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.min_positive, batch_count=2846.6666666666665, ans=0.08220833333333334
2024-09-22 11:15:59,387 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=9.34 vs. limit=8.585
2024-09-22 11:16:13,408 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.94 vs. limit=5.735
2024-09-22 11:16:16,298 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=2940.0, ans=0.3621875
2024-09-22 11:16:42,203 INFO [train.py:1198] (3/4) Epoch 1, batch 650, loss[loss=0.8032, ctc_loss=0.7349, cr_loss=0.3417, over 16967.00 frames. ], tot_loss[loss=1.053, ctc_loss=0.9879, cr_loss=0.3245, over 3222207.88 frames. ], batch size: 42, lr: 4.49e-02, grad_scale: 8.0
2024-09-22 11:17:01,543 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.20 vs. limit=5.77
2024-09-22 11:17:07,602 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=6.03 vs. limit=5.77
2024-09-22 11:17:27,232 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=10.07 vs. limit=9.845
2024-09-22 11:17:28,090 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3126.6666666666665, ans=0.08274999999999999
2024-09-22 11:17:31,727 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=3126.6666666666665, ans=0.08274999999999999
2024-09-22 11:17:36,962 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3126.6666666666665, ans=0.35343749999999996
2024-09-22 11:17:47,301 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=3173.3333333333335, ans=0.08099999999999999
2024-09-22 11:17:52,883 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3173.3333333333335, ans=0.35125
2024-09-22 11:18:01,338 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3220.0, ans=0.2678
2024-09-22 11:18:14,948 INFO [train.py:1198] (3/4) Epoch 1, batch 700, loss[loss=0.7557, ctc_loss=0.6721, cr_loss=0.418, over 17265.00 frames. ], tot_loss[loss=1.002, ctc_loss=0.932, cr_loss=0.3522, over 3257011.08 frames. ], batch size: 42, lr: 4.49e-02, grad_scale: 8.0
2024-09-22 11:18:15,241 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3266.6666666666665, ans=0.09166666666666667
2024-09-22 11:18:20,265 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.450e+02 2.357e+02 3.031e+02 4.358e+02 1.002e+03, threshold=6.062e+02, percent-clipped=7.0
2024-09-22 11:18:29,430 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=8.71 vs. limit=8.725
2024-09-22 11:18:30,774 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=3313.3333333333335, ans=0.7840333333333334
2024-09-22 11:18:44,070 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.min_positive, batch_count=3313.3333333333335, ans=0.07929166666666668
2024-09-22 11:18:53,611 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=7.37 vs. limit=8.76
2024-09-22 11:19:15,661 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3406.6666666666665, ans=0.3403125
2024-09-22 11:19:17,427 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3406.6666666666665, ans=0.3403125
2024-09-22 11:19:22,806 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=3406.6666666666665, ans=0.09899494936611666
2024-09-22 11:19:34,770 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-09-22 11:19:35,117 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=11.03 vs. limit=10.09
2024-09-22 11:19:44,684 INFO [train.py:1198] (3/4) Epoch 1, batch 750, loss[loss=0.8176, ctc_loss=0.7259, cr_loss=0.4584, over 17216.00 frames. ], tot_loss[loss=0.9548, ctc_loss=0.8797, cr_loss=0.3754, over 3273568.86 frames. ], batch size: 55, lr: 4.49e-02, grad_scale: 8.0
2024-09-22 11:20:09,465 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=3546.6666666666665, ans=0.020199999999999996
2024-09-22 11:20:45,271 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=3640.0, ans=0.018100000000000005
2024-09-22 11:21:04,054 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3686.6666666666665, ans=0.32718749999999996
2024-09-22 11:21:13,171 INFO [train.py:1198] (3/4) Epoch 1, batch 800, loss[loss=0.6462, ctc_loss=0.5751, cr_loss=0.3557, over 16963.00 frames. ], tot_loss[loss=0.9059, ctc_loss=0.8277, cr_loss=0.3912, over 3286535.30 frames. ], batch size: 42, lr: 4.49e-02, grad_scale: 16.0
2024-09-22 11:21:18,257 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.759e+02 2.614e+02 4.071e+02 6.304e+02 1.473e+03, threshold=8.142e+02, percent-clipped=26.0
2024-09-22 11:21:18,662 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3733.3333333333335, ans=0.033333333333333326
2024-09-22 11:21:20,291 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3733.3333333333335, ans=0.26266666666666666
2024-09-22 11:21:22,152 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=3733.3333333333335, ans=7.333333333333334
2024-09-22 11:21:34,390 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3780.0, ans=0.2622
2024-09-22 11:21:58,237 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3826.6666666666665, ans=0.021666666666666667
2024-09-22 11:22:22,483 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=10.07 vs. limit=8.9525
2024-09-22 11:22:32,454 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3920.0, ans=0.05299999999999999
2024-09-22 11:22:44,174 INFO [train.py:1198] (3/4) Epoch 1, batch 850, loss[loss=0.6014, ctc_loss=0.5267, cr_loss=0.3731, over 17290.00 frames. ], tot_loss[loss=0.8568, ctc_loss=0.7772, cr_loss=0.3977, over 3313351.82 frames. ], batch size: 44, lr: 4.49e-02, grad_scale: 16.0
2024-09-22 11:22:58,069 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3966.6666666666665, ans=0.3140625
2024-09-22 11:23:00,049 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.16 vs. limit=6.003333333333334
2024-09-22 11:23:06,560 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=4013.3333333333335, ans=0.049944444444444444
2024-09-22 11:23:25,540 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=4060.0, ans=0.04975
2024-09-22 11:23:28,880 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=4060.0, ans=0.3096875
2024-09-22 11:23:40,712 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=4106.666666666667, ans=0.009976811594202899
2024-09-22 11:24:05,329 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=4153.333333333333, ans=0.3053125
2024-09-22 11:24:11,786 INFO [train.py:1198] (3/4) Epoch 1, batch 900, loss[loss=0.6381, ctc_loss=0.5556, cr_loss=0.4126, over 16723.00 frames. ], tot_loss[loss=0.811, ctc_loss=0.7308, cr_loss=0.4013, over 3325463.80 frames. ], batch size: 61, lr: 4.48e-02, grad_scale: 16.0
2024-09-22 11:24:16,914 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.753e+02 2.836e+02 3.761e+02 6.423e+02 1.326e+03, threshold=7.521e+02, percent-clipped=10.0
2024-09-22 11:24:56,833 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=4293.333333333333, ans=0.29874999999999996
2024-09-22 11:24:59,519 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=6.49 vs. limit=7.1466666666666665
2024-09-22 11:25:13,933 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=4340.0, ans=0.00992608695652174
2024-09-22 11:25:16,098 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=11.43 vs. limit=10.754999999999999
2024-09-22 11:25:29,166 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.min_abs, batch_count=4386.666666666667, ans=0.26580000000000004
2024-09-22 11:25:30,757 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=4386.666666666667, ans=0.294375
2024-09-22 11:25:36,002 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=4433.333333333333, ans=0.009905797101449275
2024-09-22 11:25:37,369 INFO [train.py:1198] (3/4) Epoch 1, batch 950, loss[loss=0.5436, ctc_loss=0.4684, cr_loss=0.3761, over 17267.00 frames. ], tot_loss[loss=0.7679, ctc_loss=0.6871, cr_loss=0.4042, over 3341066.45 frames. ], batch size: 44, lr: 4.48e-02, grad_scale: 16.0
2024-09-22 11:25:37,614 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=4433.333333333333, ans=0.2921875
2024-09-22 11:25:47,901 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=4433.333333333333, ans=0.2921875
2024-09-22 11:25:49,524 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=4433.333333333333, ans=0.09899494936611666
2024-09-22 11:26:01,283 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-22 11:26:17,004 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=6.14 vs. limit=6.131666666666667
2024-09-22 11:26:53,328 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=4620.0, ans=0.009865217391304347
2024-09-22 11:27:05,078 INFO [train.py:1198] (3/4) Epoch 1, batch 1000, loss[loss=0.5323, ctc_loss=0.4524, cr_loss=0.3992, over 17203.00 frames. ], tot_loss[loss=0.7315, ctc_loss=0.6497, cr_loss=0.4087, over 3346797.94 frames. ], batch size: 41, lr: 4.48e-02, grad_scale: 16.0
2024-09-22 11:27:09,935 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.873e+02 2.795e+02 3.933e+02 5.449e+02 1.373e+03, threshold=7.866e+02, percent-clipped=13.0
2024-09-22 11:27:48,135 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=11.93 vs. limit=11.07
2024-09-22 11:28:01,795 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=7.03 vs. limit=7.403333333333334
2024-09-22 11:28:04,132 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=4806.666666666667, ans=0.7317666666666667
2024-09-22 11:28:30,027 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=11.45 vs. limit=11.14
2024-09-22 11:28:34,217 INFO [train.py:1198] (3/4) Epoch 1, batch 1050, loss[loss=0.7065, ctc_loss=0.6153, cr_loss=0.4563, over 12372.00 frames. ], tot_loss[loss=0.6983, ctc_loss=0.6156, cr_loss=0.4135, over 3358789.95 frames. ], batch size: 123, lr: 4.48e-02, grad_scale: 16.0
2024-09-22 11:28:49,663 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=4946.666666666667, ans=0.268125
2024-09-22 11:28:56,462 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=4946.666666666667, ans=0.04605555555555556
2024-09-22 11:29:05,924 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=4946.666666666667, ans=0.04949747468305833
2024-09-22 11:29:07,444 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=4946.666666666667, ans=0.268125
2024-09-22 11:29:46,663 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=5086.666666666667, ans=0.26156250000000003
2024-09-22 11:30:01,212 INFO [train.py:1198] (3/4) Epoch 1, batch 1100, loss[loss=0.5143, ctc_loss=0.4281, cr_loss=0.4309, over 16237.00 frames. ], tot_loss[loss=0.6691, ctc_loss=0.5855, cr_loss=0.4179, over 3363026.82 frames. ], batch size: 36, lr: 4.48e-02, grad_scale: 16.0
2024-09-22 11:30:01,946 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=12.62 vs. limit=11.35
2024-09-22 11:30:06,284 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.680e+02 2.354e+02 3.241e+02 4.881e+02 1.077e+03, threshold=6.482e+02, percent-clipped=7.0
2024-09-22 11:30:25,110 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=5180.0, ans=0.2571875
2024-09-22 11:30:25,124 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=5180.0, ans=0.8018
2024-09-22 11:30:31,985 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=5180.0, ans=0.7187
2024-09-22 11:30:42,049 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=5226.666666666667, ans=0.255
2024-09-22 11:30:59,925 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=5.41 vs. limit=9.4775
2024-09-22 11:31:02,370 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=5273.333333333333, ans=0.7154333333333334
2024-09-22 11:31:11,443 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=8.77 vs. limit=9.495000000000001
2024-09-22 11:31:22,801 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer_ff3.min_abs, batch_count=5320.0, ans=0.2
2024-09-22 11:31:25,821 INFO [train.py:1198] (3/4) Epoch 1, batch 1150, loss[loss=0.5241, ctc_loss=0.4432, cr_loss=0.4045, over 17091.00 frames. ], tot_loss[loss=0.6423, ctc_loss=0.5581, cr_loss=0.4211, over 3368103.26 frames. ], batch size: 49, lr: 4.47e-02, grad_scale: 16.0
2024-09-22 11:31:29,407 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.min_positive, batch_count=5366.666666666667, ans=0.03322916666666667
2024-09-22 11:31:43,964 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=5413.333333333333, ans=0.24586666666666668
2024-09-22 11:31:52,193 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=5413.333333333333, ans=0.24586666666666668
2024-09-22 11:31:55,985 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=12.09 vs. limit=11.559999999999999
2024-09-22 11:32:04,066 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=5460.0, ans=0.04391666666666667
2024-09-22 11:32:05,948 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=5460.0, ans=0.04949747468305833
2024-09-22 11:32:34,270 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=5553.333333333333, ans=0.24446666666666667
2024-09-22 11:32:44,274 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=4.24 vs. limit=9.5825
2024-09-22 11:32:58,124 INFO [train.py:1198] (3/4) Epoch 1, batch 1200, loss[loss=0.6873, ctc_loss=0.599, cr_loss=0.4414, over 11142.00 frames. ], tot_loss[loss=0.6184, ctc_loss=0.5337, cr_loss=0.4232, over 3366143.84 frames. ], batch size: 123, lr: 4.47e-02, grad_scale: 32.0
2024-09-22 11:33:02,961 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.688e+02 2.423e+02 3.283e+02 4.569e+02 8.108e+02, threshold=6.566e+02, percent-clipped=6.0
2024-09-22 11:33:03,255 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=5600.0, ans=0.0
2024-09-22 11:33:39,681 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=5693.333333333333, ans=0.009631884057971015
2024-09-22 11:34:03,016 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=5786.666666666667, ans=0.22875
2024-09-22 11:34:06,233 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.max_abs, batch_count=5786.666666666667, ans=8.616666666666667
2024-09-22 11:34:10,530 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=8.02 vs. limit=9.67
2024-09-22 11:34:23,533 INFO [train.py:1198] (3/4) Epoch 1, batch 1250, loss[loss=0.4594, ctc_loss=0.3827, cr_loss=0.3837, over 17047.00 frames. ], tot_loss[loss=0.5979, ctc_loss=0.5126, cr_loss=0.426, over 3372727.89 frames. ], batch size: 39, lr: 4.47e-02, grad_scale: 32.0
2024-09-22 11:34:25,585 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=5833.333333333333, ans=0.04236111111111111
2024-09-22 11:34:42,383 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=5880.0, ans=0.04216666666666667
2024-09-22 11:34:58,068 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.41 vs. limit=6.4816666666666665
2024-09-22 11:35:02,429 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=5926.666666666667, ans=0.22218749999999998
2024-09-22 11:35:29,169 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=6020.0, ans=0.1898
2024-09-22 11:35:47,099 INFO [train.py:1198] (3/4) Epoch 1, batch 1300, loss[loss=0.5729, ctc_loss=0.474, cr_loss=0.4948, over 16523.00 frames. ], tot_loss[loss=0.5806, ctc_loss=0.4949, cr_loss=0.4282, over 3370800.83 frames. ], batch size: 66, lr: 4.47e-02, grad_scale: 32.0
2024-09-22 11:35:52,195 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.553e+02 2.150e+02 2.604e+02 3.500e+02 8.408e+02, threshold=5.208e+02, percent-clipped=5.0
2024-09-22 11:36:00,101 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=8.37 vs. limit=8.033333333333333
2024-09-22 11:36:44,926 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=6206.666666666667, ans=0.20906249999999998
2024-09-22 11:36:46,651 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=6206.666666666667, ans=0.0
2024-09-22 11:36:51,320 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=6206.666666666667, ans=0.23793333333333333
2024-09-22 11:36:51,409 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=6206.666666666667, ans=0.23793333333333333
2024-09-22 11:36:58,429 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=9.00 vs. limit=9.845
2024-09-22 11:37:01,375 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=6253.333333333333, ans=0.20687499999999998
2024-09-22 11:37:12,780 INFO [train.py:1198] (3/4) Epoch 1, batch 1350, loss[loss=0.5351, ctc_loss=0.4485, cr_loss=0.4329, over 17311.00 frames. ], tot_loss[loss=0.5659, ctc_loss=0.4796, cr_loss=0.4311, over 3368060.66 frames. ], batch size: 51, lr: 4.46e-02, grad_scale: 32.0
2024-09-22 11:38:03,713 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=12.47 vs. limit=12.295
2024-09-22 11:38:41,184 INFO [train.py:1198] (3/4) Epoch 1, batch 1400, loss[loss=0.609, ctc_loss=0.5195, cr_loss=0.4474, over 11492.00 frames. ], tot_loss[loss=0.5517, ctc_loss=0.4654, cr_loss=0.4314, over 3355478.41 frames. ], batch size: 123, lr: 4.46e-02, grad_scale: 32.0
2024-09-22 11:38:46,073 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.832e+02 2.453e+02 3.278e+02 5.014e+02 1.044e+03, threshold=6.556e+02, percent-clipped=21.0
2024-09-22 11:39:21,317 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=10.13 vs. limit=9.985
2024-09-22 11:39:54,151 INFO [scaling.py:1024] (3/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=5.23 vs. limit=5.344
2024-09-22 11:39:58,193 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=6720.0, ans=0.185
2024-09-22 11:40:05,930 INFO [train.py:1198] (3/4) Epoch 1, batch 1450, loss[loss=0.5072, ctc_loss=0.4096, cr_loss=0.4881, over 16904.00 frames. ], tot_loss[loss=0.5404, ctc_loss=0.4537, cr_loss=0.4337, over 3364181.96 frames. ], batch size: 58, lr: 4.46e-02, grad_scale: 32.0
2024-09-22 11:40:19,507 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=9.68 vs. limit=10.0375
2024-09-22 11:40:40,657 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=6860.0, ans=0.17843750000000003
2024-09-22 11:41:08,617 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=6906.666666666667, ans=0.05683333333333333
2024-09-22 11:41:13,809 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=7.47 vs. limit=6.7813333333333325
2024-09-22 11:41:15,399 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-09-22 11:41:18,361 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=6953.333333333333, ans=9.345833333333333
2024-09-22 11:41:27,762 INFO [train.py:1198] (3/4) Epoch 1, batch 1500, loss[loss=0.5426, ctc_loss=0.4347, cr_loss=0.5391, over 15136.00 frames. ], tot_loss[loss=0.5293, ctc_loss=0.4422, cr_loss=0.4354, over 3368709.45 frames. ], batch size: 89, lr: 4.46e-02, grad_scale: 32.0
2024-09-22 11:41:28,006 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=7000.0, ans=0.037500000000000006
2024-09-22 11:41:32,711 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.678e+02 2.217e+02 3.113e+02 4.719e+02 9.117e+02, threshold=6.226e+02, percent-clipped=8.0
2024-09-22 11:41:34,644 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=7000.0, ans=0.171875
2024-09-22 11:41:35,210 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.96 vs. limit=4.05
2024-09-22 11:41:41,226 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=7000.0, ans=0.009347826086956522
2024-09-22 11:41:46,120 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=7046.666666666667, ans=0.00933768115942029
2024-09-22 11:42:04,056 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=13.48 vs. limit=12.82
2024-09-22 11:42:21,879 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=8.23 vs. limit=10.1775
2024-09-22 11:42:56,584 INFO [train.py:1198] (3/4) Epoch 1, batch 1550, loss[loss=0.4819, ctc_loss=0.3952, cr_loss=0.4331, over 17201.00 frames. ], tot_loss[loss=0.5194, ctc_loss=0.4322, cr_loss=0.436, over 3367633.87 frames. ], batch size: 55, lr: 4.45e-02, grad_scale: 32.0
2024-09-22 11:43:24,825 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.min_abs, batch_count=7280.0, ans=0.30920000000000003
2024-09-22 11:43:34,695 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=7326.666666666667, ans=0.1565625
2024-09-22 11:43:49,262 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=7373.333333333333, ans=0.6419333333333334
2024-09-22 11:43:52,474 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=7373.333333333333, ans=0.15437499999999998
2024-09-22 11:44:08,800 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=7420.0, ans=0.15218749999999998
2024-09-22 11:44:18,844 INFO [train.py:1198] (3/4) Epoch 1, batch 1600, loss[loss=0.4275, ctc_loss=0.3422, cr_loss=0.4264, over 16978.00 frames. ], tot_loss[loss=0.5096, ctc_loss=0.4228, cr_loss=0.4342, over 3354311.54 frames.
], batch size: 42, lr: 4.45e-02, grad_scale: 32.0 2024-09-22 11:44:23,628 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.570e+02 2.041e+02 2.519e+02 3.425e+02 6.490e+02, threshold=5.038e+02, percent-clipped=3.0 2024-09-22 11:44:28,653 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=7466.666666666667, ans=0.035555555555555556 2024-09-22 11:44:45,893 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.const_attention_rate, batch_count=7513.333333333333, ans=0.025 2024-09-22 11:45:06,780 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=7560.0, ans=0.035166666666666666 2024-09-22 11:45:10,114 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.const_attention_rate, batch_count=7606.666666666667, ans=0.025 2024-09-22 11:45:22,923 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=7606.666666666667, ans=0.22393333333333332 2024-09-22 11:45:34,275 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=7653.333333333333, ans=0.14125 2024-09-22 11:45:36,002 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=7653.333333333333, ans=0.14125 2024-09-22 11:45:42,060 INFO [train.py:1198] (3/4) Epoch 1, batch 1650, loss[loss=0.4534, ctc_loss=0.367, cr_loss=0.432, over 16946.00 frames. ], tot_loss[loss=0.5005, ctc_loss=0.4136, cr_loss=0.4345, over 3346272.13 frames. ], batch size: 42, lr: 4.45e-02, grad_scale: 32.0 2024-09-22 11:45:48,755 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=7700.0, ans=0.009195652173913044 2024-09-22 11:45:57,079 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=7746.666666666667, ans=0.009185507246376812 2024-09-22 11:46:11,011 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=11.07 vs. limit=10.405 2024-09-22 11:46:26,412 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=7793.333333333333, ans=0.00917536231884058 2024-09-22 11:46:26,911 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=10.66 vs. limit=10.4225 2024-09-22 11:46:44,913 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=11.71 vs. limit=10.44 2024-09-22 11:46:50,171 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=7886.666666666667, ans=0.1303125 2024-09-22 11:47:05,462 INFO [train.py:1198] (3/4) Epoch 1, batch 1700, loss[loss=0.4914, ctc_loss=0.3983, cr_loss=0.4656, over 17246.00 frames. ], tot_loss[loss=0.4941, ctc_loss=0.407, cr_loss=0.4356, over 3351965.12 frames. 
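In the WARNING lines from optim.py, the reported threshold is consistently 2.0 times the middle of the five grad-norm statistics (just above: 2.0 * 2.519e+02 = 5.038e+02), which suggests clipping against clipping_scale times a running median of recent gradient norms. A hedged sketch of that rule, inferred from the logged numbers rather than copied from optim.py:

```python
# Sketch: clip gradients when their norm exceeds
# clipping_scale * median(recent grad norms). This reproduces the
# threshold values printed in the WARNING lines above; the window size
# and exact statistics kept by icefall's optimizer are assumptions.
from collections import deque
import torch

class MedianGradClipper:
    def __init__(self, clipping_scale: float = 2.0, window: int = 1000):
        self.clipping_scale = clipping_scale
        self.norms: deque = deque(maxlen=window)

    def clip_(self, params) -> float:
        grads = [p.grad for p in params if p.grad is not None]
        norm = torch.norm(torch.stack([g.norm() for g in grads])).item()
        self.norms.append(norm)
        median = sorted(self.norms)[len(self.norms) // 2]
        threshold = self.clipping_scale * median
        if norm > threshold:              # a "clipped" step
            scale = threshold / norm
            for g in grads:
                g.mul_(scale)
        return norm
```

The percent-clipped figure would then simply be the fraction of recent steps whose norm exceeded this moving threshold.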
], batch size: 55, lr: 4.44e-02, grad_scale: 32.0 2024-09-22 11:47:10,186 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.533e+02 2.014e+02 2.718e+02 3.727e+02 5.677e+02, threshold=5.436e+02, percent-clipped=4.0 2024-09-22 11:47:18,245 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=7933.333333333333, ans=0.035 2024-09-22 11:47:23,245 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.const_attention_rate, batch_count=7980.0, ans=0.025 2024-09-22 11:47:34,597 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=7980.0, ans=0.12593749999999998 2024-09-22 11:47:36,260 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=8026.666666666667, ans=0.03322222222222222 2024-09-22 11:47:48,811 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=13.57 vs. limit=13.52 2024-09-22 11:48:10,580 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=4.02 vs. limit=7.229333333333333 2024-09-22 11:48:11,429 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=8073.333333333333, ans=0.125 2024-09-22 11:48:30,597 INFO [train.py:1198] (3/4) Epoch 1, batch 1750, loss[loss=0.4761, ctc_loss=0.379, cr_loss=0.4854, over 17277.00 frames. ], tot_loss[loss=0.4867, ctc_loss=0.3994, cr_loss=0.4366, over 3357151.81 frames. ], batch size: 46, lr: 4.44e-02, grad_scale: 32.0 2024-09-22 11:48:56,865 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.39 vs. limit=10.58 2024-09-22 11:49:12,262 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=8260.0, ans=0.125 2024-09-22 11:49:14,739 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.76 vs. limit=7.304 2024-09-22 11:49:23,771 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=8306.666666666666, ans=0.6092666666666667 2024-09-22 11:49:24,223 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=14.17 vs. limit=13.73 2024-09-22 11:49:44,131 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=8353.333333333334, ans=0.32530000000000003 2024-09-22 11:49:53,986 INFO [train.py:1198] (3/4) Epoch 1, batch 1800, loss[loss=0.42, ctc_loss=0.3345, cr_loss=0.4274, over 17141.00 frames. ], tot_loss[loss=0.4803, ctc_loss=0.3929, cr_loss=0.4371, over 3355829.10 frames. ], batch size: 40, lr: 4.44e-02, grad_scale: 32.0 2024-09-22 11:49:58,879 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.382e+02 1.919e+02 2.429e+02 3.208e+02 6.110e+02, threshold=4.858e+02, percent-clipped=5.0 2024-09-22 11:50:16,225 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=8.24 vs. 
limit=9.223333333333333 2024-09-22 11:50:25,051 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=8493.333333333334, ans=0.125 2024-09-22 11:50:40,715 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=8540.0, ans=0.031083333333333338 2024-09-22 11:50:41,463 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=9.97 vs. limit=10.7025 2024-09-22 11:50:42,478 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=8540.0, ans=0.6011 2024-09-22 11:51:00,229 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=8586.666666666666, ans=0.009002898550724638 2024-09-22 11:51:14,593 INFO [train.py:1198] (3/4) Epoch 1, batch 1850, loss[loss=0.5038, ctc_loss=0.4157, cr_loss=0.4407, over 15979.00 frames. ], tot_loss[loss=0.4756, ctc_loss=0.388, cr_loss=0.4378, over 3356599.58 frames. ], batch size: 74, lr: 4.43e-02, grad_scale: 32.0 2024-09-22 11:51:49,669 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=8726.666666666666, ans=0.21273333333333333 2024-09-22 11:52:09,774 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=9.33 vs. limit=9.386666666666667 2024-09-22 11:52:29,192 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=14.04 vs. limit=14.115 2024-09-22 11:52:40,378 INFO [train.py:1198] (3/4) Epoch 1, batch 1900, loss[loss=0.4432, ctc_loss=0.36, cr_loss=0.416, over 17043.00 frames. ], tot_loss[loss=0.4704, ctc_loss=0.3828, cr_loss=0.4377, over 3360172.30 frames. ], batch size: 52, lr: 4.43e-02, grad_scale: 32.0 2024-09-22 11:52:44,917 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.532e+02 1.950e+02 2.731e+02 3.550e+02 1.054e+03, threshold=5.462e+02, percent-clipped=8.0 2024-09-22 11:52:45,696 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.26 vs. limit=4.33 2024-09-22 11:52:46,870 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=8866.666666666666, ans=0.5896666666666668 2024-09-22 11:53:10,631 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=14.50 vs. limit=14.184999999999999 2024-09-22 11:53:24,584 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=8960.0, ans=0.125 2024-09-22 11:53:39,530 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=14.54 vs. limit=14.254999999999999 2024-09-22 11:54:00,064 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=9053.333333333334, ans=0.028944444444444443 2024-09-22 11:54:02,944 INFO [train.py:1198] (3/4) Epoch 1, batch 1950, loss[loss=0.4679, ctc_loss=0.3827, cr_loss=0.4259, over 17130.00 frames. 
], tot_loss[loss=0.4642, ctc_loss=0.3765, cr_loss=0.4382, over 3367650.96 frames. ], batch size: 48, lr: 4.43e-02, grad_scale: 32.0 2024-09-22 11:54:20,984 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.max_positive, batch_count=9146.666666666666, ans=0.8414666666666666 2024-09-22 11:54:51,310 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=16.85 vs. limit=14.395 2024-09-22 11:55:08,713 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=9.38 vs. limit=10.9825 2024-09-22 11:55:24,673 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=9333.333333333334, ans=0.125 2024-09-22 11:55:26,421 INFO [train.py:1198] (3/4) Epoch 1, batch 2000, loss[loss=0.4584, ctc_loss=0.3729, cr_loss=0.4278, over 17101.00 frames. ], tot_loss[loss=0.4607, ctc_loss=0.3731, cr_loss=0.4377, over 3364823.00 frames. ], batch size: 49, lr: 4.42e-02, grad_scale: 32.0 2024-09-22 11:55:31,236 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.560e+02 1.918e+02 2.442e+02 3.354e+02 7.763e+02, threshold=4.883e+02, percent-clipped=5.0 2024-09-22 11:55:54,256 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=9380.0, ans=0.5717000000000001 2024-09-22 11:56:07,089 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=9426.666666666666, ans=10.0 2024-09-22 11:56:13,592 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=9473.333333333334, ans=0.20526666666666665 2024-09-22 11:56:26,369 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=9473.333333333334, ans=0.20526666666666665 2024-09-22 11:56:49,512 INFO [train.py:1198] (3/4) Epoch 1, batch 2050, loss[loss=0.3998, ctc_loss=0.3205, cr_loss=0.3965, over 16766.00 frames. ], tot_loss[loss=0.4561, ctc_loss=0.3686, cr_loss=0.4374, over 3361575.44 frames. ], batch size: 37, lr: 4.42e-02, grad_scale: 32.0 2024-09-22 11:56:59,958 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=11.34 vs. limit=11.0875 2024-09-22 11:57:01,755 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.whiten.whitening_limit, batch_count=9566.666666666666, ans=11.0875 2024-09-22 11:57:15,515 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=9613.333333333334, ans=0.125 2024-09-22 11:57:55,800 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=9706.666666666666, ans=0.026222222222222227 2024-09-22 11:57:59,467 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.03 vs. limit=11.1575 2024-09-22 11:58:14,745 INFO [train.py:1198] (3/4) Epoch 1, batch 2100, loss[loss=0.4538, ctc_loss=0.3657, cr_loss=0.4407, over 17298.00 frames. ], tot_loss[loss=0.4522, ctc_loss=0.3646, cr_loss=0.438, over 3357773.90 frames. 
], batch size: 49, lr: 4.42e-02, grad_scale: 32.0 2024-09-22 11:58:15,056 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=9800.0, ans=0.5569999999999999 2024-09-22 11:58:19,529 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.503e+02 1.883e+02 2.281e+02 3.077e+02 7.464e+02, threshold=4.562e+02, percent-clipped=6.0 2024-09-22 11:58:25,325 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=5.94 vs. limit=7.92 2024-09-22 11:59:14,950 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=4.30 vs. limit=7.976 2024-09-22 11:59:19,156 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=9986.666666666666, ans=0.125 2024-09-22 11:59:28,135 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=13.17 vs. limit=14.99 2024-09-22 11:59:37,507 INFO [train.py:1198] (3/4) Epoch 1, batch 2150, loss[loss=0.4383, ctc_loss=0.3467, cr_loss=0.458, over 17231.00 frames. ], tot_loss[loss=0.4513, ctc_loss=0.3634, cr_loss=0.4395, over 3354339.18 frames. ], batch size: 55, lr: 4.41e-02, grad_scale: 32.0 2024-09-22 12:00:02,256 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=14.73 vs. limit=15.059999999999999 2024-09-22 12:00:16,079 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=10126.666666666666, ans=0.125 2024-09-22 12:00:17,427 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=10126.666666666666, ans=0.125 2024-09-22 12:00:28,846 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=10173.333333333334, ans=0.19826666666666665 2024-09-22 12:00:30,467 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=10173.333333333334, ans=0.09899494936611666 2024-09-22 12:00:33,732 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.max_abs, batch_count=10173.333333333334, ans=10.0 2024-09-22 12:00:40,105 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=10220.0, ans=0.024083333333333335 2024-09-22 12:00:58,123 INFO [train.py:1198] (3/4) Epoch 1, batch 2200, loss[loss=0.4058, ctc_loss=0.3245, cr_loss=0.4067, over 17148.00 frames. ], tot_loss[loss=0.4469, ctc_loss=0.3592, cr_loss=0.4388, over 3349777.77 frames. 
], batch size: 45, lr: 4.41e-02, grad_scale: 32.0 2024-09-22 12:01:02,863 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.617e+02 2.024e+02 2.529e+02 3.777e+02 5.736e+02, threshold=5.059e+02, percent-clipped=14.0 2024-09-22 12:01:04,731 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=10266.666666666666, ans=0.0 2024-09-22 12:01:18,044 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=10313.333333333334, ans=0.5390333333333334 2024-09-22 12:01:38,285 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=10360.0, ans=0.5374000000000001 2024-09-22 12:02:21,447 INFO [train.py:1198] (3/4) Epoch 1, batch 2250, loss[loss=0.4436, ctc_loss=0.3574, cr_loss=0.4312, over 17301.00 frames. ], tot_loss[loss=0.4429, ctc_loss=0.3553, cr_loss=0.4377, over 3351569.16 frames. ], batch size: 49, lr: 4.40e-02, grad_scale: 32.0 2024-09-22 12:02:27,439 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.const_attention_rate, batch_count=10500.0, ans=0.025 2024-09-22 12:02:31,327 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.42 vs. limit=11.4375 2024-09-22 12:02:47,378 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=10546.666666666666, ans=0.125 2024-09-22 12:02:52,389 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=10546.666666666666, ans=0.125 2024-09-22 12:02:53,950 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=10546.666666666666, ans=0.008576811594202899 2024-09-22 12:02:54,549 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.59 vs. limit=10.273333333333333 2024-09-22 12:03:07,070 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=10593.333333333334, ans=0.022527777777777775 2024-09-22 12:03:46,778 INFO [train.py:1198] (3/4) Epoch 1, batch 2300, loss[loss=0.3624, ctc_loss=0.2825, cr_loss=0.3992, over 17078.00 frames. ], tot_loss[loss=0.4397, ctc_loss=0.3522, cr_loss=0.4374, over 3342370.14 frames. ], batch size: 40, lr: 4.40e-02, grad_scale: 32.0 2024-09-22 12:03:51,580 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.433e+02 1.850e+02 2.386e+02 2.971e+02 5.038e+02, threshold=4.772e+02, percent-clipped=0.0 2024-09-22 12:04:03,317 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=10780.0, ans=0.5227 2024-09-22 12:04:11,251 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=10780.0, ans=0.125 2024-09-22 12:04:19,845 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=5.22 vs. 
limit=8.330666666666666 2024-09-22 12:04:22,686 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=10826.666666666666, ans=0.125 2024-09-22 12:04:48,440 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.37 vs. limit=8.349333333333334 2024-09-22 12:04:50,843 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=10873.333333333334, ans=10.0 2024-09-22 12:04:54,052 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.const_attention_rate, batch_count=10920.0, ans=0.025 2024-09-22 12:05:03,776 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=10920.0, ans=0.19079999999999997 2024-09-22 12:05:09,820 INFO [train.py:1198] (3/4) Epoch 1, batch 2350, loss[loss=0.4217, ctc_loss=0.3279, cr_loss=0.469, over 17303.00 frames. ], tot_loss[loss=0.4361, ctc_loss=0.3489, cr_loss=0.4365, over 3348458.86 frames. ], batch size: 51, lr: 4.40e-02, grad_scale: 32.0 2024-09-22 12:05:15,281 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=10.79 vs. limit=11.6125 2024-09-22 12:05:24,108 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=11013.333333333334, ans=0.18986666666666668 2024-09-22 12:05:25,806 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.const_attention_rate, batch_count=11013.333333333334, ans=0.025 2024-09-22 12:05:45,401 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=10.18 vs. limit=11.6475 2024-09-22 12:05:58,092 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=11106.666666666666, ans=0.5112666666666668 2024-09-22 12:06:22,204 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-22 12:06:25,293 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=11153.333333333334, ans=0.18846666666666667 2024-09-22 12:06:26,872 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=11153.333333333334, ans=0.5096333333333334 2024-09-22 12:06:30,102 INFO [train.py:1198] (3/4) Epoch 1, batch 2400, loss[loss=0.418, ctc_loss=0.3338, cr_loss=0.4208, over 17092.00 frames. ], tot_loss[loss=0.4302, ctc_loss=0.3434, cr_loss=0.4342, over 3357690.16 frames. 
], batch size: 40, lr: 4.39e-02, grad_scale: 32.0 2024-09-22 12:06:37,259 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.394e+02 1.777e+02 2.025e+02 2.571e+02 5.493e+02, threshold=4.051e+02, percent-clipped=2.0 2024-09-22 12:06:48,648 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=11246.666666666666, ans=0.125 2024-09-22 12:06:50,168 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=11246.666666666666, ans=0.125 2024-09-22 12:06:50,243 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=11246.666666666666, ans=0.07 2024-09-22 12:07:57,473 INFO [train.py:1198] (3/4) Epoch 1, batch 2450, loss[loss=0.4858, ctc_loss=0.3905, cr_loss=0.4765, over 15061.00 frames. ], tot_loss[loss=0.4291, ctc_loss=0.3421, cr_loss=0.4348, over 3364046.52 frames. ], batch size: 89, lr: 4.39e-02, grad_scale: 64.0 2024-09-22 12:08:15,657 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=11480.0, ans=0.0 2024-09-22 12:08:49,726 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer_ff3.min_abs, batch_count=11573.333333333334, ans=0.2 2024-09-22 12:08:52,777 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=11573.333333333334, ans=0.18426666666666666 2024-09-22 12:09:02,997 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=11.31 vs. limit=11.8575 2024-09-22 12:09:18,267 INFO [train.py:1198] (3/4) Epoch 1, batch 2500, loss[loss=0.436, ctc_loss=0.3429, cr_loss=0.4655, over 17297.00 frames. ], tot_loss[loss=0.4274, ctc_loss=0.3406, cr_loss=0.4344, over 3361392.93 frames. ], batch size: 49, lr: 4.38e-02, grad_scale: 64.0 2024-09-22 12:09:22,940 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.438e+02 2.074e+02 2.928e+02 4.593e+02 9.871e+02, threshold=5.856e+02, percent-clipped=30.0 2024-09-22 12:09:23,851 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.57 vs. limit=7.916666666666666 2024-09-22 12:09:26,554 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=11666.666666666666, ans=0.008333333333333333 2024-09-22 12:09:52,240 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.11 vs. limit=11.91 2024-09-22 12:10:03,073 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=11760.0, ans=0.1824 2024-09-22 12:10:12,509 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=11806.666666666666, ans=0.125 2024-09-22 12:10:23,792 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=11853.333333333334, ans=0.017277777777777774 2024-09-22 12:10:40,846 INFO [train.py:1198] (3/4) Epoch 1, batch 2550, loss[loss=0.4155, ctc_loss=0.3289, cr_loss=0.4327, over 17238.00 frames. 
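The grad_scale field moves by doubling and halving across this stretch (16.0 earlier, then 32.0, 64.0 here, and back to 32.0 further down), the signature of dynamic loss scaling in float16 training: the scale grows after a run of overflow-free steps and is cut back when gradients overflow. A generic PyTorch sketch of the mechanism, not icefall's exact optimizer integration:

```python
# Dynamic loss scaling as in standard mixed-precision training: scale
# the loss up so float16 gradients do not underflow, skip steps whose
# gradients overflow, and adapt the scale over time (growth_factor=2.0
# matches the doubling seen in the grad_scale values above).
import torch

scaler = torch.cuda.amp.GradScaler(init_scale=2.0, growth_factor=2.0)

def training_step(model, batch, optimizer, loss_fn):
    optimizer.zero_grad()
    with torch.cuda.amp.autocast(dtype=torch.float16):
        loss = loss_fn(model(batch))
    scaler.scale(loss).backward()   # backprop the scaled loss
    scaler.step(optimizer)          # unscales grads; skips step on inf/nan
    scaler.update()                 # grow the scale, or back off on overflow
```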
], tot_loss[loss=0.4249, ctc_loss=0.3381, cr_loss=0.4342, over 3369268.59 frames. ], batch size: 50, lr: 4.38e-02, grad_scale: 64.0 2024-09-22 12:10:59,561 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=7.53 vs. limit=11.98 2024-09-22 12:11:08,699 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.const_attention_rate, batch_count=11946.666666666666, ans=0.025 2024-09-22 12:11:44,868 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=12040.0, ans=0.17959999999999998 2024-09-22 12:11:51,440 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=12086.666666666666, ans=0.01630555555555556 2024-09-22 12:11:53,083 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=12086.666666666666, ans=0.17913333333333334 2024-09-22 12:11:59,638 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=12086.666666666666, ans=0.4769666666666667 2024-09-22 12:12:04,478 INFO [train.py:1198] (3/4) Epoch 1, batch 2600, loss[loss=0.4461, ctc_loss=0.3599, cr_loss=0.4306, over 17017.00 frames. ], tot_loss[loss=0.4244, ctc_loss=0.3373, cr_loss=0.4355, over 3366469.48 frames. ], batch size: 56, lr: 4.37e-02, grad_scale: 64.0 2024-09-22 12:12:09,353 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.462e+02 1.984e+02 2.668e+02 3.339e+02 5.918e+02, threshold=5.335e+02, percent-clipped=1.0 2024-09-22 12:12:14,454 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=12133.333333333334, ans=0.04949747468305833 2024-09-22 12:12:34,857 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-22 12:13:04,398 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.const_attention_rate, batch_count=12273.333333333334, ans=0.025 2024-09-22 12:13:12,526 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=12320.0, ans=0.125 2024-09-22 12:13:22,553 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.11 vs. limit=12.120000000000001 2024-09-22 12:13:29,614 INFO [train.py:1198] (3/4) Epoch 1, batch 2650, loss[loss=0.3764, ctc_loss=0.2911, cr_loss=0.4266, over 17094.00 frames. ], tot_loss[loss=0.4209, ctc_loss=0.334, cr_loss=0.4346, over 3372379.86 frames. 
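Each ScheduledFloat record shows a named hyperparameter re-evaluated at the current batch_count. The out_proj.dropout_p values fall on a straight line from 0.3 at batch_count 0 to 0.1 at batch_count 20000 (0.3 - 0.2 * 12040 / 20000 = 0.1796, matching ans=0.17959... just above), so a piecewise-linear schedule reproduces them. The breakpoints below are inferred from the log, not read from icefall's source:

```python
# Piecewise-linear schedule over batch_count, reconstructed from the
# ScheduledFloat lines above. Breakpoints (0, 0.3) -> (20000, 0.1) are
# an inference that fits the logged dropout_p values.
from bisect import bisect_right

class PiecewiseLinear:
    def __init__(self, *points):
        self.xs = [x for x, _ in points]
        self.ys = [y for _, y in points]

    def __call__(self, x: float) -> float:
        i = bisect_right(self.xs, x)
        if i == 0:
            return self.ys[0]          # clamp before the first point
        if i == len(self.xs):
            return self.ys[-1]         # clamp after the last point
        x0, x1 = self.xs[i - 1], self.xs[i]
        y0, y1 = self.ys[i - 1], self.ys[i]
        return y0 + (y1 - y0) * (x - x0) / (x1 - x0)

dropout_p = PiecewiseLinear((0.0, 0.3), (20000.0, 0.1))
assert abs(dropout_p(12040.0) - 0.1796) < 1e-6   # matches the log
```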
], batch size: 40, lr: 4.37e-02, grad_scale: 64.0 2024-09-22 12:13:31,577 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=12366.666666666666, ans=0.125 2024-09-22 12:13:57,501 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=12413.333333333334, ans=0.125 2024-09-22 12:14:02,170 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=12460.0, ans=0.0 2024-09-22 12:14:14,934 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=12460.0, ans=0.08474500000000001 2024-09-22 12:14:15,206 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=12460.0, ans=0.04949747468305833 2024-09-22 12:14:18,452 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=12506.666666666666, ans=0.125 2024-09-22 12:14:52,684 INFO [train.py:1198] (3/4) Epoch 1, batch 2700, loss[loss=0.4304, ctc_loss=0.3399, cr_loss=0.4526, over 17107.00 frames. ], tot_loss[loss=0.4191, ctc_loss=0.3322, cr_loss=0.4344, over 3371931.66 frames. ], batch size: 49, lr: 4.36e-02, grad_scale: 64.0 2024-09-22 12:14:57,514 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.460e+02 1.915e+02 2.536e+02 3.410e+02 5.700e+02, threshold=5.072e+02, percent-clipped=2.0 2024-09-22 12:14:57,824 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=12600.0, ans=0.459 2024-09-22 12:15:13,789 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=12646.666666666666, ans=0.125 2024-09-22 12:15:39,147 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=12740.0, ans=0.4541 2024-09-22 12:15:40,776 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=12740.0, ans=0.0081 2024-09-22 12:16:07,798 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=12786.666666666666, ans=0.125 2024-09-22 12:16:12,418 INFO [train.py:1198] (3/4) Epoch 1, batch 2750, loss[loss=0.3328, ctc_loss=0.2579, cr_loss=0.3743, over 17124.00 frames. ], tot_loss[loss=0.4184, ctc_loss=0.3313, cr_loss=0.4351, over 3373348.50 frames. ], batch size: 40, lr: 4.36e-02, grad_scale: 64.0 2024-09-22 12:16:22,695 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.42 vs. limit=12.3125 2024-09-22 12:16:23,980 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=12833.333333333334, ans=0.125 2024-09-22 12:17:00,258 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=12926.666666666666, ans=0.4475666666666667 2024-09-22 12:17:40,890 INFO [train.py:1198] (3/4) Epoch 1, batch 2800, loss[loss=0.4576, ctc_loss=0.3687, cr_loss=0.4445, over 14932.00 frames. ], tot_loss[loss=0.4145, ctc_loss=0.3276, cr_loss=0.4344, over 3376529.81 frames. 
], batch size: 89, lr: 4.36e-02, grad_scale: 64.0 2024-09-22 12:17:45,584 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.462e+02 1.867e+02 2.104e+02 2.772e+02 6.258e+02, threshold=4.209e+02, percent-clipped=2.0 2024-09-22 12:19:00,772 INFO [train.py:1198] (3/4) Epoch 1, batch 2850, loss[loss=0.4476, ctc_loss=0.3571, cr_loss=0.4527, over 17325.00 frames. ], tot_loss[loss=0.4134, ctc_loss=0.3264, cr_loss=0.4346, over 3374960.85 frames. ], batch size: 51, lr: 4.35e-02, grad_scale: 32.0 2024-09-22 12:19:11,604 INFO [scaling.py:1024] (3/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=6.01 vs. limit=6.66 2024-09-22 12:19:29,301 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=13346.666666666666, ans=0.125 2024-09-22 12:19:31,008 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=13346.666666666666, ans=0.011055555555555562 2024-09-22 12:19:34,188 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=13393.333333333334, ans=0.125 2024-09-22 12:19:34,498 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=4.02 vs. limit=11.696666666666667 2024-09-22 12:20:24,551 INFO [train.py:1198] (3/4) Epoch 1, batch 2900, loss[loss=0.4189, ctc_loss=0.3267, cr_loss=0.4611, over 17137.00 frames. ], tot_loss[loss=0.411, ctc_loss=0.3245, cr_loss=0.4327, over 3367437.92 frames. ], batch size: 48, lr: 4.35e-02, grad_scale: 32.0 2024-09-22 12:20:31,046 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.490e+02 1.863e+02 2.342e+02 3.249e+02 5.939e+02, threshold=4.685e+02, percent-clipped=7.0 2024-09-22 12:20:34,736 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=13533.333333333334, ans=0.007927536231884058 2024-09-22 12:20:55,974 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=13626.666666666666, ans=0.125 2024-09-22 12:20:58,183 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=5.98 vs. limit=9.450666666666667 2024-09-22 12:21:05,739 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=13626.666666666666, ans=0.009888888888888892 2024-09-22 12:21:13,814 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=13673.333333333334, ans=0.125 2024-09-22 12:21:18,859 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=13673.333333333334, ans=0.125 2024-09-22 12:21:37,592 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=13720.0, ans=0.007886956521739132 2024-09-22 12:21:40,878 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=8.85 vs. limit=9.488 2024-09-22 12:21:47,347 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.99 vs. 
limit=12.6625 2024-09-22 12:21:48,334 INFO [train.py:1198] (3/4) Epoch 1, batch 2950, loss[loss=0.5118, ctc_loss=0.4154, cr_loss=0.4822, over 12241.00 frames. ], tot_loss[loss=0.4103, ctc_loss=0.3238, cr_loss=0.4327, over 3354076.85 frames. ], batch size: 123, lr: 4.34e-02, grad_scale: 32.0 2024-09-22 12:21:48,542 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=13766.666666666666, ans=0.16233333333333333 2024-09-22 12:22:12,719 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=10.40 vs. limit=12.68 2024-09-22 12:22:32,640 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=5.55 vs. limit=11.93 2024-09-22 12:22:45,349 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=13906.666666666666, ans=0.008722222222222228 2024-09-22 12:22:54,930 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=13906.666666666666, ans=0.125 2024-09-22 12:23:01,212 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=13953.333333333334, ans=0.125 2024-09-22 12:23:12,072 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=14000.0, ans=0.09899494936611666 2024-09-22 12:23:13,842 INFO [train.py:1198] (3/4) Epoch 1, batch 3000, loss[loss=0.3947, ctc_loss=0.3058, cr_loss=0.4446, over 17091.00 frames. ], tot_loss[loss=0.407, ctc_loss=0.3207, cr_loss=0.4318, over 3360397.76 frames. ], batch size: 49, lr: 4.34e-02, grad_scale: 32.0 2024-09-22 12:23:13,842 INFO [train.py:1221] (3/4) Computing validation loss 2024-09-22 12:23:29,184 INFO [train.py:1230] (3/4) Epoch 1, validation: loss=0.1235, ctc_loss=0.1235, cr_loss=7.044e-15, over 944034.00 frames. 2024-09-22 12:23:29,184 INFO [train.py:1231] (3/4) Maximum memory allocated so far is 21238MB 2024-09-22 12:23:35,639 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.419e+02 1.886e+02 2.276e+02 2.833e+02 5.148e+02, threshold=4.553e+02, percent-clipped=2.0 2024-09-22 12:24:07,637 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=10.01 vs. limit=12.785 2024-09-22 12:24:13,617 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-22 12:24:34,895 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.46 vs. limit=8.546666666666667 2024-09-22 12:24:39,027 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=14186.666666666666, ans=0.4034666666666667 2024-09-22 12:24:42,072 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=14186.666666666666, ans=0.007785507246376812 2024-09-22 12:24:47,937 INFO [train.py:1198] (3/4) Epoch 1, batch 3050, loss[loss=0.4012, ctc_loss=0.3152, cr_loss=0.4298, over 17036.00 frames. ], tot_loss[loss=0.4047, ctc_loss=0.3185, cr_loss=0.4313, over 3364803.21 frames. 
], batch size: 53, lr: 4.33e-02, grad_scale: 32.0 2024-09-22 12:24:54,429 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=14233.333333333334, ans=0.4018333333333333 2024-09-22 12:24:56,104 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=14233.333333333334, ans=0.125 2024-09-22 12:25:18,351 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=14326.666666666666, ans=0.00697222222222222 2024-09-22 12:26:09,730 INFO [train.py:1198] (3/4) Epoch 1, batch 3100, loss[loss=0.4492, ctc_loss=0.351, cr_loss=0.491, over 17023.00 frames. ], tot_loss[loss=0.4026, ctc_loss=0.3165, cr_loss=0.4304, over 3373911.19 frames. ], batch size: 52, lr: 4.33e-02, grad_scale: 32.0 2024-09-22 12:26:15,853 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.374e+02 1.870e+02 2.371e+02 3.057e+02 5.717e+02, threshold=4.743e+02, percent-clipped=5.0 2024-09-22 12:27:02,017 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=14606.666666666666, ans=0.007694202898550725 2024-09-22 12:27:21,109 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=14653.333333333334, ans=0.125 2024-09-22 12:27:22,692 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=14653.333333333334, ans=0.05 2024-09-22 12:27:22,729 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=14653.333333333334, ans=0.38713333333333333 2024-09-22 12:27:28,659 INFO [train.py:1198] (3/4) Epoch 1, batch 3150, loss[loss=0.4154, ctc_loss=0.3309, cr_loss=0.4223, over 17313.00 frames. ], tot_loss[loss=0.4008, ctc_loss=0.3148, cr_loss=0.43, over 3372955.12 frames. ], batch size: 51, lr: 4.32e-02, grad_scale: 32.0 2024-09-22 12:27:28,887 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=14700.0, ans=0.04949747468305833 2024-09-22 12:27:30,391 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=14700.0, ans=0.125 2024-09-22 12:27:49,147 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=14746.666666666666, ans=0.125 2024-09-22 12:27:51,322 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.63 vs. limit=13.030000000000001 2024-09-22 12:28:33,319 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.61 vs. limit=13.0825 2024-09-22 12:28:47,528 INFO [train.py:1198] (3/4) Epoch 1, batch 3200, loss[loss=0.4537, ctc_loss=0.3638, cr_loss=0.4495, over 16106.00 frames. ], tot_loss[loss=0.4002, ctc_loss=0.3142, cr_loss=0.43, over 3367304.31 frames. 
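The Whitening lines compare a per-module metric against a limit that is itself scheduled (whitening_limit appears earlier in this log as a ScheduledFloat with exactly the values printed as limits). The pattern suggests a diagnostic that measures how far channel activations are from white, that is decorrelated with equal variance, and intervenes only when the metric exceeds the limit. One way such a metric can be defined, as an illustration of the idea rather than the formula in scaling.py:

```python
# Illustrative whiteness metric: the ratio of the mean squared
# eigenvalue of the channel covariance to the squared mean eigenvalue.
# It is 1.0 iff all eigenvalues are equal (fully white) and grows as
# the channels become correlated or unevenly scaled. The exact metric
# logged by scaling.py may differ; this only sketches the concept.
import torch

def whiteness_metric(x: torch.Tensor) -> float:
    """x: (num_frames, num_channels) activations."""
    x = x - x.mean(dim=0, keepdim=True)
    cov = (x.T @ x) / x.shape[0]          # channel covariance
    eigs = torch.linalg.eigvalsh(cov)     # real eigenvalues, ascending
    return float((eigs ** 2).mean() / eigs.mean() ** 2)

white = torch.randn(1000, 256)                      # nearly white input
mixed = white @ torch.randn(256, 256)               # correlated channels
assert whiteness_metric(mixed) > whiteness_metric(white)
```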
], batch size: 74, lr: 4.32e-02, grad_scale: 32.0 2024-09-22 12:28:53,530 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.398e+02 1.832e+02 2.280e+02 2.861e+02 6.877e+02, threshold=4.560e+02, percent-clipped=3.0 2024-09-22 12:29:08,102 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=14980.0, ans=0.1502 2024-09-22 12:29:16,022 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=14980.0, ans=0.125 2024-09-22 12:29:55,419 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-22 12:30:06,105 INFO [train.py:1198] (3/4) Epoch 1, batch 3250, loss[loss=0.3799, ctc_loss=0.2917, cr_loss=0.4407, over 17040.00 frames. ], tot_loss[loss=0.3997, ctc_loss=0.3136, cr_loss=0.4303, over 3355529.47 frames. ], batch size: 56, lr: 4.31e-02, grad_scale: 32.0 2024-09-22 12:30:06,893 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=6.04 vs. limit=12.583333333333332 2024-09-22 12:30:08,398 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.72 vs. limit=13.1875 2024-09-22 12:30:33,569 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=15213.333333333334, ans=0.07 2024-09-22 12:30:49,162 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=15260.0, ans=0.3659 2024-09-22 12:30:53,763 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=15306.666666666666, ans=0.125 2024-09-22 12:30:59,970 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=15306.666666666666, ans=0.125 2024-09-22 12:31:12,911 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=15353.333333333334, ans=0.125 2024-09-22 12:31:18,367 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten.whitening_limit, batch_count=15353.333333333334, ans=13.2575 2024-09-22 12:31:19,583 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=11.10 vs. limit=13.2575 2024-09-22 12:31:24,457 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.10 vs. limit=19.015 2024-09-22 12:31:26,919 INFO [train.py:1198] (3/4) Epoch 1, batch 3300, loss[loss=0.5057, ctc_loss=0.411, cr_loss=0.4733, over 11498.00 frames. ], tot_loss[loss=0.3975, ctc_loss=0.3118, cr_loss=0.4285, over 3347975.14 frames. 
], batch size: 123, lr: 4.31e-02, grad_scale: 32.0 2024-09-22 12:31:30,459 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=15400.0, ans=0.0025000000000000022 2024-09-22 12:31:32,080 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=15400.0, ans=0.361 2024-09-22 12:31:33,371 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.515e+02 1.835e+02 2.401e+02 3.313e+02 5.174e+02, threshold=4.802e+02, percent-clipped=5.0 2024-09-22 12:31:42,235 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=15400.0, ans=0.125 2024-09-22 12:31:48,671 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-22 12:32:01,264 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=15493.333333333334, ans=0.125 2024-09-22 12:32:27,948 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=15540.0, ans=0.007491304347826087 2024-09-22 12:32:34,161 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=15586.666666666666, ans=0.125 2024-09-22 12:32:50,478 INFO [train.py:1198] (3/4) Epoch 1, batch 3350, loss[loss=0.3913, ctc_loss=0.3092, cr_loss=0.4108, over 17201.00 frames. ], tot_loss[loss=0.3973, ctc_loss=0.3116, cr_loss=0.4286, over 3340791.83 frames. ], batch size: 47, lr: 4.30e-02, grad_scale: 32.0 2024-09-22 12:32:50,718 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=15633.333333333334, ans=0.007471014492753623 2024-09-22 12:33:19,225 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=15680.0, ans=0.35120000000000007 2024-09-22 12:33:22,739 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=7.27 vs. limit=13.3975 2024-09-22 12:33:34,768 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=15726.666666666666, ans=0.0 2024-09-22 12:33:34,793 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=15726.666666666666, ans=0.0 2024-09-22 12:33:40,869 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=15773.333333333334, ans=0.14226666666666665 2024-09-22 12:34:04,352 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=15820.0, ans=0.0007500000000000007 2024-09-22 12:34:05,934 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=15820.0, ans=0.125 2024-09-22 12:34:09,474 INFO [train.py:1198] (3/4) Epoch 1, batch 3400, loss[loss=0.4307, ctc_loss=0.3413, cr_loss=0.4469, over 16592.00 frames. ], tot_loss[loss=0.3939, ctc_loss=0.3084, cr_loss=0.4274, over 3346470.19 frames. 
], batch size: 66, lr: 4.29e-02, grad_scale: 32.0 2024-09-22 12:34:15,799 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.414e+02 1.769e+02 2.096e+02 2.628e+02 4.837e+02, threshold=4.193e+02, percent-clipped=1.0 2024-09-22 12:34:16,168 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=15866.666666666666, ans=0.438 2024-09-22 12:34:22,359 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=15866.666666666666, ans=0.0005555555555555522 2024-09-22 12:34:41,657 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=15960.0, ans=0.125 2024-09-22 12:34:53,968 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=15960.0, ans=0.00016666666666666913 2024-09-22 12:35:12,779 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=16053.333333333334, ans=0.3381333333333333 2024-09-22 12:35:28,156 INFO [train.py:1198] (3/4) Epoch 1, batch 3450, loss[loss=0.3338, ctc_loss=0.2563, cr_loss=0.3875, over 17281.00 frames. ], tot_loss[loss=0.3926, ctc_loss=0.307, cr_loss=0.4276, over 3355829.62 frames. ], batch size: 42, lr: 4.29e-02, grad_scale: 32.0 2024-09-22 12:35:48,984 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=16146.666666666666, ans=0.08853333333333332 2024-09-22 12:36:12,140 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=5.63 vs. limit=13.5725 2024-09-22 12:36:12,906 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=16193.333333333334, ans=0.13806666666666667 2024-09-22 12:36:12,907 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.const_attention_rate, batch_count=16193.333333333334, ans=0.025 2024-09-22 12:36:16,449 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=6.28 vs. limit=13.12 2024-09-22 12:36:44,191 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=16286.666666666666, ans=0.125 2024-09-22 12:36:47,508 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=16333.333333333334, ans=0.125 2024-09-22 12:36:48,697 INFO [train.py:1198] (3/4) Epoch 1, batch 3500, loss[loss=0.373, ctc_loss=0.2829, cr_loss=0.4503, over 17228.00 frames. ], tot_loss[loss=0.3916, ctc_loss=0.3062, cr_loss=0.4273, over 3341194.48 frames. ], batch size: 50, lr: 4.28e-02, grad_scale: 32.0 2024-09-22 12:36:52,027 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-22 12:36:54,800 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.402e+02 1.781e+02 2.318e+02 3.100e+02 5.527e+02, threshold=4.636e+02, percent-clipped=9.0 2024-09-22 12:37:34,764 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=6.84 vs. 
limit=13.677499999999998 2024-09-22 12:38:06,624 INFO [train.py:1198] (3/4) Epoch 1, batch 3550, loss[loss=0.3511, ctc_loss=0.2679, cr_loss=0.4157, over 17163.00 frames. ], tot_loss[loss=0.3914, ctc_loss=0.3059, cr_loss=0.4279, over 3344056.99 frames. ], batch size: 45, lr: 4.28e-02, grad_scale: 32.0 2024-09-22 12:38:06,845 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=16566.666666666668, ans=0.125 2024-09-22 12:38:26,845 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.const_attention_rate, batch_count=16613.333333333332, ans=0.025 2024-09-22 12:38:39,565 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=16660.0, ans=0.0 2024-09-22 12:38:41,081 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=16660.0, ans=0.31690000000000007 2024-09-22 12:38:44,229 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=16660.0, ans=0.125 2024-09-22 12:38:52,084 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=16706.666666666668, ans=0.125 2024-09-22 12:39:03,947 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=4.63 vs. limit=13.765 2024-09-22 12:39:06,024 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=16706.666666666668, ans=0.13293333333333332 2024-09-22 12:39:10,727 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=16753.333333333332, ans=0.0072275362318840585 2024-09-22 12:39:24,819 INFO [train.py:1198] (3/4) Epoch 1, batch 3600, loss[loss=0.4531, ctc_loss=0.3588, cr_loss=0.4719, over 15934.00 frames. ], tot_loss[loss=0.3908, ctc_loss=0.3052, cr_loss=0.4279, over 3336267.11 frames. ], batch size: 74, lr: 4.27e-02, grad_scale: 32.0 2024-09-22 12:39:25,072 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=16800.0, ans=0.0072173913043478265 2024-09-22 12:39:28,689 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.17 vs. limit=9.2 2024-09-22 12:39:30,801 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.429e+02 1.842e+02 2.057e+02 2.677e+02 5.057e+02, threshold=4.115e+02, percent-clipped=2.0 2024-09-22 12:39:41,783 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=16846.666666666668, ans=0.125 2024-09-22 12:39:43,527 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=16846.666666666668, ans=0.07 2024-09-22 12:39:44,969 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=16846.666666666668, ans=0.125 2024-09-22 12:39:55,924 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.83 vs. 
limit=5.534 2024-09-22 12:39:57,076 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=16893.333333333332, ans=0.125 2024-09-22 12:40:05,372 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=16893.333333333332, ans=0.007197101449275363 2024-09-22 12:40:23,747 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=16940.0, ans=0.125 2024-09-22 12:40:27,896 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=6.74 vs. limit=13.870000000000001 2024-09-22 12:40:30,223 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=16986.666666666668, ans=0.125 2024-09-22 12:40:33,173 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=16986.666666666668, ans=0.13013333333333332 2024-09-22 12:40:35,838 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=9.09 vs. limit=9.246666666666666 2024-09-22 12:40:43,995 INFO [train.py:1198] (3/4) Epoch 1, batch 3650, loss[loss=0.3931, ctc_loss=0.2999, cr_loss=0.4661, over 17303.00 frames. ], tot_loss[loss=0.3896, ctc_loss=0.304, cr_loss=0.428, over 3336413.41 frames. ], batch size: 49, lr: 4.27e-02, grad_scale: 32.0 2024-09-22 12:40:45,909 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=17033.333333333332, ans=0.125 2024-09-22 12:40:52,203 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=17033.333333333332, ans=0.05 2024-09-22 12:41:39,576 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=8.52 vs. limit=13.94 2024-09-22 12:42:05,822 INFO [train.py:1198] (3/4) Epoch 1, batch 3700, loss[loss=0.3576, ctc_loss=0.2752, cr_loss=0.412, over 17239.00 frames. ], tot_loss[loss=0.3893, ctc_loss=0.3037, cr_loss=0.428, over 3335120.30 frames. ], batch size: 44, lr: 4.26e-02, grad_scale: 32.0 2024-09-22 12:42:12,057 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.410e+02 1.892e+02 2.660e+02 3.633e+02 5.715e+02, threshold=5.320e+02, percent-clipped=15.0 2024-09-22 12:42:13,947 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=17266.666666666668, ans=0.12733333333333333 2024-09-22 12:42:28,611 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.32 vs. limit=20.485 2024-09-22 12:42:40,235 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=17360.0, ans=0.1264 2024-09-22 12:43:10,641 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=17453.333333333332, ans=0.00707536231884058 2024-09-22 12:43:22,579 INFO [train.py:1198] (3/4) Epoch 1, batch 3750, loss[loss=0.3901, ctc_loss=0.3032, cr_loss=0.4341, over 17004.00 frames. ], tot_loss[loss=0.3897, ctc_loss=0.3043, cr_loss=0.4269, over 3320040.54 frames. 
], batch size: 53, lr: 4.26e-02, grad_scale: 32.0 2024-09-22 12:43:50,292 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=17546.666666666668, ans=0.125 2024-09-22 12:43:57,887 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=17593.333333333332, ans=0.125 2024-09-22 12:44:40,336 INFO [train.py:1198] (3/4) Epoch 1, batch 3800, loss[loss=0.3444, ctc_loss=0.2645, cr_loss=0.3997, over 16936.00 frames. ], tot_loss[loss=0.3896, ctc_loss=0.3042, cr_loss=0.4273, over 3313461.97 frames. ], batch size: 42, lr: 4.25e-02, grad_scale: 32.0 2024-09-22 12:44:45,903 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=4.12 vs. limit=11.093333333333334 2024-09-22 12:44:46,420 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.363e+02 1.780e+02 2.356e+02 3.200e+02 5.376e+02, threshold=4.713e+02, percent-clipped=1.0 2024-09-22 12:44:46,875 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=17733.333333333332, ans=0.007014492753623189 2024-09-22 12:44:48,186 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=17733.333333333332, ans=0.125 2024-09-22 12:45:00,850 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=17780.0, ans=0.0 2024-09-22 12:45:12,990 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=4.68 vs. limit=14.184999999999999 2024-09-22 12:45:18,815 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=10.34 vs. limit=14.184999999999999 2024-09-22 12:45:31,528 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.min_abs, batch_count=17873.333333333332, ans=0.46809999999999996 2024-09-22 12:45:48,821 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=17920.0, ans=0.006973913043478261 2024-09-22 12:45:54,995 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=17920.0, ans=0.125 2024-09-22 12:45:59,546 INFO [train.py:1198] (3/4) Epoch 1, batch 3850, loss[loss=0.4881, ctc_loss=0.3968, cr_loss=0.4565, over 11993.00 frames. ], tot_loss[loss=0.3934, ctc_loss=0.3077, cr_loss=0.4281, over 3265026.52 frames. ], batch size: 123, lr: 4.24e-02, grad_scale: 32.0 2024-09-22 12:46:01,726 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=11.65 vs. 
limit=14.2375 2024-09-22 12:46:02,960 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=17966.666666666668, ans=0.125 2024-09-22 12:46:04,596 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=17966.666666666668, ans=0.006963768115942029 2024-09-22 12:46:06,128 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=17966.666666666668, ans=0.125 2024-09-22 12:46:06,231 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=17966.666666666668, ans=0.0 2024-09-22 12:46:06,644 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=12.29 vs. limit=14.2375 2024-09-22 12:46:45,562 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=18106.666666666668, ans=0.125 2024-09-22 12:46:51,690 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.max_abs, batch_count=18106.666666666668, ans=10.0 2024-09-22 12:48:02,938 INFO [train.py:1198] (3/4) Epoch 2, batch 0, loss[loss=0.3253, ctc_loss=0.2498, cr_loss=0.3775, over 17275.00 frames. ], tot_loss[loss=0.3253, ctc_loss=0.2498, cr_loss=0.3775, over 17275.00 frames. ], batch size: 42, lr: 4.16e-02, grad_scale: 32.0 2024-09-22 12:48:02,938 INFO [train.py:1221] (3/4) Computing validation loss 2024-09-22 12:48:18,075 INFO [train.py:1230] (3/4) Epoch 2, validation: loss=0.1169, ctc_loss=0.1169, cr_loss=1.034e-14, over 944034.00 frames. 2024-09-22 12:48:18,076 INFO [train.py:1231] (3/4) Maximum memory allocated so far is 21238MB 2024-09-22 12:48:30,953 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.378e+02 1.944e+02 2.365e+02 3.007e+02 5.794e+02, threshold=4.731e+02, percent-clipped=1.0 2024-09-22 12:48:40,429 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.whiten.whitening_limit, batch_count=18228.0, ans=11.2912 2024-09-22 12:48:41,865 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=14.62 vs. limit=14.3355 2024-09-22 12:49:22,878 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.89 vs. limit=5.7552 2024-09-22 12:49:24,864 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=6.32 vs. limit=14.388 2024-09-22 12:49:33,706 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=18368.0, ans=0.125 2024-09-22 12:49:38,437 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=18414.666666666668, ans=0.006866376811594203 2024-09-22 12:49:39,681 INFO [train.py:1198] (3/4) Epoch 2, batch 50, loss[loss=0.3637, ctc_loss=0.2827, cr_loss=0.4049, over 17026.00 frames. ], tot_loss[loss=0.3832, ctc_loss=0.2983, cr_loss=0.4247, over 756803.59 frames. 
], batch size: 51, lr: 4.15e-02, grad_scale: 32.0 2024-09-22 12:49:51,175 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=18414.666666666668, ans=0.125 2024-09-22 12:50:04,534 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.46 vs. limit=14.423 2024-09-22 12:50:36,433 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=4.07 vs. limit=11.421866666666666 2024-09-22 12:50:38,984 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=18554.666666666668, ans=0.11445333333333332 2024-09-22 12:50:43,758 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=18601.333333333332, ans=0.0 2024-09-22 12:50:46,798 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=18601.333333333332, ans=0.125 2024-09-22 12:50:48,434 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.const_attention_rate, batch_count=18601.333333333332, ans=0.025 2024-09-22 12:50:54,747 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=18601.333333333332, ans=0.125 2024-09-22 12:50:56,443 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=18601.333333333332, ans=0.0 2024-09-22 12:50:59,413 INFO [train.py:1198] (3/4) Epoch 2, batch 100, loss[loss=0.4021, ctc_loss=0.3108, cr_loss=0.4565, over 17209.00 frames. ], tot_loss[loss=0.3825, ctc_loss=0.2976, cr_loss=0.4249, over 1328338.78 frames. ], batch size: 55, lr: 4.15e-02, grad_scale: 32.0 2024-09-22 12:51:19,088 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.460e+02 1.796e+02 2.128e+02 2.839e+02 5.119e+02, threshold=4.256e+02, percent-clipped=1.0 2024-09-22 12:51:49,078 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=12.69 vs. limit=21.555999999999997 2024-09-22 12:51:53,137 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=18788.0, ans=0.006785217391304348 2024-09-22 12:51:57,938 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=18788.0, ans=0.125 2024-09-22 12:52:26,282 INFO [train.py:1198] (3/4) Epoch 2, batch 150, loss[loss=0.4085, ctc_loss=0.3223, cr_loss=0.4308, over 15892.00 frames. ], tot_loss[loss=0.3804, ctc_loss=0.2956, cr_loss=0.4243, over 1776416.28 frames. ], batch size: 74, lr: 4.14e-02, grad_scale: 32.0 2024-09-22 12:52:38,732 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-22 12:52:40,487 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=18881.333333333332, ans=0.23915333333333344 2024-09-22 12:53:00,014 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=5.88 vs. 
limit=9.743666666666666 2024-09-22 12:53:33,180 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=11.64 vs. limit=14.633 2024-09-22 12:53:36,004 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=19068.0, ans=0.006724347826086956 2024-09-22 12:53:51,162 INFO [train.py:1198] (3/4) Epoch 2, batch 200, loss[loss=0.3747, ctc_loss=0.2909, cr_loss=0.4188, over 17235.00 frames. ], tot_loss[loss=0.3802, ctc_loss=0.295, cr_loss=0.4257, over 2132178.28 frames. ], batch size: 50, lr: 4.13e-02, grad_scale: 32.0 2024-09-22 12:54:03,746 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.386e+02 1.786e+02 2.214e+02 2.969e+02 6.338e+02, threshold=4.427e+02, percent-clipped=7.0 2024-09-22 12:54:35,734 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=19208.0, ans=0.22772000000000014 2024-09-22 12:54:42,027 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=19254.666666666668, ans=0.125 2024-09-22 12:55:10,172 INFO [train.py:1198] (3/4) Epoch 2, batch 250, loss[loss=0.4066, ctc_loss=0.3221, cr_loss=0.4223, over 17215.00 frames. ], tot_loss[loss=0.3782, ctc_loss=0.293, cr_loss=0.4256, over 2409016.28 frames. ], batch size: 55, lr: 4.13e-02, grad_scale: 32.0 2024-09-22 12:55:23,378 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.const_attention_rate, batch_count=19348.0, ans=0.025 2024-09-22 12:55:26,615 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.const_attention_rate, batch_count=19394.666666666668, ans=0.025 2024-09-22 12:55:31,316 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=19394.666666666668, ans=0.006653333333333333 2024-09-22 12:55:39,753 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=17.06 vs. limit=22.046 2024-09-22 12:56:35,318 INFO [train.py:1198] (3/4) Epoch 2, batch 300, loss[loss=0.4088, ctc_loss=0.3151, cr_loss=0.4687, over 16925.00 frames. ], tot_loss[loss=0.3744, ctc_loss=0.2898, cr_loss=0.4232, over 2624977.63 frames. ], batch size: 58, lr: 4.12e-02, grad_scale: 32.0 2024-09-22 12:56:45,470 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=19581.333333333332, ans=0.125 2024-09-22 12:56:48,374 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.373e+02 1.759e+02 2.124e+02 2.853e+02 4.892e+02, threshold=4.248e+02, percent-clipped=3.0 2024-09-22 12:56:54,186 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=10.92 vs. limit=14.8605 2024-09-22 12:57:22,009 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=19721.333333333332, ans=0.125 2024-09-22 12:57:42,300 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=19768.0, ans=0.10232000000000002 2024-09-22 12:57:57,759 INFO [train.py:1198] (3/4) Epoch 2, batch 350, loss[loss=0.3191, ctc_loss=0.2453, cr_loss=0.3691, over 17113.00 frames. 
], tot_loss[loss=0.3753, ctc_loss=0.2904, cr_loss=0.4243, over 2779876.57 frames. ], batch size: 40, lr: 4.12e-02, grad_scale: 32.0 2024-09-22 12:58:07,481 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer_ff2.min_abs, batch_count=19814.666666666668, ans=0.1 2024-09-22 12:58:10,751 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=19814.666666666668, ans=0.125 2024-09-22 12:58:22,364 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=6.19 vs. limit=11.944533333333332 2024-09-22 12:58:53,610 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.53 vs. limit=9.988666666666667 2024-09-22 12:58:59,576 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=19954.666666666668, ans=0.125 2024-09-22 12:58:59,630 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=19954.666666666668, ans=0.125 2024-09-22 12:59:10,703 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=20001.333333333332, ans=0.035 2024-09-22 12:59:10,705 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=20001.333333333332, ans=0.125 2024-09-22 12:59:14,048 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-22 12:59:20,248 INFO [train.py:1198] (3/4) Epoch 2, batch 400, loss[loss=0.3948, ctc_loss=0.3037, cr_loss=0.4557, over 17222.00 frames. ], tot_loss[loss=0.3742, ctc_loss=0.2894, cr_loss=0.4239, over 2913936.63 frames. ], batch size: 50, lr: 4.11e-02, grad_scale: 32.0 2024-09-22 12:59:31,595 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.const_attention_rate, batch_count=20048.0, ans=0.025 2024-09-22 12:59:32,897 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.308e+02 1.825e+02 2.194e+02 2.995e+02 5.365e+02, threshold=4.388e+02, percent-clipped=4.0 2024-09-22 12:59:36,289 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=20094.666666666668, ans=0.125 2024-09-22 12:59:40,882 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=20094.666666666668, ans=0.125 2024-09-22 12:59:45,695 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=20094.666666666668, ans=0.125 2024-09-22 12:59:50,405 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=20141.333333333332, ans=0.125 2024-09-22 12:59:54,521 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=6.60 vs. 
limit=15.0 2024-09-22 12:59:56,876 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=20141.333333333332, ans=0.0 2024-09-22 13:00:09,638 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=20188.0, ans=0.1 2024-09-22 13:00:39,702 INFO [train.py:1198] (3/4) Epoch 2, batch 450, loss[loss=0.3051, ctc_loss=0.2326, cr_loss=0.3625, over 17014.00 frames. ], tot_loss[loss=0.3733, ctc_loss=0.2884, cr_loss=0.4246, over 3016513.90 frames. ], batch size: 39, lr: 4.10e-02, grad_scale: 32.0 2024-09-22 13:00:41,546 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=20281.333333333332, ans=0.1 2024-09-22 13:00:49,543 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=20281.333333333332, ans=0.2 2024-09-22 13:01:01,880 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=20328.0, ans=0.1 2024-09-22 13:01:12,830 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=20328.0, ans=0.125 2024-09-22 13:01:27,121 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=20374.666666666668, ans=0.025 2024-09-22 13:01:46,602 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.60 vs. limit=10.0 2024-09-22 13:01:57,202 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.min_abs, batch_count=20468.0, ans=0.5 2024-09-22 13:02:01,876 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=20468.0, ans=0.125 2024-09-22 13:02:04,848 INFO [train.py:1198] (3/4) Epoch 2, batch 500, loss[loss=0.3636, ctc_loss=0.2772, cr_loss=0.4321, over 17181.00 frames. ], tot_loss[loss=0.3738, ctc_loss=0.2888, cr_loss=0.4249, over 3094278.85 frames. ], batch size: 45, lr: 4.10e-02, grad_scale: 32.0 2024-09-22 13:02:11,730 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=20514.666666666668, ans=0.04949747468305833 2024-09-22 13:02:17,796 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.414e+02 1.804e+02 2.205e+02 2.881e+02 5.655e+02, threshold=4.410e+02, percent-clipped=3.0 2024-09-22 13:02:42,679 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=20608.0, ans=0.0 2024-09-22 13:02:51,065 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.74 vs. limit=15.0 2024-09-22 13:02:58,670 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.const_attention_rate, batch_count=20654.666666666668, ans=0.025 2024-09-22 13:03:13,148 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=20701.333333333332, ans=0.1 2024-09-22 13:03:29,176 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.04 vs. 
limit=15.0 2024-09-22 13:03:29,950 INFO [train.py:1198] (3/4) Epoch 2, batch 550, loss[loss=0.3461, ctc_loss=0.266, cr_loss=0.4009, over 17146.00 frames. ], tot_loss[loss=0.3738, ctc_loss=0.2887, cr_loss=0.425, over 3143968.70 frames. ], batch size: 45, lr: 4.09e-02, grad_scale: 32.0 2024-09-22 13:03:30,195 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=20748.0, ans=0.125 2024-09-22 13:03:36,831 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=20748.0, ans=0.125 2024-09-22 13:03:53,918 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=20794.666666666668, ans=0.0 2024-09-22 13:04:20,402 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=5.17 vs. limit=12.0 2024-09-22 13:04:26,256 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=20888.0, ans=0.125 2024-09-22 13:04:37,281 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=20934.666666666668, ans=0.125 2024-09-22 13:04:45,799 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.03 vs. limit=15.0 2024-09-22 13:04:49,742 INFO [train.py:1198] (3/4) Epoch 2, batch 600, loss[loss=0.361, ctc_loss=0.2797, cr_loss=0.4064, over 17240.00 frames. ], tot_loss[loss=0.373, ctc_loss=0.2882, cr_loss=0.4242, over 3192926.82 frames. ], batch size: 44, lr: 4.09e-02, grad_scale: 32.0 2024-09-22 13:04:59,663 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=20981.333333333332, ans=0.125 2024-09-22 13:05:02,738 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.397e+02 1.714e+02 2.171e+02 2.480e+02 4.403e+02, threshold=4.342e+02, percent-clipped=0.0 2024-09-22 13:05:17,265 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=21028.0, ans=0.1 2024-09-22 13:05:25,228 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=21074.666666666668, ans=0.125 2024-09-22 13:05:41,352 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer_na.min_abs, batch_count=21121.333333333332, ans=0.02 2024-09-22 13:06:15,353 INFO [train.py:1198] (3/4) Epoch 2, batch 650, loss[loss=0.458, ctc_loss=0.3628, cr_loss=0.4761, over 11622.00 frames. ], tot_loss[loss=0.3688, ctc_loss=0.2845, cr_loss=0.4219, over 3234889.25 frames. ], batch size: 123, lr: 4.08e-02, grad_scale: 32.0 2024-09-22 13:06:28,386 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=21214.666666666668, ans=0.1 2024-09-22 13:06:34,834 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=21261.333333333332, ans=0.0 2024-09-22 13:06:48,270 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=24.14 vs. 
limit=22.5 2024-09-22 13:06:58,866 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=21308.0, ans=0.125 2024-09-22 13:07:02,017 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=21354.666666666668, ans=0.025 2024-09-22 13:07:37,763 INFO [train.py:1198] (3/4) Epoch 2, batch 700, loss[loss=0.4328, ctc_loss=0.3446, cr_loss=0.441, over 16999.00 frames. ], tot_loss[loss=0.3696, ctc_loss=0.2851, cr_loss=0.4228, over 3256802.77 frames. ], batch size: 53, lr: 4.07e-02, grad_scale: 32.0 2024-09-22 13:07:50,782 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.427e+02 1.762e+02 2.307e+02 2.806e+02 6.388e+02, threshold=4.614e+02, percent-clipped=12.0 2024-09-22 13:08:00,534 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=21494.666666666668, ans=0.0 2024-09-22 13:08:03,924 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=21494.666666666668, ans=0.006196811594202899 2024-09-22 13:08:29,988 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=21588.0, ans=0.1 2024-09-22 13:08:59,999 INFO [train.py:1198] (3/4) Epoch 2, batch 750, loss[loss=0.3531, ctc_loss=0.271, cr_loss=0.4104, over 17172.00 frames. ], tot_loss[loss=0.3707, ctc_loss=0.2859, cr_loss=0.4239, over 3270527.30 frames. ], batch size: 45, lr: 4.07e-02, grad_scale: 32.0 2024-09-22 13:09:17,700 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=21728.0, ans=0.125 2024-09-22 13:09:36,696 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=21774.666666666668, ans=0.006135942028985507 2024-09-22 13:09:43,756 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.65 vs. limit=15.0 2024-09-22 13:09:59,082 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=21821.333333333332, ans=0.0 2024-09-22 13:10:11,592 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=21868.0, ans=0.2 2024-09-22 13:10:19,247 INFO [train.py:1198] (3/4) Epoch 2, batch 800, loss[loss=0.3585, ctc_loss=0.2747, cr_loss=0.4189, over 17348.00 frames. ], tot_loss[loss=0.3725, ctc_loss=0.2875, cr_loss=0.4251, over 3284229.12 frames. ], batch size: 48, lr: 4.06e-02, grad_scale: 32.0 2024-09-22 13:10:32,122 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.379e+02 1.875e+02 2.284e+02 2.792e+02 4.521e+02, threshold=4.569e+02, percent-clipped=0.0 2024-09-22 13:11:29,164 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=10.63 vs. limit=15.0 2024-09-22 13:11:31,500 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=22101.333333333332, ans=0.125 2024-09-22 13:11:38,412 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=7.21 vs. 
limit=12.0 2024-09-22 13:11:44,001 INFO [train.py:1198] (3/4) Epoch 2, batch 850, loss[loss=0.3796, ctc_loss=0.2924, cr_loss=0.4359, over 17026.00 frames. ], tot_loss[loss=0.3732, ctc_loss=0.2881, cr_loss=0.4258, over 3299149.89 frames. ], batch size: 51, lr: 4.06e-02, grad_scale: 32.0 2024-09-22 13:12:04,996 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=22194.666666666668, ans=10.0 2024-09-22 13:12:12,897 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=22194.666666666668, ans=0.00604463768115942 2024-09-22 13:12:15,966 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=22241.333333333332, ans=0.015 2024-09-22 13:12:58,189 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=22334.666666666668, ans=0.125 2024-09-22 13:13:05,749 INFO [train.py:1198] (3/4) Epoch 2, batch 900, loss[loss=0.3795, ctc_loss=0.2963, cr_loss=0.4162, over 17054.00 frames. ], tot_loss[loss=0.3714, ctc_loss=0.2863, cr_loss=0.4255, over 3319855.73 frames. ], batch size: 46, lr: 4.05e-02, grad_scale: 32.0 2024-09-22 13:13:05,975 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=22381.333333333332, ans=0.1 2024-09-22 13:13:21,506 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.391e+02 1.677e+02 2.003e+02 2.906e+02 6.404e+02, threshold=4.006e+02, percent-clipped=3.0 2024-09-22 13:13:35,242 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=4.02 vs. limit=12.0 2024-09-22 13:13:53,746 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=22474.666666666668, ans=0.1 2024-09-22 13:14:29,003 INFO [train.py:1198] (3/4) Epoch 2, batch 950, loss[loss=0.3342, ctc_loss=0.259, cr_loss=0.3757, over 17122.00 frames. ], tot_loss[loss=0.3714, ctc_loss=0.2863, cr_loss=0.4251, over 3328690.91 frames. ], batch size: 40, lr: 4.04e-02, grad_scale: 64.0 2024-09-22 13:14:46,858 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=22661.333333333332, ans=0.125 2024-09-22 13:14:50,399 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.16 vs. limit=15.0 2024-09-22 13:15:01,066 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=22708.0, ans=0.1 2024-09-22 13:15:29,735 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-22 13:15:45,474 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=22801.333333333332, ans=0.125 2024-09-22 13:15:48,457 INFO [train.py:1198] (3/4) Epoch 2, batch 1000, loss[loss=0.3757, ctc_loss=0.2859, cr_loss=0.4491, over 17222.00 frames. ], tot_loss[loss=0.3688, ctc_loss=0.2839, cr_loss=0.4244, over 3346361.72 frames. 
], batch size: 55, lr: 4.04e-02, grad_scale: 64.0 2024-09-22 13:16:07,762 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.375e+02 1.793e+02 2.170e+02 2.654e+02 4.100e+02, threshold=4.339e+02, percent-clipped=2.0 2024-09-22 13:16:41,521 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=22988.0, ans=0.0 2024-09-22 13:16:58,375 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=11.95 vs. limit=22.5 2024-09-22 13:17:03,906 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=23034.666666666668, ans=0.125 2024-09-22 13:17:15,716 INFO [train.py:1198] (3/4) Epoch 2, batch 1050, loss[loss=0.3257, ctc_loss=0.2478, cr_loss=0.3892, over 17143.00 frames. ], tot_loss[loss=0.368, ctc_loss=0.2832, cr_loss=0.4237, over 3346725.93 frames. ], batch size: 45, lr: 4.03e-02, grad_scale: 32.0 2024-09-22 13:17:24,092 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=23081.333333333332, ans=0.0 2024-09-22 13:17:47,724 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=16.28 vs. limit=22.5 2024-09-22 13:18:19,679 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=23221.333333333332, ans=0.2 2024-09-22 13:18:38,530 INFO [train.py:1198] (3/4) Epoch 2, batch 1100, loss[loss=0.3273, ctc_loss=0.2432, cr_loss=0.4206, over 17040.00 frames. ], tot_loss[loss=0.3685, ctc_loss=0.2835, cr_loss=0.4248, over 3344950.95 frames. ], batch size: 39, lr: 4.03e-02, grad_scale: 32.0 2024-09-22 13:18:53,229 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.326e+02 1.707e+02 2.038e+02 2.702e+02 5.370e+02, threshold=4.076e+02, percent-clipped=2.0 2024-09-22 13:18:53,482 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=23361.333333333332, ans=0.0 2024-09-22 13:18:55,963 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=14.65 vs. limit=22.5 2024-09-22 13:18:58,417 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=23361.333333333332, ans=0.1 2024-09-22 13:19:13,209 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.30 vs. limit=15.0 2024-09-22 13:19:58,507 INFO [train.py:1198] (3/4) Epoch 2, batch 1150, loss[loss=0.3348, ctc_loss=0.2484, cr_loss=0.4317, over 16966.00 frames. ], tot_loss[loss=0.3685, ctc_loss=0.2835, cr_loss=0.4249, over 3347534.97 frames. ], batch size: 42, lr: 4.02e-02, grad_scale: 32.0 2024-09-22 13:19:58,948 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=23548.0, ans=0.125 2024-09-22 13:19:59,442 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=5.20 vs. 
limit=12.0 2024-09-22 13:20:06,730 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=23548.0, ans=0.125 2024-09-22 13:20:10,324 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=8.37 vs. limit=15.0 2024-09-22 13:20:30,720 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=23641.333333333332, ans=0.125 2024-09-22 13:20:59,783 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=23688.0, ans=0.125 2024-09-22 13:21:04,398 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=23688.0, ans=0.0 2024-09-22 13:21:05,931 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=23734.666666666668, ans=0.0 2024-09-22 13:21:17,387 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=23734.666666666668, ans=0.125 2024-09-22 13:21:23,361 INFO [train.py:1198] (3/4) Epoch 2, batch 1200, loss[loss=0.4317, ctc_loss=0.3461, cr_loss=0.428, over 15240.00 frames. ], tot_loss[loss=0.3699, ctc_loss=0.2847, cr_loss=0.4259, over 3348434.07 frames. ], batch size: 89, lr: 4.01e-02, grad_scale: 32.0 2024-09-22 13:21:37,785 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.332e+02 1.890e+02 2.251e+02 2.751e+02 4.727e+02, threshold=4.502e+02, percent-clipped=4.0 2024-09-22 13:21:43,042 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=23828.0, ans=0.005689565217391304 2024-09-22 13:21:52,701 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=23828.0, ans=0.125 2024-09-22 13:21:59,598 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=14.00 vs. limit=15.0 2024-09-22 13:22:22,430 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=23921.333333333332, ans=0.035 2024-09-22 13:22:24,194 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=23921.333333333332, ans=0.2 2024-09-22 13:22:33,846 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=23968.0, ans=0.125 2024-09-22 13:22:43,404 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=23968.0, ans=0.0 2024-09-22 13:22:46,212 INFO [train.py:1198] (3/4) Epoch 2, batch 1250, loss[loss=0.3565, ctc_loss=0.2716, cr_loss=0.4242, over 17042.00 frames. ], tot_loss[loss=0.3675, ctc_loss=0.2828, cr_loss=0.4239, over 3353049.45 frames. 
], batch size: 52, lr: 4.01e-02, grad_scale: 32.0 2024-09-22 13:22:57,703 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.min_positive, batch_count=24014.666666666668, ans=0.05 2024-09-22 13:23:31,028 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=24108.0, ans=0.0 2024-09-22 13:23:34,154 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=24108.0, ans=0.1 2024-09-22 13:23:35,781 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=24154.666666666668, ans=0.125 2024-09-22 13:23:42,004 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=24154.666666666668, ans=0.125 2024-09-22 13:23:51,388 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=24201.333333333332, ans=0.0 2024-09-22 13:24:08,734 INFO [train.py:1198] (3/4) Epoch 2, batch 1300, loss[loss=0.3381, ctc_loss=0.2588, cr_loss=0.3963, over 16295.00 frames. ], tot_loss[loss=0.3666, ctc_loss=0.2819, cr_loss=0.4237, over 3355183.88 frames. ], batch size: 36, lr: 4.00e-02, grad_scale: 32.0 2024-09-22 13:24:13,957 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=24248.0, ans=0.07 2024-09-22 13:24:17,405 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=24248.0, ans=0.09899494936611666 2024-09-22 13:24:19,620 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=6.09 vs. limit=12.0 2024-09-22 13:24:23,423 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.394e+02 1.885e+02 2.250e+02 3.074e+02 5.750e+02, threshold=4.500e+02, percent-clipped=7.0 2024-09-22 13:24:39,797 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=24341.333333333332, ans=0.125 2024-09-22 13:24:41,346 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=24341.333333333332, ans=0.125 2024-09-22 13:25:05,061 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=24388.0, ans=0.125 2024-09-22 13:25:08,174 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=24388.0, ans=0.2 2024-09-22 13:25:20,870 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=24434.666666666668, ans=0.125 2024-09-22 13:25:28,593 INFO [train.py:1198] (3/4) Epoch 2, batch 1350, loss[loss=0.3876, ctc_loss=0.298, cr_loss=0.4482, over 17029.00 frames. ], tot_loss[loss=0.366, ctc_loss=0.2813, cr_loss=0.4232, over 3352469.59 frames. 
], batch size: 44, lr: 3.99e-02, grad_scale: 32.0 2024-09-22 13:25:37,467 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.whiten.whitening_limit, batch_count=24481.333333333332, ans=12.0 2024-09-22 13:25:43,306 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=24528.0, ans=10.0 2024-09-22 13:26:44,765 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=24668.0, ans=0.015 2024-09-22 13:26:46,556 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.const_attention_rate, batch_count=24668.0, ans=0.025 2024-09-22 13:26:48,014 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=24668.0, ans=0.125 2024-09-22 13:26:54,051 INFO [train.py:1198] (3/4) Epoch 2, batch 1400, loss[loss=0.3763, ctc_loss=0.2873, cr_loss=0.4447, over 16741.00 frames. ], tot_loss[loss=0.3646, ctc_loss=0.2801, cr_loss=0.4227, over 3356618.83 frames. ], batch size: 61, lr: 3.99e-02, grad_scale: 32.0 2024-09-22 13:27:00,145 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.31 vs. limit=6.0 2024-09-22 13:27:06,500 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=24714.666666666668, ans=0.125 2024-09-22 13:27:11,157 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.488e+02 2.006e+02 2.479e+02 3.166e+02 4.715e+02, threshold=4.958e+02, percent-clipped=3.0 2024-09-22 13:27:56,424 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=24854.666666666668, ans=0.125 2024-09-22 13:28:05,727 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=3.82 vs. limit=12.0 2024-09-22 13:28:16,342 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=24901.333333333332, ans=0.125 2024-09-22 13:28:19,235 INFO [train.py:1198] (3/4) Epoch 2, batch 1450, loss[loss=0.3221, ctc_loss=0.2424, cr_loss=0.3983, over 17263.00 frames. ], tot_loss[loss=0.3644, ctc_loss=0.2798, cr_loss=0.4232, over 3359012.17 frames. ], batch size: 42, lr: 3.98e-02, grad_scale: 32.0 2024-09-22 13:28:46,398 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=24994.666666666668, ans=0.0 2024-09-22 13:29:18,375 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=25088.0, ans=0.125 2024-09-22 13:29:38,919 INFO [train.py:1198] (3/4) Epoch 2, batch 1500, loss[loss=0.3855, ctc_loss=0.2969, cr_loss=0.4432, over 17012.00 frames. ], tot_loss[loss=0.3649, ctc_loss=0.2803, cr_loss=0.4227, over 3343184.42 frames. 
], batch size: 53, lr: 3.98e-02, grad_scale: 32.0 2024-09-22 13:29:42,420 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=25181.333333333332, ans=0.09899494936611666 2024-09-22 13:29:53,537 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.354e+02 1.701e+02 2.138e+02 2.969e+02 5.127e+02, threshold=4.276e+02, percent-clipped=1.0 2024-09-22 13:30:01,918 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=25228.0, ans=0.125 2024-09-22 13:30:17,666 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=25274.666666666668, ans=0.125 2024-09-22 13:30:17,698 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=25274.666666666668, ans=0.2 2024-09-22 13:30:17,773 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-22 13:30:27,146 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=25321.333333333332, ans=0.04949747468305833 2024-09-22 13:30:41,490 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=25368.0, ans=0.1 2024-09-22 13:31:03,516 INFO [train.py:1198] (3/4) Epoch 2, batch 1550, loss[loss=0.3592, ctc_loss=0.2828, cr_loss=0.3822, over 17019.00 frames. ], tot_loss[loss=0.365, ctc_loss=0.2802, cr_loss=0.4241, over 3348773.14 frames. ], batch size: 44, lr: 3.97e-02, grad_scale: 32.0 2024-09-22 13:31:19,770 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.const_attention_rate, batch_count=25461.333333333332, ans=0.025 2024-09-22 13:31:37,420 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=25508.0, ans=0.125 2024-09-22 13:31:48,844 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.20 vs. limit=10.0 2024-09-22 13:32:25,138 INFO [train.py:1198] (3/4) Epoch 2, batch 1600, loss[loss=0.3559, ctc_loss=0.2735, cr_loss=0.412, over 17066.00 frames. ], tot_loss[loss=0.3659, ctc_loss=0.2809, cr_loss=0.4249, over 3348065.49 frames. ], batch size: 46, lr: 3.96e-02, grad_scale: 32.0 2024-09-22 13:32:38,177 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=25648.0, ans=0.125 2024-09-22 13:32:39,458 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.451e+02 1.757e+02 1.985e+02 2.306e+02 3.282e+02, threshold=3.970e+02, percent-clipped=0.0 2024-09-22 13:33:47,219 INFO [train.py:1198] (3/4) Epoch 2, batch 1650, loss[loss=0.3358, ctc_loss=0.25, cr_loss=0.4288, over 16889.00 frames. ], tot_loss[loss=0.3649, ctc_loss=0.2801, cr_loss=0.4239, over 3344713.65 frames. 
], batch size: 58, lr: 3.96e-02, grad_scale: 32.0 2024-09-22 13:34:00,394 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=25881.333333333332, ans=0.125 2024-09-22 13:34:27,823 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=25974.666666666668, ans=0.125 2024-09-22 13:34:50,365 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=26068.0, ans=0.125 2024-09-22 13:35:04,687 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=26068.0, ans=0.2 2024-09-22 13:35:07,609 INFO [train.py:1198] (3/4) Epoch 2, batch 1700, loss[loss=0.3947, ctc_loss=0.3005, cr_loss=0.4708, over 16996.00 frames. ], tot_loss[loss=0.3646, ctc_loss=0.2799, cr_loss=0.4234, over 3351194.69 frames. ], batch size: 53, lr: 3.95e-02, grad_scale: 32.0 2024-09-22 13:35:09,483 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=26114.666666666668, ans=0.005192463768115942 2024-09-22 13:35:22,186 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.426e+02 1.761e+02 2.144e+02 2.600e+02 4.141e+02, threshold=4.288e+02, percent-clipped=2.0 2024-09-22 13:35:24,382 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=13.54 vs. limit=15.0 2024-09-22 13:35:30,226 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.max_abs, batch_count=26161.333333333332, ans=10.0 2024-09-22 13:36:33,104 INFO [train.py:1198] (3/4) Epoch 2, batch 1750, loss[loss=0.3352, ctc_loss=0.2596, cr_loss=0.378, over 17350.00 frames. ], tot_loss[loss=0.363, ctc_loss=0.2787, cr_loss=0.4213, over 3341350.89 frames. ], batch size: 48, lr: 3.94e-02, grad_scale: 32.0 2024-09-22 13:36:54,085 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=26394.666666666668, ans=0.125 2024-09-22 13:37:23,715 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=26488.0, ans=0.0 2024-09-22 13:37:23,723 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=26488.0, ans=0.0 2024-09-22 13:37:46,376 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.29 vs. limit=15.0 2024-09-22 13:37:56,296 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=26581.333333333332, ans=0.005091014492753623 2024-09-22 13:37:57,681 INFO [train.py:1198] (3/4) Epoch 2, batch 1800, loss[loss=0.3586, ctc_loss=0.2762, cr_loss=0.4119, over 16203.00 frames. ], tot_loss[loss=0.3638, ctc_loss=0.2793, cr_loss=0.4228, over 3349120.28 frames. 
], batch size: 75, lr: 3.94e-02, grad_scale: 32.0 2024-09-22 13:38:11,941 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.435e+02 1.770e+02 2.078e+02 2.651e+02 4.856e+02, threshold=4.156e+02, percent-clipped=2.0 2024-09-22 13:38:15,625 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=26628.0, ans=0.0 2024-09-22 13:38:23,640 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-22 13:38:26,676 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.const_attention_rate, batch_count=26628.0, ans=0.025 2024-09-22 13:38:34,762 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=26674.666666666668, ans=0.125 2024-09-22 13:38:59,941 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=26768.0, ans=0.125 2024-09-22 13:39:05,061 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=26768.0, ans=0.125 2024-09-22 13:39:17,329 INFO [train.py:1198] (3/4) Epoch 2, batch 1850, loss[loss=0.3918, ctc_loss=0.3038, cr_loss=0.4403, over 17058.00 frames. ], tot_loss[loss=0.3625, ctc_loss=0.2781, cr_loss=0.4217, over 3352044.04 frames. ], batch size: 52, lr: 3.93e-02, grad_scale: 32.0 2024-09-22 13:39:17,563 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=26814.666666666668, ans=0.005040289855072464 2024-09-22 13:39:21,371 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.40 vs. limit=22.5 2024-09-22 13:39:36,818 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=26861.333333333332, ans=0.125 2024-09-22 13:39:43,190 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=26861.333333333332, ans=0.125 2024-09-22 13:39:44,648 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=26861.333333333332, ans=0.125 2024-09-22 13:39:49,406 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=26908.0, ans=0.125 2024-09-22 13:39:57,216 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.const_attention_rate, batch_count=26908.0, ans=0.025 2024-09-22 13:40:10,239 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=26954.666666666668, ans=0.005009855072463768 2024-09-22 13:40:12,001 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=26954.666666666668, ans=0.0 2024-09-22 13:40:39,946 INFO [train.py:1198] (3/4) Epoch 2, batch 1900, loss[loss=0.3228, ctc_loss=0.2457, cr_loss=0.3856, over 16310.00 frames. ], tot_loss[loss=0.3612, ctc_loss=0.2771, cr_loss=0.4205, over 3349367.59 frames. 
], batch size: 36, lr: 3.92e-02, grad_scale: 32.0 2024-09-22 13:40:51,145 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=27048.0, ans=0.1 2024-09-22 13:40:57,181 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.334e+02 1.864e+02 2.204e+02 2.855e+02 4.990e+02, threshold=4.407e+02, percent-clipped=2.0 2024-09-22 13:41:04,024 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=27094.666666666668, ans=0.0 2024-09-22 13:41:07,308 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=27094.666666666668, ans=0.125 2024-09-22 13:41:15,573 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=27141.333333333332, ans=0.125 2024-09-22 13:41:30,251 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=27188.0, ans=0.004959130434782609 2024-09-22 13:41:34,943 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=27188.0, ans=0.0 2024-09-22 13:42:05,867 INFO [train.py:1198] (3/4) Epoch 2, batch 1950, loss[loss=0.3606, ctc_loss=0.2692, cr_loss=0.4569, over 17189.00 frames. ], tot_loss[loss=0.3626, ctc_loss=0.2783, cr_loss=0.4216, over 3342310.44 frames. ], batch size: 41, lr: 3.92e-02, grad_scale: 32.0 2024-09-22 13:42:25,415 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=27328.0, ans=0.1 2024-09-22 13:42:25,680 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.46 vs. limit=15.0 2024-09-22 13:42:34,975 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=27328.0, ans=0.0 2024-09-22 13:42:36,537 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=27374.666666666668, ans=0.0 2024-09-22 13:42:50,538 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=27374.666666666668, ans=0.125 2024-09-22 13:43:10,537 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=27468.0, ans=0.0 2024-09-22 13:43:21,686 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=27468.0, ans=0.125 2024-09-22 13:43:27,962 INFO [train.py:1198] (3/4) Epoch 2, batch 2000, loss[loss=0.382, ctc_loss=0.2947, cr_loss=0.4366, over 17313.00 frames. ], tot_loss[loss=0.3627, ctc_loss=0.2782, cr_loss=0.4227, over 3342892.15 frames. ], batch size: 51, lr: 3.91e-02, grad_scale: 32.0 2024-09-22 13:43:42,255 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.401e+02 1.881e+02 2.197e+02 2.842e+02 5.136e+02, threshold=4.393e+02, percent-clipped=2.0 2024-09-22 13:43:54,194 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=5.37 vs. 
limit=15.0 2024-09-22 13:44:04,837 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=27608.0, ans=0.0 2024-09-22 13:44:40,138 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=27701.333333333332, ans=0.95 2024-09-22 13:44:42,728 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=6.17 vs. limit=15.0 2024-09-22 13:44:47,947 INFO [train.py:1198] (3/4) Epoch 2, batch 2050, loss[loss=0.3775, ctc_loss=0.2871, cr_loss=0.4515, over 17032.00 frames. ], tot_loss[loss=0.3607, ctc_loss=0.2765, cr_loss=0.421, over 3348587.31 frames. ], batch size: 52, lr: 3.91e-02, grad_scale: 32.0 2024-09-22 13:45:07,544 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-22 13:45:18,691 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=27841.333333333332, ans=0.0 2024-09-22 13:45:45,031 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=27888.0, ans=0.1 2024-09-22 13:45:56,743 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.04 vs. limit=10.0 2024-09-22 13:45:57,853 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.const_attention_rate, batch_count=27934.666666666668, ans=0.025 2024-09-22 13:46:04,806 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=12.68 vs. limit=15.0 2024-09-22 13:46:13,305 INFO [train.py:1198] (3/4) Epoch 2, batch 2100, loss[loss=0.3933, ctc_loss=0.2966, cr_loss=0.4837, over 17143.00 frames. ], tot_loss[loss=0.3628, ctc_loss=0.2782, cr_loss=0.423, over 3338516.40 frames. ], batch size: 48, lr: 3.90e-02, grad_scale: 32.0 2024-09-22 13:46:27,944 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.428e+02 1.746e+02 2.180e+02 2.623e+02 4.533e+02, threshold=4.360e+02, percent-clipped=2.0 2024-09-22 13:46:36,258 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=28028.0, ans=0.125 2024-09-22 13:46:52,414 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=28074.666666666668, ans=0.004766376811594203 2024-09-22 13:47:06,050 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=28121.333333333332, ans=0.125 2024-09-22 13:47:19,445 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=3.44 vs. limit=12.0 2024-09-22 13:47:28,447 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=28168.0, ans=0.125 2024-09-22 13:47:36,049 INFO [train.py:1198] (3/4) Epoch 2, batch 2150, loss[loss=0.3845, ctc_loss=0.2973, cr_loss=0.436, over 16509.00 frames. ], tot_loss[loss=0.3609, ctc_loss=0.2766, cr_loss=0.4213, over 3343518.16 frames. 
], batch size: 66, lr: 3.89e-02, grad_scale: 32.0 2024-09-22 13:47:44,276 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=28214.666666666668, ans=0.125 2024-09-22 13:47:46,560 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=15.32 vs. limit=15.0 2024-09-22 13:47:56,232 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=28261.333333333332, ans=0.125 2024-09-22 13:47:56,330 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=28261.333333333332, ans=0.0 2024-09-22 13:48:09,152 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=28308.0, ans=0.09899494936611666 2024-09-22 13:48:09,228 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=28308.0, ans=0.125 2024-09-22 13:48:36,717 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten.whitening_limit, batch_count=28354.666666666668, ans=15.0 2024-09-22 13:48:57,332 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-22 13:48:58,606 INFO [train.py:1198] (3/4) Epoch 2, batch 2200, loss[loss=0.38, ctc_loss=0.2873, cr_loss=0.4639, over 17130.00 frames. ], tot_loss[loss=0.3618, ctc_loss=0.2773, cr_loss=0.4224, over 3349012.96 frames. ], batch size: 48, lr: 3.89e-02, grad_scale: 32.0 2024-09-22 13:49:01,156 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.85 vs. limit=15.0 2024-09-22 13:49:05,053 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=28448.0, ans=0.125 2024-09-22 13:49:12,813 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.310e+02 1.723e+02 2.085e+02 2.498e+02 4.255e+02, threshold=4.169e+02, percent-clipped=0.0 2024-09-22 13:49:57,839 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=28588.0, ans=0.1 2024-09-22 13:50:18,358 INFO [train.py:1198] (3/4) Epoch 2, batch 2250, loss[loss=0.3475, ctc_loss=0.2592, cr_loss=0.4416, over 17209.00 frames. ], tot_loss[loss=0.3602, ctc_loss=0.2759, cr_loss=0.4216, over 3350882.38 frames. ], batch size: 50, lr: 3.88e-02, grad_scale: 32.0 2024-09-22 13:50:32,513 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=28681.333333333332, ans=0.0 2024-09-22 13:51:02,392 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=28774.666666666668, ans=0.125 2024-09-22 13:51:16,748 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=28821.333333333332, ans=0.0 2024-09-22 13:51:28,830 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.76 vs. 
limit=6.0 2024-09-22 13:51:31,353 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=12.73 vs. limit=15.0 2024-09-22 13:51:37,774 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=4.45 vs. limit=12.0 2024-09-22 13:51:43,312 INFO [train.py:1198] (3/4) Epoch 2, batch 2300, loss[loss=0.29, ctc_loss=0.213, cr_loss=0.3851, over 16786.00 frames. ], tot_loss[loss=0.3585, ctc_loss=0.2744, cr_loss=0.4205, over 3359247.36 frames. ], batch size: 37, lr: 3.87e-02, grad_scale: 32.0 2024-09-22 13:51:53,346 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=28914.666666666668, ans=0.125 2024-09-22 13:51:59,761 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=22.37 vs. limit=22.5 2024-09-22 13:52:00,470 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.412e+02 1.812e+02 2.268e+02 2.899e+02 4.767e+02, threshold=4.537e+02, percent-clipped=4.0 2024-09-22 13:52:13,555 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=28961.333333333332, ans=0.004573623188405798 2024-09-22 13:52:16,735 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=29008.0, ans=0.2 2024-09-22 13:52:20,043 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=29008.0, ans=0.125 2024-09-22 13:52:23,079 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=29008.0, ans=0.004563478260869565 2024-09-22 13:52:31,236 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=29008.0, ans=0.125 2024-09-22 13:53:08,676 INFO [train.py:1198] (3/4) Epoch 2, batch 2350, loss[loss=0.3158, ctc_loss=0.2442, cr_loss=0.3583, over 17267.00 frames. ], tot_loss[loss=0.3587, ctc_loss=0.2745, cr_loss=0.4209, over 3356423.30 frames. ], batch size: 42, lr: 3.87e-02, grad_scale: 32.0 2024-09-22 13:53:20,729 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.84 vs. limit=15.0 2024-09-22 13:53:31,520 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=29194.666666666668, ans=0.125 2024-09-22 13:53:38,245 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=10.99 vs. limit=15.0 2024-09-22 13:54:27,695 INFO [train.py:1198] (3/4) Epoch 2, batch 2400, loss[loss=0.3837, ctc_loss=0.2918, cr_loss=0.4596, over 17108.00 frames. ], tot_loss[loss=0.3566, ctc_loss=0.2727, cr_loss=0.4195, over 3363181.18 frames. 
], batch size: 49, lr: 3.86e-02, grad_scale: 32.0 2024-09-22 13:54:28,030 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=29381.333333333332, ans=0.1 2024-09-22 13:54:41,835 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.409e+02 1.714e+02 1.995e+02 2.694e+02 4.976e+02, threshold=3.990e+02, percent-clipped=1.0 2024-09-22 13:54:51,962 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=10.20 vs. limit=12.0 2024-09-22 13:55:03,126 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=29474.666666666668, ans=0.0 2024-09-22 13:55:04,589 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=29474.666666666668, ans=0.0 2024-09-22 13:55:52,230 INFO [train.py:1198] (3/4) Epoch 2, batch 2450, loss[loss=0.3219, ctc_loss=0.2442, cr_loss=0.3889, over 17257.00 frames. ], tot_loss[loss=0.3575, ctc_loss=0.2733, cr_loss=0.4213, over 3365659.48 frames. ], batch size: 44, lr: 3.86e-02, grad_scale: 32.0 2024-09-22 13:56:00,472 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=29614.666666666668, ans=0.0 2024-09-22 13:56:00,634 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=29614.666666666668, ans=0.0 2024-09-22 13:56:05,778 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=9.02 vs. limit=15.0 2024-09-22 13:56:26,074 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=29708.0, ans=0.125 2024-09-22 13:56:56,335 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.24 vs. limit=15.0 2024-09-22 13:57:14,736 INFO [train.py:1198] (3/4) Epoch 2, batch 2500, loss[loss=0.3882, ctc_loss=0.2953, cr_loss=0.4643, over 17009.00 frames. ], tot_loss[loss=0.3583, ctc_loss=0.2739, cr_loss=0.422, over 3363787.05 frames. ], batch size: 53, lr: 3.85e-02, grad_scale: 32.0 2024-09-22 13:57:26,302 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=29848.0, ans=0.125 2024-09-22 13:57:29,034 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.436e+02 1.845e+02 2.309e+02 3.069e+02 4.385e+02, threshold=4.618e+02, percent-clipped=5.0 2024-09-22 13:57:29,412 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.min_positive, batch_count=29894.666666666668, ans=0.025 2024-09-22 13:58:00,443 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=29941.333333333332, ans=0.125 2024-09-22 13:58:16,466 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=29988.0, ans=0.125 2024-09-22 13:58:36,484 INFO [train.py:1198] (3/4) Epoch 2, batch 2550, loss[loss=0.3473, ctc_loss=0.2588, cr_loss=0.4425, over 17069.00 frames. ], tot_loss[loss=0.3602, ctc_loss=0.2755, cr_loss=0.4239, over 3360375.68 frames. 
], batch size: 46, lr: 3.84e-02, grad_scale: 32.0 2024-09-22 13:58:47,920 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=30081.333333333332, ans=0.125 2024-09-22 13:58:47,929 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=30081.333333333332, ans=0.125 2024-09-22 13:59:07,576 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=22.68 vs. limit=22.5 2024-09-22 13:59:08,677 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=30174.666666666668, ans=0.004309855072463768 2024-09-22 13:59:11,903 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=30174.666666666668, ans=0.125 2024-09-22 13:59:13,569 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-22 13:59:27,091 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=5.94 vs. limit=12.0 2024-09-22 13:59:55,656 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=4.70 vs. limit=15.0 2024-09-22 13:59:56,516 INFO [train.py:1198] (3/4) Epoch 2, batch 2600, loss[loss=0.3416, ctc_loss=0.2592, cr_loss=0.412, over 17222.00 frames. ], tot_loss[loss=0.3594, ctc_loss=0.2746, cr_loss=0.4239, over 3363135.30 frames. ], batch size: 47, lr: 3.84e-02, grad_scale: 32.0 2024-09-22 14:00:04,919 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=30314.666666666668, ans=0.125 2024-09-22 14:00:11,074 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.285e+02 1.903e+02 2.207e+02 2.848e+02 4.508e+02, threshold=4.414e+02, percent-clipped=0.0 2024-09-22 14:00:45,312 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=30408.0, ans=0.0 2024-09-22 14:01:09,384 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=30501.333333333332, ans=0.2 2024-09-22 14:01:10,945 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=30501.333333333332, ans=0.09899494936611666 2024-09-22 14:01:21,838 INFO [train.py:1198] (3/4) Epoch 2, batch 2650, loss[loss=0.3356, ctc_loss=0.2544, cr_loss=0.4059, over 17296.00 frames. ], tot_loss[loss=0.358, ctc_loss=0.2735, cr_loss=0.4228, over 3361559.81 frames. ], batch size: 46, lr: 3.83e-02, grad_scale: 32.0 2024-09-22 14:01:26,899 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=30548.0, ans=0.125 2024-09-22 14:01:28,563 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=30548.0, ans=0.125 2024-09-22 14:01:33,993 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=6.37 vs. 
limit=15.0 2024-09-22 14:01:42,797 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=30594.666666666668, ans=0.07 2024-09-22 14:01:55,144 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=30641.333333333332, ans=0.00420840579710145 2024-09-22 14:01:56,616 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=30641.333333333332, ans=0.125 2024-09-22 14:01:58,038 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=30641.333333333332, ans=0.125 2024-09-22 14:02:41,877 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.47 vs. limit=6.0 2024-09-22 14:02:45,402 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=16.50 vs. limit=22.5 2024-09-22 14:02:46,376 INFO [train.py:1198] (3/4) Epoch 2, batch 2700, loss[loss=0.4083, ctc_loss=0.3136, cr_loss=0.4736, over 15102.00 frames. ], tot_loss[loss=0.3566, ctc_loss=0.2722, cr_loss=0.4219, over 3366061.23 frames. ], batch size: 89, lr: 3.82e-02, grad_scale: 32.0 2024-09-22 14:03:00,820 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.428e+02 1.814e+02 2.097e+02 2.446e+02 4.164e+02, threshold=4.194e+02, percent-clipped=0.0 2024-09-22 14:03:04,294 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=30828.0, ans=0.125 2024-09-22 14:03:05,924 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=30828.0, ans=0.004167826086956521 2024-09-22 14:03:17,027 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=30874.666666666668, ans=0.2 2024-09-22 14:03:47,280 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=30921.333333333332, ans=0.035 2024-09-22 14:04:05,951 INFO [train.py:1198] (3/4) Epoch 2, batch 2750, loss[loss=0.3649, ctc_loss=0.2788, cr_loss=0.4306, over 17118.00 frames. ], tot_loss[loss=0.3574, ctc_loss=0.2729, cr_loss=0.4226, over 3359037.25 frames. ], batch size: 49, lr: 3.82e-02, grad_scale: 32.0 2024-09-22 14:04:06,172 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-22 14:04:14,977 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=11.18 vs. limit=22.5 2024-09-22 14:04:20,914 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.39 vs. limit=6.0 2024-09-22 14:04:46,502 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=5.83 vs. limit=15.0 2024-09-22 14:05:12,035 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.const_attention_rate, batch_count=31201.333333333332, ans=0.025 2024-09-22 14:05:31,360 INFO [train.py:1198] (3/4) Epoch 2, batch 2800, loss[loss=0.3164, ctc_loss=0.2392, cr_loss=0.3858, over 17035.00 frames. 
], tot_loss[loss=0.3572, ctc_loss=0.2726, cr_loss=0.4231, over 3354668.09 frames. ], batch size: 39, lr: 3.81e-02, grad_scale: 32.0 2024-09-22 14:05:39,636 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=31248.0, ans=0.004076521739130434 2024-09-22 14:05:42,624 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-22 14:05:45,623 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.464e+02 1.818e+02 2.147e+02 2.657e+02 4.230e+02, threshold=4.294e+02, percent-clipped=1.0 2024-09-22 14:06:12,330 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=14.07 vs. limit=15.0 2024-09-22 14:06:39,076 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=22.86 vs. limit=22.5 2024-09-22 14:06:48,050 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.68 vs. limit=15.0 2024-09-22 14:06:53,504 INFO [train.py:1198] (3/4) Epoch 2, batch 2850, loss[loss=0.3794, ctc_loss=0.288, cr_loss=0.457, over 16678.00 frames. ], tot_loss[loss=0.3575, ctc_loss=0.2728, cr_loss=0.4236, over 3355947.24 frames. ], batch size: 61, lr: 3.80e-02, grad_scale: 32.0 2024-09-22 14:07:03,682 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.59 vs. limit=10.0 2024-09-22 14:07:18,961 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=31528.0, ans=0.1 2024-09-22 14:07:22,723 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.12 vs. limit=15.0 2024-09-22 14:07:35,621 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=12.82 vs. limit=15.0 2024-09-22 14:08:12,652 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=31668.0, ans=0.125 2024-09-22 14:08:15,443 INFO [train.py:1198] (3/4) Epoch 2, batch 2900, loss[loss=0.316, ctc_loss=0.2354, cr_loss=0.403, over 16964.00 frames. ], tot_loss[loss=0.3549, ctc_loss=0.2707, cr_loss=0.4208, over 3356039.42 frames. 
], batch size: 42, lr: 3.80e-02, grad_scale: 32.0 2024-09-22 14:08:28,677 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=31714.666666666668, ans=0.0 2024-09-22 14:08:29,968 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.349e+02 1.781e+02 2.101e+02 2.670e+02 4.501e+02, threshold=4.202e+02, percent-clipped=1.0 2024-09-22 14:08:44,651 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=31761.333333333332, ans=0.0 2024-09-22 14:08:52,639 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=31808.0, ans=0.0039547826086956525 2024-09-22 14:08:59,061 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.max_positive, batch_count=31808.0, ans=0.95 2024-09-22 14:09:05,351 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=31854.666666666668, ans=0.0 2024-09-22 14:09:35,328 INFO [train.py:1198] (3/4) Epoch 2, batch 2950, loss[loss=0.3325, ctc_loss=0.2471, cr_loss=0.427, over 17056.00 frames. ], tot_loss[loss=0.3562, ctc_loss=0.2717, cr_loss=0.4224, over 3360401.84 frames. ], batch size: 39, lr: 3.79e-02, grad_scale: 32.0 2024-09-22 14:09:35,735 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=31948.0, ans=0.0039243478260869566 2024-09-22 14:09:48,546 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=5.60 vs. limit=12.0 2024-09-22 14:09:48,790 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.43 vs. limit=6.0 2024-09-22 14:10:02,508 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=31994.666666666668, ans=0.1 2024-09-22 14:10:14,651 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=32041.333333333332, ans=0.125 2024-09-22 14:10:34,940 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=32088.0, ans=0.125 2024-09-22 14:10:59,599 INFO [train.py:1198] (3/4) Epoch 2, batch 3000, loss[loss=0.3502, ctc_loss=0.2581, cr_loss=0.4608, over 17020.00 frames. ], tot_loss[loss=0.3554, ctc_loss=0.271, cr_loss=0.4219, over 3347656.26 frames. ], batch size: 44, lr: 3.79e-02, grad_scale: 32.0 2024-09-22 14:10:59,599 INFO [train.py:1221] (3/4) Computing validation loss 2024-09-22 14:11:13,192 INFO [zipformer.py:1858] (3/4) name=encoder.encoders.2.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([4.6497, 4.1966, 4.3138, 4.1001], device='cuda:3') 2024-09-22 14:11:15,218 INFO [train.py:1230] (3/4) Epoch 2, validation: loss=0.0967, ctc_loss=0.0967, cr_loss=8.169e-15, over 944034.00 frames. 
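Note on the loss bookkeeping in these entries: across both the per-batch and running totals, the reported loss consistently satisfies loss ≈ ctc_loss + 0.2 · cr_loss, matching the cr_loss_scale of 0.2 printed in the run configuration at startup; in the validation entries the cr_loss term is on the order of 1e-15, which is why validation loss and ctc_loss coincide. A minimal sketch of that combination rule follows — the function name is illustrative, not icefall's actual API.

```python
# Sketch of the loss combination implied by these log lines, assuming the
# rule loss = ctc_loss + cr_loss_scale * cr_loss with cr_loss_scale = 0.2.
def logged_total_loss(ctc_loss: float, cr_loss: float,
                      cr_loss_scale: float = 0.2) -> float:
    """Total loss as reported in the log: CTC loss plus the scaled CR term."""
    return ctc_loss + cr_loss_scale * cr_loss

# Spot-check against Epoch 2, batch 1700 above:
# loss=0.3947, ctc_loss=0.3005, cr_loss=0.4708.
assert abs(logged_total_loss(0.3005, 0.4708) - 0.3947) < 5e-4

# Validation entries report cr_loss ~ 1e-15, so there the total reduces to
# the CTC term alone (e.g. loss=0.0967, ctc_loss=0.0967 above).
```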
2024-09-22 14:11:15,219 INFO [train.py:1231] (3/4) Maximum memory allocated so far is 21238MB 2024-09-22 14:11:29,192 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.469e+02 1.814e+02 2.374e+02 2.965e+02 6.190e+02, threshold=4.748e+02, percent-clipped=4.0 2024-09-22 14:11:31,148 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.const_attention_rate, batch_count=32228.0, ans=0.025 2024-09-22 14:11:34,619 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.10 vs. limit=15.0 2024-09-22 14:11:38,640 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=32228.0, ans=0.125 2024-09-22 14:11:38,722 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.min_positive, batch_count=32228.0, ans=0.05 2024-09-22 14:11:43,564 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=32228.0, ans=0.125 2024-09-22 14:11:44,994 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=32274.666666666668, ans=0.0 2024-09-22 14:11:52,774 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=32274.666666666668, ans=0.09899494936611666 2024-09-22 14:11:54,764 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=5.15 vs. limit=10.0 2024-09-22 14:12:26,649 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.45 vs. limit=15.0 2024-09-22 14:12:35,206 INFO [train.py:1198] (3/4) Epoch 2, batch 3050, loss[loss=0.3905, ctc_loss=0.3036, cr_loss=0.4349, over 17102.00 frames. ], tot_loss[loss=0.3538, ctc_loss=0.2697, cr_loss=0.4201, over 3353692.56 frames. ], batch size: 49, lr: 3.78e-02, grad_scale: 32.0 2024-09-22 14:12:57,171 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=32461.333333333332, ans=0.125 2024-09-22 14:13:04,915 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=32508.0, ans=0.1 2024-09-22 14:13:42,273 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=32601.333333333332, ans=0.125 2024-09-22 14:13:52,647 INFO [train.py:1198] (3/4) Epoch 2, batch 3100, loss[loss=0.3991, ctc_loss=0.3051, cr_loss=0.4703, over 16902.00 frames. ], tot_loss[loss=0.3553, ctc_loss=0.2708, cr_loss=0.4224, over 3363928.00 frames. 
], batch size: 58, lr: 3.77e-02, grad_scale: 32.0 2024-09-22 14:13:54,647 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=32648.0, ans=0.0 2024-09-22 14:13:56,097 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=32648.0, ans=0.0 2024-09-22 14:14:10,889 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.354e+02 1.798e+02 2.252e+02 2.859e+02 4.646e+02, threshold=4.504e+02, percent-clipped=0.0 2024-09-22 14:14:42,113 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.const_attention_rate, batch_count=32788.0, ans=0.025 2024-09-22 14:14:57,709 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-22 14:15:06,897 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=32834.666666666664, ans=0.1 2024-09-22 14:15:12,721 INFO [train.py:1198] (3/4) Epoch 2, batch 3150, loss[loss=0.3534, ctc_loss=0.2724, cr_loss=0.4047, over 17307.00 frames. ], tot_loss[loss=0.353, ctc_loss=0.2689, cr_loss=0.4203, over 3358646.94 frames. ], batch size: 46, lr: 3.77e-02, grad_scale: 32.0 2024-09-22 14:15:19,156 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=32881.333333333336, ans=0.125 2024-09-22 14:15:24,326 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.86 vs. limit=15.0 2024-09-22 14:15:26,139 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=22.68 vs. limit=22.5 2024-09-22 14:16:10,916 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=33021.333333333336, ans=0.125 2024-09-22 14:16:18,698 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.const_attention_rate, batch_count=33068.0, ans=0.025 2024-09-22 14:16:26,665 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.13 vs. limit=15.0 2024-09-22 14:16:30,722 INFO [train.py:1198] (3/4) Epoch 2, batch 3200, loss[loss=0.3121, ctc_loss=0.2363, cr_loss=0.3788, over 17216.00 frames. ], tot_loss[loss=0.3517, ctc_loss=0.2678, cr_loss=0.4199, over 3359722.42 frames. ], batch size: 41, lr: 3.76e-02, grad_scale: 32.0 2024-09-22 14:16:33,222 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.40 vs. limit=10.0 2024-09-22 14:16:46,457 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.436e+02 1.721e+02 1.992e+02 2.305e+02 3.966e+02, threshold=3.983e+02, percent-clipped=0.0 2024-09-22 14:16:52,829 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=33161.333333333336, ans=0.125 2024-09-22 14:17:02,875 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=13.19 vs. 
limit=22.5 2024-09-22 14:17:07,948 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=33208.0, ans=0.125 2024-09-22 14:17:11,203 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=33208.0, ans=0.125 2024-09-22 14:17:14,353 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=33208.0, ans=0.003650434782608696 2024-09-22 14:17:31,713 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=33301.333333333336, ans=0.003630144927536231 2024-09-22 14:17:45,522 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=33301.333333333336, ans=0.125 2024-09-22 14:17:46,036 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=4.97 vs. limit=12.0 2024-09-22 14:17:48,522 INFO [train.py:1198] (3/4) Epoch 2, batch 3250, loss[loss=0.348, ctc_loss=0.2564, cr_loss=0.4578, over 17305.00 frames. ], tot_loss[loss=0.3528, ctc_loss=0.2686, cr_loss=0.4209, over 3349947.29 frames. ], batch size: 49, lr: 3.75e-02, grad_scale: 32.0 2024-09-22 14:17:56,482 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.const_attention_rate, batch_count=33348.0, ans=0.025 2024-09-22 14:18:01,217 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=33348.0, ans=0.00362 2024-09-22 14:18:04,322 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=33394.666666666664, ans=0.2 2024-09-22 14:18:18,896 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.75 vs. limit=6.0 2024-09-22 14:18:45,471 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=2.71 vs. limit=15.0 2024-09-22 14:18:50,850 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=33534.666666666664, ans=0.0 2024-09-22 14:18:57,523 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=6.45 vs. limit=15.0 2024-09-22 14:19:05,064 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=33581.333333333336, ans=0.1 2024-09-22 14:19:06,345 INFO [train.py:1198] (3/4) Epoch 2, batch 3300, loss[loss=0.3778, ctc_loss=0.29, cr_loss=0.4391, over 17047.00 frames. ], tot_loss[loss=0.354, ctc_loss=0.2697, cr_loss=0.4214, over 3341869.38 frames. 
], batch size: 56, lr: 3.75e-02, grad_scale: 32.0 2024-09-22 14:19:09,569 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=33581.333333333336, ans=0.0 2024-09-22 14:19:22,039 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.353e+02 1.852e+02 2.253e+02 3.068e+02 5.078e+02, threshold=4.507e+02, percent-clipped=5.0 2024-09-22 14:19:53,923 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=33721.333333333336, ans=0.125 2024-09-22 14:20:14,660 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=33768.0, ans=0.2 2024-09-22 14:20:20,959 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-09-22 14:20:24,089 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-22 14:20:26,358 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=8.16 vs. limit=15.0 2024-09-22 14:20:28,467 INFO [train.py:1198] (3/4) Epoch 2, batch 3350, loss[loss=0.3384, ctc_loss=0.2561, cr_loss=0.4113, over 17141.00 frames. ], tot_loss[loss=0.3524, ctc_loss=0.2684, cr_loss=0.4201, over 3350698.83 frames. ], batch size: 45, lr: 3.74e-02, grad_scale: 32.0 2024-09-22 14:20:48,749 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=33861.333333333336, ans=0.2 2024-09-22 14:20:59,758 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=33908.0, ans=0.1 2024-09-22 14:21:17,481 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=3.73 vs. limit=12.0 2024-09-22 14:21:31,073 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=34001.333333333336, ans=0.125 2024-09-22 14:21:46,563 INFO [train.py:1198] (3/4) Epoch 2, batch 3400, loss[loss=0.3767, ctc_loss=0.2947, cr_loss=0.4101, over 16924.00 frames. ], tot_loss[loss=0.3522, ctc_loss=0.2681, cr_loss=0.4205, over 3356775.41 frames. ], batch size: 58, lr: 3.74e-02, grad_scale: 32.0 2024-09-22 14:21:47,481 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=6.12 vs. 
limit=12.0 2024-09-22 14:21:48,424 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=34048.0, ans=0.125 2024-09-22 14:21:51,511 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=34048.0, ans=0.2 2024-09-22 14:21:53,071 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=34048.0, ans=0.1 2024-09-22 14:22:02,234 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.402e+02 1.761e+02 2.090e+02 2.503e+02 3.941e+02, threshold=4.179e+02, percent-clipped=0.0 2024-09-22 14:22:23,598 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=34141.333333333336, ans=0.125 2024-09-22 14:22:26,826 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=34141.333333333336, ans=0.125 2024-09-22 14:22:36,271 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=34188.0, ans=0.1 2024-09-22 14:22:41,105 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=34188.0, ans=0.05 2024-09-22 14:23:03,520 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=8.51 vs. limit=15.0 2024-09-22 14:23:07,089 INFO [train.py:1198] (3/4) Epoch 2, batch 3450, loss[loss=0.3093, ctc_loss=0.2361, cr_loss=0.366, over 17034.00 frames. ], tot_loss[loss=0.3523, ctc_loss=0.2682, cr_loss=0.4206, over 3353667.49 frames. ], batch size: 39, lr: 3.73e-02, grad_scale: 16.0 2024-09-22 14:23:07,337 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=34281.333333333336, ans=0.07 2024-09-22 14:23:15,023 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=34281.333333333336, ans=0.2 2024-09-22 14:23:38,555 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=34374.666666666664, ans=0.125 2024-09-22 14:23:58,803 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.38 vs. limit=10.0 2024-09-22 14:23:58,898 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=4.86 vs. limit=15.0 2024-09-22 14:24:26,672 INFO [train.py:1198] (3/4) Epoch 2, batch 3500, loss[loss=0.3395, ctc_loss=0.2593, cr_loss=0.4011, over 17011.00 frames. ], tot_loss[loss=0.352, ctc_loss=0.2677, cr_loss=0.4216, over 3359331.67 frames. ], batch size: 39, lr: 3.72e-02, grad_scale: 16.0 2024-09-22 14:24:33,448 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=34514.666666666664, ans=0.125 2024-09-22 14:24:44,121 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.468e+02 1.790e+02 2.190e+02 2.610e+02 4.602e+02, threshold=4.381e+02, percent-clipped=3.0 2024-09-22 14:25:08,172 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=22.10 vs. 
limit=22.5 2024-09-22 14:25:25,330 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.39 vs. limit=22.5 2024-09-22 14:25:28,145 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=34701.333333333336, ans=0.2 2024-09-22 14:25:29,658 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.min_abs, batch_count=34701.333333333336, ans=0.5 2024-09-22 14:25:35,826 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=34701.333333333336, ans=0.125 2024-09-22 14:25:44,829 INFO [train.py:1198] (3/4) Epoch 2, batch 3550, loss[loss=0.3793, ctc_loss=0.2977, cr_loss=0.4082, over 15876.00 frames. ], tot_loss[loss=0.354, ctc_loss=0.2693, cr_loss=0.4237, over 3355048.27 frames. ], batch size: 74, lr: 3.72e-02, grad_scale: 16.0 2024-09-22 14:25:45,176 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=34748.0, ans=0.125 2024-09-22 14:25:50,107 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=15.86 vs. limit=15.0 2024-09-22 14:26:03,684 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=34794.666666666664, ans=0.125 2024-09-22 14:26:04,037 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=10.46 vs. limit=15.0 2024-09-22 14:27:02,440 INFO [train.py:1198] (3/4) Epoch 2, batch 3600, loss[loss=0.3902, ctc_loss=0.305, cr_loss=0.4261, over 16923.00 frames. ], tot_loss[loss=0.3536, ctc_loss=0.2691, cr_loss=0.4221, over 3338621.37 frames. ], batch size: 58, lr: 3.71e-02, grad_scale: 32.0 2024-09-22 14:27:19,633 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.393e+02 1.714e+02 2.123e+02 2.581e+02 4.355e+02, threshold=4.245e+02, percent-clipped=0.0 2024-09-22 14:27:22,102 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=5.20 vs. limit=15.0 2024-09-22 14:27:37,071 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=35074.666666666664, ans=0.0 2024-09-22 14:27:47,120 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.08 vs. limit=15.0 2024-09-22 14:28:20,466 INFO [train.py:1198] (3/4) Epoch 2, batch 3650, loss[loss=0.4211, ctc_loss=0.3328, cr_loss=0.4415, over 14981.00 frames. ], tot_loss[loss=0.3548, ctc_loss=0.2703, cr_loss=0.4222, over 3337299.39 frames. 
], batch size: 89, lr: 3.70e-02, grad_scale: 32.0 2024-09-22 14:28:25,523 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=35214.666666666664, ans=0.125 2024-09-22 14:28:42,859 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=35261.333333333336, ans=0.125 2024-09-22 14:28:53,497 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=35308.0, ans=0.07 2024-09-22 14:29:01,395 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=35308.0, ans=0.07 2024-09-22 14:29:11,510 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys.whitening_limit, batch_count=35354.666666666664, ans=6.0 2024-09-22 14:29:13,985 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=35354.666666666664, ans=0.2 2024-09-22 14:29:18,687 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.const_attention_rate, batch_count=35354.666666666664, ans=0.025 2024-09-22 14:29:20,258 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=35354.666666666664, ans=0.07 2024-09-22 14:29:20,369 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=35354.666666666664, ans=0.07 2024-09-22 14:29:40,842 INFO [train.py:1198] (3/4) Epoch 2, batch 3700, loss[loss=0.3911, ctc_loss=0.3049, cr_loss=0.431, over 16562.00 frames. ], tot_loss[loss=0.3556, ctc_loss=0.2711, cr_loss=0.4226, over 3334944.97 frames. ], batch size: 66, lr: 3.70e-02, grad_scale: 16.0 2024-09-22 14:29:46,270 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.84 vs. limit=22.5 2024-09-22 14:29:59,940 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.354e+02 1.763e+02 2.146e+02 2.787e+02 4.998e+02, threshold=4.291e+02, percent-clipped=2.0 2024-09-22 14:30:17,526 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=35541.333333333336, ans=0.125 2024-09-22 14:30:33,124 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=35588.0, ans=0.2 2024-09-22 14:30:59,292 INFO [train.py:1198] (3/4) Epoch 2, batch 3750, loss[loss=0.3124, ctc_loss=0.2322, cr_loss=0.401, over 17173.00 frames. ], tot_loss[loss=0.3553, ctc_loss=0.2707, cr_loss=0.4231, over 3331500.10 frames. ], batch size: 45, lr: 3.69e-02, grad_scale: 16.0 2024-09-22 14:31:50,353 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=35821.333333333336, ans=10.0 2024-09-22 14:31:57,436 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.31 vs. 
limit=22.5 2024-09-22 14:32:03,021 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=35868.0, ans=0.025 2024-09-22 14:32:06,170 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=35868.0, ans=0.05 2024-09-22 14:32:16,188 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.60 vs. limit=6.0 2024-09-22 14:32:18,724 INFO [train.py:1198] (3/4) Epoch 2, batch 3800, loss[loss=0.3621, ctc_loss=0.2798, cr_loss=0.4115, over 17304.00 frames. ], tot_loss[loss=0.3559, ctc_loss=0.2711, cr_loss=0.4243, over 3331570.26 frames. ], batch size: 51, lr: 3.69e-02, grad_scale: 16.0 2024-09-22 14:32:18,923 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=35914.666666666664, ans=0.125 2024-09-22 14:32:25,234 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=35914.666666666664, ans=0.125 2024-09-22 14:32:37,260 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.475e+02 1.782e+02 2.185e+02 2.503e+02 5.708e+02, threshold=4.370e+02, percent-clipped=5.0 2024-09-22 14:32:38,210 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.75 vs. limit=15.0 2024-09-22 14:32:55,794 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.const_attention_rate, batch_count=36008.0, ans=0.025 2024-09-22 14:32:58,693 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=36008.0, ans=0.0 2024-09-22 14:33:32,846 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=36101.333333333336, ans=0.05 2024-09-22 14:33:35,548 INFO [train.py:1198] (3/4) Epoch 2, batch 3850, loss[loss=0.4253, ctc_loss=0.3426, cr_loss=0.4134, over 11534.00 frames. ], tot_loss[loss=0.3591, ctc_loss=0.2742, cr_loss=0.4246, over 3290225.53 frames. ], batch size: 123, lr: 3.68e-02, grad_scale: 16.0 2024-09-22 14:33:43,349 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=36148.0, ans=0.125 2024-09-22 14:34:05,271 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.82 vs. limit=22.5 2024-09-22 14:34:09,302 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=36241.333333333336, ans=0.125 2024-09-22 14:35:36,878 INFO [train.py:1198] (3/4) Epoch 3, batch 0, loss[loss=0.4057, ctc_loss=0.3158, cr_loss=0.4495, over 15987.00 frames. ], tot_loss[loss=0.4057, ctc_loss=0.3158, cr_loss=0.4495, over 15987.00 frames. ], batch size: 74, lr: 3.49e-02, grad_scale: 32.0 2024-09-22 14:35:36,878 INFO [train.py:1221] (3/4) Computing validation loss 2024-09-22 14:35:52,232 INFO [train.py:1230] (3/4) Epoch 3, validation: loss=0.1002, ctc_loss=0.1002, cr_loss=7.948e-15, over 944034.00 frames. 
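Note on the recurring WARNING lines from optim.py: each one lists five grad-norm statistics followed by a threshold, and throughout this section the threshold equals Clipping_scale (2.0) times the third listed value — e.g. 4.370e+02 = 2.0 × 2.185e+02 and 4.364e+02 = 2.0 × 2.182e+02 in the nearby entries. That suggests the five numbers are the min/Q1/median/Q3/max of recently observed gradient norms, with the clip threshold set to scale × median and percent-clipped reading as the share of recent steps whose norm exceeded it. A hedged reconstruction under that reading, not a copy of optim.py:

```python
# Reconstruction of the clipping rule suggested by the WARNING lines, assuming
# the five logged "quartiles" are min/Q1/median/Q3/max of recent grad norms.
import statistics

def clipping_threshold(recent_grad_norms: list[float],
                       clipping_scale: float = 2.0) -> float:
    """Clip threshold as inferred from the log: scale times the median norm."""
    return clipping_scale * statistics.median(recent_grad_norms)

# Spot-check against the Epoch 3, batch 0 warning above:
# quartiles 1.466e+02 1.866e+02 2.182e+02 2.689e+02 4.735e+02,
# threshold=4.364e+02.
norms = [1.466e2, 1.866e2, 2.182e2, 2.689e2, 4.735e2]
assert abs(clipping_threshold(norms) - 4.364e2) < 1e-1
```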
2024-09-22 14:35:52,233 INFO [train.py:1231] (3/4) Maximum memory allocated so far is 21238MB 2024-09-22 14:35:57,758 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=15.88 vs. limit=22.5 2024-09-22 14:36:04,218 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=15.38 vs. limit=15.0 2024-09-22 14:36:13,727 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=36409.333333333336, ans=0.0 2024-09-22 14:36:15,360 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-09-22 14:36:20,867 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.466e+02 1.866e+02 2.182e+02 2.689e+02 4.735e+02, threshold=4.364e+02, percent-clipped=1.0 2024-09-22 14:36:21,225 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=36409.333333333336, ans=0.0029544927536231877 2024-09-22 14:36:24,633 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=36409.333333333336, ans=0.125 2024-09-22 14:36:38,963 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=36456.0, ans=0.09899494936611666 2024-09-22 14:36:43,646 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.const_attention_rate, batch_count=36502.666666666664, ans=0.025 2024-09-22 14:37:11,888 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=36549.333333333336, ans=0.125 2024-09-22 14:37:17,947 INFO [train.py:1198] (3/4) Epoch 3, batch 50, loss[loss=0.3838, ctc_loss=0.2956, cr_loss=0.4412, over 16999.00 frames. ], tot_loss[loss=0.3579, ctc_loss=0.2727, cr_loss=0.4257, over 762055.82 frames. ], batch size: 53, lr: 3.49e-02, grad_scale: 32.0 2024-09-22 14:37:25,044 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=36596.0, ans=0.125 2024-09-22 14:37:28,267 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=36596.0, ans=0.0029139130434782615 2024-09-22 14:38:37,732 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=5.28 vs. limit=12.0 2024-09-22 14:38:41,849 INFO [train.py:1198] (3/4) Epoch 3, batch 100, loss[loss=0.331, ctc_loss=0.2453, cr_loss=0.4285, over 17306.00 frames. ], tot_loss[loss=0.3585, ctc_loss=0.2726, cr_loss=0.4294, over 1338823.06 frames. 
2024-09-22 14:38:46,754 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=36829.333333333336, ans=0.125
2024-09-22 14:38:49,863 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=36829.333333333336, ans=0.125
2024-09-22 14:38:51,640 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-09-22 14:39:07,121 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.389e+02 1.767e+02 2.113e+02 2.664e+02 5.595e+02, threshold=4.227e+02, percent-clipped=3.0
2024-09-22 14:39:31,321 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2024-09-22 14:39:50,666 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=4.28 vs. limit=15.0
2024-09-22 14:40:01,182 INFO [train.py:1198] (3/4) Epoch 3, batch 150, loss[loss=0.4352, ctc_loss=0.3482, cr_loss=0.4349, over 11897.00 frames. ], tot_loss[loss=0.353, ctc_loss=0.2679, cr_loss=0.4252, over 1776963.54 frames. ], batch size: 123, lr: 3.47e-02, grad_scale: 32.0
2024-09-22 14:40:07,890 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=37062.666666666664, ans=0.0
2024-09-22 14:40:35,369 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=37156.0, ans=0.2
2024-09-22 14:41:10,122 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=37249.333333333336, ans=0.0
2024-09-22 14:41:23,510 INFO [train.py:1198] (3/4) Epoch 3, batch 200, loss[loss=0.3265, ctc_loss=0.2458, cr_loss=0.4036, over 17215.00 frames. ], tot_loss[loss=0.3503, ctc_loss=0.2657, cr_loss=0.4235, over 2127929.20 frames. ], batch size: 47, lr: 3.47e-02, grad_scale: 32.0
2024-09-22 14:41:34,826 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-22 14:41:48,216 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=37342.666666666664, ans=0.1
2024-09-22 14:41:51,077 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.278e+02 1.679e+02 1.968e+02 2.454e+02 4.058e+02, threshold=3.935e+02, percent-clipped=0.0
2024-09-22 14:41:52,175 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.63 vs. limit=6.0
2024-09-22 14:42:00,535 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=37389.333333333336, ans=0.125
2024-09-22 14:42:14,680 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=37436.0, ans=0.05
2024-09-22 14:42:16,154 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=37436.0, ans=0.002731304347826087
2024-09-22 14:42:32,488 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.85 vs.
limit=15.0 2024-09-22 14:42:41,511 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=37482.666666666664, ans=0.125 2024-09-22 14:42:50,684 INFO [train.py:1198] (3/4) Epoch 3, batch 250, loss[loss=0.314, ctc_loss=0.231, cr_loss=0.415, over 17231.00 frames. ], tot_loss[loss=0.348, ctc_loss=0.2635, cr_loss=0.4223, over 2398774.04 frames. ], batch size: 47, lr: 3.46e-02, grad_scale: 32.0 2024-09-22 14:42:51,041 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=37529.333333333336, ans=0.1 2024-09-22 14:42:58,827 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.const_attention_rate, batch_count=37529.333333333336, ans=0.025 2024-09-22 14:43:11,352 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=37576.0, ans=0.125 2024-09-22 14:43:21,044 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=37622.666666666664, ans=0.125 2024-09-22 14:44:05,294 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.71 vs. limit=6.0 2024-09-22 14:44:12,445 INFO [train.py:1198] (3/4) Epoch 3, batch 300, loss[loss=0.3748, ctc_loss=0.2855, cr_loss=0.4465, over 17032.00 frames. ], tot_loss[loss=0.3477, ctc_loss=0.2631, cr_loss=0.4229, over 2616880.31 frames. ], batch size: 56, lr: 3.46e-02, grad_scale: 32.0 2024-09-22 14:44:17,424 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-22 14:44:20,569 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.const_attention_rate, batch_count=37762.666666666664, ans=0.025 2024-09-22 14:44:23,710 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=37762.666666666664, ans=0.002660289855072464 2024-09-22 14:44:37,527 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.369e+02 1.687e+02 1.987e+02 2.495e+02 5.356e+02, threshold=3.975e+02, percent-clipped=4.0 2024-09-22 14:44:43,111 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.65 vs. limit=22.5 2024-09-22 14:45:14,508 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=37949.333333333336, ans=0.125 2024-09-22 14:45:30,619 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=4.99 vs. limit=12.0 2024-09-22 14:45:31,399 INFO [train.py:1198] (3/4) Epoch 3, batch 350, loss[loss=0.3399, ctc_loss=0.2602, cr_loss=0.3987, over 17297.00 frames. ], tot_loss[loss=0.3485, ctc_loss=0.2638, cr_loss=0.4233, over 2777277.87 frames. 
], batch size: 46, lr: 3.45e-02, grad_scale: 32.0 2024-09-22 14:45:49,210 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=38042.666666666664, ans=0.125 2024-09-22 14:45:54,072 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=38042.666666666664, ans=0.0 2024-09-22 14:46:03,586 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.min_positive, batch_count=38089.333333333336, ans=0.05 2024-09-22 14:46:57,025 INFO [train.py:1198] (3/4) Epoch 3, batch 400, loss[loss=0.3614, ctc_loss=0.2692, cr_loss=0.4605, over 15900.00 frames. ], tot_loss[loss=0.3484, ctc_loss=0.2638, cr_loss=0.4231, over 2901725.78 frames. ], batch size: 74, lr: 3.45e-02, grad_scale: 32.0 2024-09-22 14:46:57,358 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=38229.333333333336, ans=0.125 2024-09-22 14:47:08,504 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=38229.333333333336, ans=0.1 2024-09-22 14:47:16,758 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.const_attention_rate, batch_count=38276.0, ans=0.025 2024-09-22 14:47:25,984 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.328e+02 1.690e+02 1.971e+02 2.820e+02 5.296e+02, threshold=3.942e+02, percent-clipped=6.0 2024-09-22 14:48:19,967 INFO [train.py:1198] (3/4) Epoch 3, batch 450, loss[loss=0.3855, ctc_loss=0.2971, cr_loss=0.4424, over 16629.00 frames. ], tot_loss[loss=0.3471, ctc_loss=0.2626, cr_loss=0.4224, over 2998330.66 frames. ], batch size: 66, lr: 3.44e-02, grad_scale: 32.0 2024-09-22 14:49:08,610 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=38602.666666666664, ans=0.125 2024-09-22 14:49:11,883 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=38602.666666666664, ans=0.2 2024-09-22 14:49:28,269 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.76 vs. limit=15.0 2024-09-22 14:49:29,529 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.const_attention_rate, batch_count=38649.333333333336, ans=0.025 2024-09-22 14:49:41,804 INFO [train.py:1198] (3/4) Epoch 3, batch 500, loss[loss=0.3914, ctc_loss=0.3062, cr_loss=0.4261, over 16053.00 frames. ], tot_loss[loss=0.3464, ctc_loss=0.2621, cr_loss=0.4215, over 3078828.48 frames. ], batch size: 74, lr: 3.43e-02, grad_scale: 32.0 2024-09-22 14:49:48,482 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=38696.0, ans=0.125 2024-09-22 14:49:59,602 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=38742.666666666664, ans=0.2 2024-09-22 14:50:07,129 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.349e+02 1.795e+02 2.310e+02 2.743e+02 4.759e+02, threshold=4.620e+02, percent-clipped=4.0 2024-09-22 14:50:33,239 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=4.73 vs. 
limit=12.0 2024-09-22 14:50:53,725 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=38882.666666666664, ans=0.002416811594202899 2024-09-22 14:51:01,267 INFO [train.py:1198] (3/4) Epoch 3, batch 550, loss[loss=0.3675, ctc_loss=0.2788, cr_loss=0.4439, over 17019.00 frames. ], tot_loss[loss=0.3457, ctc_loss=0.2617, cr_loss=0.4203, over 3144300.77 frames. ], batch size: 51, lr: 3.43e-02, grad_scale: 32.0 2024-09-22 14:51:08,949 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=38929.333333333336, ans=0.125 2024-09-22 14:51:13,867 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=38929.333333333336, ans=0.125 2024-09-22 14:51:28,056 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=38976.0, ans=0.125 2024-09-22 14:51:41,096 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=39022.666666666664, ans=0.125 2024-09-22 14:51:43,542 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=8.94 vs. limit=15.0 2024-09-22 14:51:59,667 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=39069.333333333336, ans=0.0 2024-09-22 14:52:02,721 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=39069.333333333336, ans=0.0023762318840579704 2024-09-22 14:52:17,893 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=39116.0, ans=0.1 2024-09-22 14:52:21,114 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=39116.0, ans=0.125 2024-09-22 14:52:28,851 INFO [train.py:1198] (3/4) Epoch 3, batch 600, loss[loss=0.2736, ctc_loss=0.2034, cr_loss=0.3508, over 16685.00 frames. ], tot_loss[loss=0.3452, ctc_loss=0.2611, cr_loss=0.4209, over 3194557.07 frames. ], batch size: 37, lr: 3.42e-02, grad_scale: 32.0 2024-09-22 14:52:45,412 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=39209.333333333336, ans=0.05 2024-09-22 14:52:54,492 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.307e+02 1.738e+02 2.025e+02 2.578e+02 4.577e+02, threshold=4.049e+02, percent-clipped=0.0 2024-09-22 14:53:05,042 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.22 vs. 
limit=15.0 2024-09-22 14:53:06,074 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=39256.0, ans=0.035 2024-09-22 14:53:14,146 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=39256.0, ans=0.07 2024-09-22 14:53:24,682 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=39302.666666666664, ans=0.125 2024-09-22 14:53:29,422 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=39302.666666666664, ans=0.0 2024-09-22 14:53:35,704 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=39349.333333333336, ans=0.0 2024-09-22 14:53:48,451 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=39349.333333333336, ans=0.125 2024-09-22 14:53:51,257 INFO [train.py:1198] (3/4) Epoch 3, batch 650, loss[loss=0.2954, ctc_loss=0.2181, cr_loss=0.3865, over 17169.00 frames. ], tot_loss[loss=0.3431, ctc_loss=0.2592, cr_loss=0.4194, over 3240472.87 frames. ], batch size: 41, lr: 3.42e-02, grad_scale: 32.0 2024-09-22 14:53:54,699 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=39396.0, ans=0.0 2024-09-22 14:54:17,121 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=39442.666666666664, ans=0.2 2024-09-22 14:54:36,539 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.89 vs. limit=15.0 2024-09-22 14:54:51,786 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=39536.0, ans=0.125 2024-09-22 14:55:10,517 INFO [train.py:1198] (3/4) Epoch 3, batch 700, loss[loss=0.4154, ctc_loss=0.33, cr_loss=0.4271, over 11654.00 frames. ], tot_loss[loss=0.3426, ctc_loss=0.2587, cr_loss=0.4197, over 3262541.79 frames. ], batch size: 123, lr: 3.41e-02, grad_scale: 32.0 2024-09-22 14:55:35,724 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.431e+02 1.707e+02 1.931e+02 2.238e+02 4.906e+02, threshold=3.863e+02, percent-clipped=1.0 2024-09-22 14:56:19,853 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=39816.0, ans=0.125 2024-09-22 14:56:32,341 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=9.12 vs. limit=15.0 2024-09-22 14:56:32,782 INFO [train.py:1198] (3/4) Epoch 3, batch 750, loss[loss=0.2871, ctc_loss=0.2148, cr_loss=0.3615, over 16742.00 frames. ], tot_loss[loss=0.3428, ctc_loss=0.2587, cr_loss=0.4203, over 3285181.71 frames. 
], batch size: 37, lr: 3.41e-02, grad_scale: 32.0 2024-09-22 14:56:54,772 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.const_attention_rate, batch_count=39909.333333333336, ans=0.025 2024-09-22 14:57:33,578 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=40002.666666666664, ans=0.125 2024-09-22 14:57:40,074 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=40049.333333333336, ans=0.125 2024-09-22 14:57:46,320 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=40049.333333333336, ans=0.125 2024-09-22 14:57:53,619 INFO [scaling.py:1024] (3/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.61 vs. limit=5.0 2024-09-22 14:57:57,150 INFO [train.py:1198] (3/4) Epoch 3, batch 800, loss[loss=0.3546, ctc_loss=0.2682, cr_loss=0.4318, over 17308.00 frames. ], tot_loss[loss=0.342, ctc_loss=0.258, cr_loss=0.42, over 3305063.29 frames. ], batch size: 49, lr: 3.40e-02, grad_scale: 32.0 2024-09-22 14:58:02,239 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=40096.0, ans=0.125 2024-09-22 14:58:15,113 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=40142.666666666664, ans=0.1 2024-09-22 14:58:19,046 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=40142.666666666664, ans=0.0 2024-09-22 14:58:25,182 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.263e+02 1.611e+02 1.885e+02 2.376e+02 4.057e+02, threshold=3.771e+02, percent-clipped=1.0 2024-09-22 14:58:49,613 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=16.32 vs. limit=15.0 2024-09-22 14:58:51,503 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=16.89 vs. limit=22.5 2024-09-22 14:59:06,510 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=40282.666666666664, ans=0.0 2024-09-22 14:59:18,682 INFO [train.py:1198] (3/4) Epoch 3, batch 850, loss[loss=0.3268, ctc_loss=0.2445, cr_loss=0.4116, over 17171.00 frames. ], tot_loss[loss=0.3439, ctc_loss=0.2595, cr_loss=0.4219, over 3317046.18 frames. ], batch size: 45, lr: 3.39e-02, grad_scale: 32.0 2024-09-22 14:59:35,595 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.43 vs. limit=15.0 2024-09-22 14:59:49,866 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.40 vs. 
limit=6.0 2024-09-22 15:00:14,850 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=40469.333333333336, ans=0.0 2024-09-22 15:00:25,853 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=40516.0, ans=0.125 2024-09-22 15:00:38,410 INFO [train.py:1198] (3/4) Epoch 3, batch 900, loss[loss=0.3599, ctc_loss=0.2726, cr_loss=0.4367, over 17307.00 frames. ], tot_loss[loss=0.3414, ctc_loss=0.2574, cr_loss=0.4198, over 3324651.37 frames. ], batch size: 49, lr: 3.39e-02, grad_scale: 32.0 2024-09-22 15:00:53,614 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=40562.666666666664, ans=0.125 2024-09-22 15:01:04,973 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=40609.333333333336, ans=0.0 2024-09-22 15:01:06,321 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.272e+02 1.646e+02 1.943e+02 2.321e+02 3.880e+02, threshold=3.887e+02, percent-clipped=1.0 2024-09-22 15:01:12,982 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=40656.0, ans=0.125 2024-09-22 15:01:32,540 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=40702.666666666664, ans=0.125 2024-09-22 15:01:43,356 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=9.57 vs. limit=15.0 2024-09-22 15:01:52,744 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=40749.333333333336, ans=0.125 2024-09-22 15:02:03,457 INFO [train.py:1198] (3/4) Epoch 3, batch 950, loss[loss=0.3736, ctc_loss=0.2933, cr_loss=0.4015, over 17161.00 frames. ], tot_loss[loss=0.34, ctc_loss=0.2564, cr_loss=0.4183, over 3341493.65 frames. ], batch size: 45, lr: 3.38e-02, grad_scale: 32.0 2024-09-22 15:02:04,098 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.17 vs. limit=15.0 2024-09-22 15:02:16,402 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.max_abs, batch_count=40796.0, ans=10.0 2024-09-22 15:02:27,632 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=40842.666666666664, ans=0.1 2024-09-22 15:02:43,632 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=6.85 vs. limit=15.0 2024-09-22 15:03:13,812 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=40982.666666666664, ans=0.0019602898550724647 2024-09-22 15:03:28,165 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.75 vs. limit=15.0 2024-09-22 15:03:28,685 INFO [train.py:1198] (3/4) Epoch 3, batch 1000, loss[loss=0.3291, ctc_loss=0.2477, cr_loss=0.4072, over 17056.00 frames. ], tot_loss[loss=0.3408, ctc_loss=0.257, cr_loss=0.4188, over 3345519.98 frames. 
], batch size: 46, lr: 3.38e-02, grad_scale: 32.0 2024-09-22 15:03:33,703 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=41029.333333333336, ans=0.125 2024-09-22 15:03:41,869 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=41029.333333333336, ans=0.2 2024-09-22 15:03:54,235 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.429e+02 1.784e+02 2.139e+02 2.624e+02 4.654e+02, threshold=4.278e+02, percent-clipped=1.0 2024-09-22 15:04:03,988 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=41122.666666666664, ans=0.0 2024-09-22 15:04:13,491 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=41122.666666666664, ans=0.07 2024-09-22 15:04:31,046 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=41216.0, ans=0.05 2024-09-22 15:04:48,224 INFO [train.py:1198] (3/4) Epoch 3, batch 1050, loss[loss=0.3496, ctc_loss=0.2626, cr_loss=0.4351, over 17020.00 frames. ], tot_loss[loss=0.3405, ctc_loss=0.2568, cr_loss=0.4187, over 3350031.69 frames. ], batch size: 53, lr: 3.37e-02, grad_scale: 32.0 2024-09-22 15:05:09,052 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=41309.333333333336, ans=0.1 2024-09-22 15:05:12,812 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=4.67 vs. limit=12.0 2024-09-22 15:05:26,702 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=41356.0, ans=0.1 2024-09-22 15:06:02,085 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.52 vs. limit=10.0 2024-09-22 15:06:10,766 INFO [train.py:1198] (3/4) Epoch 3, batch 1100, loss[loss=0.3772, ctc_loss=0.283, cr_loss=0.4711, over 16930.00 frames. ], tot_loss[loss=0.3401, ctc_loss=0.2563, cr_loss=0.4189, over 3358173.19 frames. ], batch size: 58, lr: 3.37e-02, grad_scale: 32.0 2024-09-22 15:06:35,964 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.301e+02 1.612e+02 1.917e+02 2.437e+02 4.278e+02, threshold=3.834e+02, percent-clipped=2.0 2024-09-22 15:06:45,161 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.min_positive, batch_count=41589.333333333336, ans=0.025 2024-09-22 15:07:05,714 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=41636.0, ans=0.1 2024-09-22 15:07:35,530 INFO [train.py:1198] (3/4) Epoch 3, batch 1150, loss[loss=0.3096, ctc_loss=0.2335, cr_loss=0.3806, over 17262.00 frames. ], tot_loss[loss=0.3375, ctc_loss=0.2541, cr_loss=0.4171, over 3371292.09 frames. 
], batch size: 44, lr: 3.36e-02, grad_scale: 32.0 2024-09-22 15:08:00,047 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=41776.0, ans=0.125 2024-09-22 15:08:03,349 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=41776.0, ans=0.0 2024-09-22 15:08:14,449 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=41822.666666666664, ans=0.001777681159420291 2024-09-22 15:08:20,140 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=41822.666666666664, ans=0.125 2024-09-22 15:08:58,269 INFO [train.py:1198] (3/4) Epoch 3, batch 1200, loss[loss=0.3701, ctc_loss=0.2774, cr_loss=0.4636, over 17002.00 frames. ], tot_loss[loss=0.3363, ctc_loss=0.2531, cr_loss=0.416, over 3373665.64 frames. ], batch size: 51, lr: 3.36e-02, grad_scale: 32.0 2024-09-22 15:09:21,095 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=5.52 vs. limit=12.0 2024-09-22 15:09:23,540 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.305e+02 1.644e+02 1.881e+02 2.304e+02 4.141e+02, threshold=3.762e+02, percent-clipped=3.0 2024-09-22 15:09:33,215 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=42056.0, ans=0.001726956521739131 2024-09-22 15:09:36,372 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=42056.0, ans=0.125 2024-09-22 15:09:36,425 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=42056.0, ans=10.0 2024-09-22 15:09:44,425 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=42102.666666666664, ans=0.1 2024-09-22 15:09:45,059 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=15.02 vs. limit=15.0 2024-09-22 15:09:46,185 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1.whitening_limit, batch_count=42102.666666666664, ans=10.0 2024-09-22 15:10:11,309 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=42149.333333333336, ans=0.04949747468305833 2024-09-22 15:10:14,601 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=42149.333333333336, ans=0.125 2024-09-22 15:10:17,334 INFO [train.py:1198] (3/4) Epoch 3, batch 1250, loss[loss=0.3308, ctc_loss=0.246, cr_loss=0.4239, over 16961.00 frames. ], tot_loss[loss=0.3372, ctc_loss=0.254, cr_loss=0.4161, over 3369345.85 frames. ], batch size: 42, lr: 3.35e-02, grad_scale: 32.0 2024-09-22 15:10:36,960 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=42242.666666666664, ans=0.0016863768115942031 2024-09-22 15:11:12,977 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.13 vs. 
limit=22.5 2024-09-22 15:11:41,931 INFO [train.py:1198] (3/4) Epoch 3, batch 1300, loss[loss=0.3596, ctc_loss=0.2708, cr_loss=0.4442, over 17094.00 frames. ], tot_loss[loss=0.3383, ctc_loss=0.2549, cr_loss=0.4172, over 3360177.57 frames. ], batch size: 49, lr: 3.34e-02, grad_scale: 32.0 2024-09-22 15:11:43,791 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=42429.333333333336, ans=0.0 2024-09-22 15:12:04,097 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=42476.0, ans=0.125 2024-09-22 15:12:09,822 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.394e+02 1.777e+02 1.981e+02 2.466e+02 4.544e+02, threshold=3.962e+02, percent-clipped=3.0 2024-09-22 15:12:10,203 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer_na.min_abs, batch_count=42476.0, ans=0.02 2024-09-22 15:12:14,829 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=42522.666666666664, ans=0.015 2024-09-22 15:12:16,613 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=42522.666666666664, ans=0.09899494936611666 2024-09-22 15:12:19,725 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=42522.666666666664, ans=0.125 2024-09-22 15:12:19,816 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=42522.666666666664, ans=0.125 2024-09-22 15:12:45,195 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.32 vs. limit=15.0 2024-09-22 15:12:53,317 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=5.91 vs. limit=10.0 2024-09-22 15:13:06,080 INFO [train.py:1198] (3/4) Epoch 3, batch 1350, loss[loss=0.3395, ctc_loss=0.2559, cr_loss=0.4178, over 17178.00 frames. ], tot_loss[loss=0.3379, ctc_loss=0.2546, cr_loss=0.4163, over 3359968.25 frames. ], batch size: 45, lr: 3.34e-02, grad_scale: 32.0 2024-09-22 15:13:11,514 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=4.40 vs. limit=12.0 2024-09-22 15:13:23,677 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=42709.333333333336, ans=0.2 2024-09-22 15:13:33,350 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=42709.333333333336, ans=0.125 2024-09-22 15:14:00,765 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=42802.666666666664, ans=0.1 2024-09-22 15:14:10,941 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.84 vs. limit=6.0 2024-09-22 15:14:25,910 INFO [train.py:1198] (3/4) Epoch 3, batch 1400, loss[loss=0.3478, ctc_loss=0.2627, cr_loss=0.4253, over 17083.00 frames. ], tot_loss[loss=0.3386, ctc_loss=0.2552, cr_loss=0.417, over 3353607.18 frames. 
], batch size: 43, lr: 3.33e-02, grad_scale: 32.0
2024-09-22 15:14:32,650 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=42896.0, ans=0.125
2024-09-22 15:14:44,029 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=42942.666666666664, ans=0.125
2024-09-22 15:14:45,587 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=42942.666666666664, ans=0.0
2024-09-22 15:14:51,781 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.368e+02 1.674e+02 1.940e+02 2.218e+02 3.917e+02, threshold=3.881e+02, percent-clipped=0.0
2024-09-22 15:15:01,809 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=42989.333333333336, ans=0.1
2024-09-22 15:15:22,448 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=43036.0, ans=0.125
2024-09-22 15:15:49,146 INFO [train.py:1198] (3/4) Epoch 3, batch 1450, loss[loss=0.2804, ctc_loss=0.2072, cr_loss=0.3659, over 17104.00 frames. ], tot_loss[loss=0.3387, ctc_loss=0.2551, cr_loss=0.4181, over 3358850.47 frames. ], batch size: 43, lr: 3.33e-02, grad_scale: 32.0
2024-09-22 15:15:54,156 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=43129.333333333336, ans=0.125
2024-09-22 15:16:04,204 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=43176.0, ans=0.125
2024-09-22 15:16:38,436 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-09-22 15:16:46,135 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=43269.333333333336, ans=0.125
2024-09-22 15:16:57,493 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=43316.0, ans=0.0
2024-09-22 15:17:13,620 INFO [train.py:1198] (3/4) Epoch 3, batch 1500, loss[loss=0.3496, ctc_loss=0.2604, cr_loss=0.4459, over 16977.00 frames. ], tot_loss[loss=0.3372, ctc_loss=0.2536, cr_loss=0.4179, over 3366428.95 frames. ], batch size: 42, lr: 3.32e-02, grad_scale: 32.0
2024-09-22 15:17:38,773 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.398e+02 1.638e+02 2.002e+02 2.354e+02 3.823e+02, threshold=4.005e+02, percent-clipped=0.0
2024-09-22 15:18:17,905 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=43549.333333333336, ans=0.125
2024-09-22 15:18:32,509 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=43549.333333333336, ans=0.125
2024-09-22 15:18:35,387 INFO [train.py:1198] (3/4) Epoch 3, batch 1550, loss[loss=0.3122, ctc_loss=0.2346, cr_loss=0.388, over 17047.00 frames. ], tot_loss[loss=0.337, ctc_loss=0.2535, cr_loss=0.4175, over 3368514.62 frames. ], batch size: 39, lr: 3.32e-02, grad_scale: 32.0
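Each ScheduledFloat record documents a hyperparameter that is a function of batch_count rather than a constant (skip rates, balancer probabilities, min/max constraints), with ans giving the schedule's value at the logged batch_count. A sketch of the idea under the assumption of piecewise-linear interpolation between (batch_count, value) breakpoints; this is a re-derivation for illustration, not the code behind scaling.py:

```python
class PiecewiseLinearSchedule:
    """A float that depends on batch_count, e.g. a skip rate annealed to 0."""

    def __init__(self, *points):
        self.points = sorted(points)  # (batch_count, value) breakpoints

    def value_at(self, batch_count):
        x0, y0 = self.points[0]
        if batch_count <= x0:
            return y0
        for x1, y1 in self.points[1:]:
            if batch_count <= x1:
                t = (batch_count - x0) / (x1 - x0)
                return y0 + t * (y1 - y0)  # linear interpolation
            x0, y0 = x1, y1
        return y0                          # held constant past the last point

# hypothetical schedule: 0.3 at batch 0, annealed linearly to 0.0 by batch 20000,
# which would explain the ans=0.0 skip-rate entries this late in training
skip_rate = PiecewiseLinearSchedule((0.0, 0.3), (20000.0, 0.0))
print(skip_rate.value_at(43316.0))  # -> 0.0
```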
2024-09-22 15:18:38,898 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=43596.0, ans=0.125
2024-09-22 15:18:43,910 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=43596.0, ans=0.2
2024-09-22 15:18:51,703 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=43642.666666666664, ans=0.2
2024-09-22 15:19:09,431 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-09-22 15:19:20,381 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-22 15:19:38,019 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=43782.666666666664, ans=0.125
2024-09-22 15:19:55,281 INFO [train.py:1198] (3/4) Epoch 3, batch 1600, loss[loss=0.38, ctc_loss=0.2872, cr_loss=0.4644, over 17044.00 frames. ], tot_loss[loss=0.3368, ctc_loss=0.2533, cr_loss=0.4177, over 3369250.19 frames. ], batch size: 52, lr: 3.31e-02, grad_scale: 32.0
2024-09-22 15:20:11,751 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.max_abs, batch_count=43876.0, ans=10.0
2024-09-22 15:20:14,869 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=43876.0, ans=0.125
2024-09-22 15:20:20,845 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.341e+02 1.634e+02 1.960e+02 2.382e+02 4.201e+02, threshold=3.920e+02, percent-clipped=2.0
2024-09-22 15:20:21,265 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=43876.0, ans=0.2
2024-09-22 15:20:55,870 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=2.81 vs. limit=15.0
2024-09-22 15:21:09,486 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=44016.0, ans=0.125
2024-09-22 15:21:14,306 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=44016.0, ans=0.0
2024-09-22 15:21:17,090 INFO [train.py:1198] (3/4) Epoch 3, batch 1650, loss[loss=0.295, ctc_loss=0.2178, cr_loss=0.3861, over 16965.00 frames. ], tot_loss[loss=0.337, ctc_loss=0.2534, cr_loss=0.418, over 3373956.29 frames. ], batch size: 42, lr: 3.31e-02, grad_scale: 32.0
2024-09-22 15:21:27,921 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=44062.666666666664, ans=0.2
2024-09-22 15:21:32,336 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.const_attention_rate, batch_count=44062.666666666664, ans=0.025
2024-09-22 15:21:43,659 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=44109.333333333336, ans=0.125
2024-09-22 15:21:52,367 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=15.48 vs.
limit=22.5 2024-09-22 15:21:53,573 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=14.57 vs. limit=22.5 2024-09-22 15:22:10,062 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-22 15:22:16,642 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.70 vs. limit=6.0 2024-09-22 15:22:19,786 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=44202.666666666664, ans=0.125 2024-09-22 15:22:26,632 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=4.72 vs. limit=10.0 2024-09-22 15:22:41,999 INFO [train.py:1198] (3/4) Epoch 3, batch 1700, loss[loss=0.2931, ctc_loss=0.225, cr_loss=0.3404, over 16952.00 frames. ], tot_loss[loss=0.3375, ctc_loss=0.2539, cr_loss=0.4179, over 3365841.86 frames. ], batch size: 42, lr: 3.30e-02, grad_scale: 32.0 2024-09-22 15:22:50,731 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=17.53 vs. limit=15.0 2024-09-22 15:23:08,731 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=5.84 vs. limit=15.0 2024-09-22 15:23:09,725 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.265e+02 1.715e+02 2.359e+02 2.941e+02 4.631e+02, threshold=4.717e+02, percent-clipped=4.0 2024-09-22 15:24:03,725 INFO [train.py:1198] (3/4) Epoch 3, batch 1750, loss[loss=0.3448, ctc_loss=0.2588, cr_loss=0.43, over 17213.00 frames. ], tot_loss[loss=0.3375, ctc_loss=0.2537, cr_loss=0.4188, over 3369633.97 frames. ], batch size: 47, lr: 3.30e-02, grad_scale: 32.0 2024-09-22 15:24:34,077 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=44622.666666666664, ans=0.125 2024-09-22 15:25:12,902 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.50 vs. limit=15.0 2024-09-22 15:25:16,604 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=6.37 vs. limit=15.0 2024-09-22 15:25:25,625 INFO [train.py:1198] (3/4) Epoch 3, batch 1800, loss[loss=0.3245, ctc_loss=0.2464, cr_loss=0.3904, over 17092.00 frames. ], tot_loss[loss=0.3377, ctc_loss=0.254, cr_loss=0.4188, over 3361516.70 frames. 
], batch size: 43, lr: 3.29e-02, grad_scale: 64.0
2024-09-22 15:25:27,603 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=44762.666666666664, ans=0.0
2024-09-22 15:25:39,027 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.min_abs, batch_count=44762.666666666664, ans=0.5
2024-09-22 15:25:39,156 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=44762.666666666664, ans=0.04949747468305833
2024-09-22 15:25:42,204 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=44809.333333333336, ans=0.125
2024-09-22 15:25:50,016 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=44809.333333333336, ans=0.0
2024-09-22 15:25:53,003 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.338e+02 1.793e+02 2.251e+02 2.697e+02 4.483e+02, threshold=4.502e+02, percent-clipped=0.0
2024-09-22 15:25:57,116 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.94 vs. limit=15.0
2024-09-22 15:26:06,003 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=44856.0, ans=0.125
2024-09-22 15:26:19,601 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=18.65 vs. limit=22.5
2024-09-22 15:26:22,200 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=44902.666666666664, ans=0.0
2024-09-22 15:26:30,612 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=18.25 vs. limit=22.5
2024-09-22 15:26:47,447 INFO [train.py:1198] (3/4) Epoch 3, batch 1850, loss[loss=0.323, ctc_loss=0.2435, cr_loss=0.3979, over 17251.00 frames. ], tot_loss[loss=0.3372, ctc_loss=0.2536, cr_loss=0.4182, over 3362537.60 frames. ], batch size: 42, lr: 3.29e-02, grad_scale: 32.0
2024-09-22 15:27:23,675 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=45089.333333333336, ans=0.125
2024-09-22 15:28:08,108 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.54 vs. limit=15.0
2024-09-22 15:28:12,191 INFO [train.py:1198] (3/4) Epoch 3, batch 1900, loss[loss=0.313, ctc_loss=0.2367, cr_loss=0.3814, over 17140.00 frames. ], tot_loss[loss=0.3358, ctc_loss=0.2523, cr_loss=0.4176, over 3373233.89 frames. ], batch size: 48, lr: 3.28e-02, grad_scale: 32.0
2024-09-22 15:28:21,861 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=45229.333333333336, ans=0.001037101449275362
2024-09-22 15:28:38,955 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.328e+02 1.740e+02 2.164e+02 2.812e+02 4.193e+02, threshold=4.328e+02, percent-clipped=0.0
2024-09-22 15:28:51,063 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.85 vs. limit=22.5
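The grad_scale field tracks dynamic loss scaling for mixed-precision training, and its movement here (32.0, a jump to 64.0 around batch 1800, then back to 32.0 by batch 1850) reads as the scaler doubling after a stretch of stable low-precision steps and halving again when an overflow is hit. A sketch using PyTorch's standard GradScaler; the constructor values shown are illustrative defaults, not read from this recipe:

```python
import torch

# growth_factor=2.0 / backoff_factor=0.5 produce exactly the doubling and
# halving pattern visible in the grad_scale values logged above.
scaler = torch.cuda.amp.GradScaler(init_scale=16.0, growth_factor=2.0,
                                   backoff_factor=0.5, growth_interval=2000)

def amp_step(optimizer, loss):
    scaler.scale(loss).backward()  # backward through the scaled loss
    scaler.step(optimizer)         # unscales grads; skips the step on inf/nan
    scaler.update()                # grows the scale, or halves it after overflow
    optimizer.zero_grad()
```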
2024-09-22 15:29:26,326 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=4.18 vs. limit=12.0
2024-09-22 15:29:32,030 INFO [train.py:1198] (3/4) Epoch 3, batch 1950, loss[loss=0.3138, ctc_loss=0.233, cr_loss=0.4042, over 16714.00 frames. ], tot_loss[loss=0.3358, ctc_loss=0.2523, cr_loss=0.4175, over 3375001.54 frames. ], batch size: 37, lr: 3.27e-02, grad_scale: 32.0
2024-09-22 15:29:43,534 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=45462.666666666664, ans=0.1
2024-09-22 15:29:54,765 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=45509.333333333336, ans=0.125
2024-09-22 15:29:59,348 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=45509.333333333336, ans=0.0
2024-09-22 15:30:46,369 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=45649.333333333336, ans=0.125
2024-09-22 15:30:49,820 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=45649.333333333336, ans=0.07
2024-09-22 15:30:54,282 INFO [train.py:1198] (3/4) Epoch 3, batch 2000, loss[loss=0.34, ctc_loss=0.2581, cr_loss=0.4097, over 17300.00 frames. ], tot_loss[loss=0.3372, ctc_loss=0.2534, cr_loss=0.4189, over 3368538.07 frames. ], batch size: 51, lr: 3.27e-02, grad_scale: 32.0
2024-09-22 15:30:56,237 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=45696.0, ans=0.125
2024-09-22 15:31:19,785 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=45742.666666666664, ans=0.0
2024-09-22 15:31:23,960 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.430e+02 1.768e+02 1.993e+02 2.472e+02 5.161e+02, threshold=3.986e+02, percent-clipped=2.0
2024-09-22 15:31:37,269 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.const_attention_rate, batch_count=45789.333333333336, ans=0.025
2024-09-22 15:31:46,744 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=45836.0, ans=0.0009052173913043481
2024-09-22 15:32:19,446 INFO [train.py:1198] (3/4) Epoch 3, batch 2050, loss[loss=0.307, ctc_loss=0.2276, cr_loss=0.397, over 17045.00 frames. ], tot_loss[loss=0.3374, ctc_loss=0.2537, cr_loss=0.4185, over 3364800.15 frames. ], batch size: 39, lr: 3.26e-02, grad_scale: 32.0
2024-09-22 15:32:23,570 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=4.81 vs. limit=12.0
2024-09-22 15:32:38,702 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=45976.0, ans=0.1
2024-09-22 15:32:49,841 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.41 vs.
limit=15.0 2024-09-22 15:33:01,829 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.const_attention_rate, batch_count=46022.666666666664, ans=0.025 2024-09-22 15:33:33,634 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=46116.0, ans=0.125 2024-09-22 15:33:41,358 INFO [train.py:1198] (3/4) Epoch 3, batch 2100, loss[loss=0.2907, ctc_loss=0.2133, cr_loss=0.387, over 17200.00 frames. ], tot_loss[loss=0.3373, ctc_loss=0.2536, cr_loss=0.4188, over 3367699.25 frames. ], batch size: 41, lr: 3.26e-02, grad_scale: 32.0 2024-09-22 15:33:55,975 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=46209.333333333336, ans=0.125 2024-09-22 15:33:58,248 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.20 vs. limit=15.0 2024-09-22 15:34:08,497 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.358e+02 1.827e+02 2.117e+02 2.620e+02 4.403e+02, threshold=4.235e+02, percent-clipped=1.0 2024-09-22 15:34:35,738 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=46302.666666666664, ans=0.125 2024-09-22 15:34:48,590 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=46349.333333333336, ans=0.04949747468305833 2024-09-22 15:35:01,056 INFO [train.py:1198] (3/4) Epoch 3, batch 2150, loss[loss=0.3674, ctc_loss=0.2785, cr_loss=0.4443, over 16475.00 frames. ], tot_loss[loss=0.3384, ctc_loss=0.2545, cr_loss=0.4193, over 3358206.33 frames. ], batch size: 66, lr: 3.25e-02, grad_scale: 32.0 2024-09-22 15:35:04,754 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=46396.0, ans=0.0 2024-09-22 15:35:23,040 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=46442.666666666664, ans=0.95 2024-09-22 15:35:53,303 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=4.41 vs. limit=10.0 2024-09-22 15:35:59,332 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=46536.0, ans=0.125 2024-09-22 15:36:08,854 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=46582.666666666664, ans=0.125 2024-09-22 15:36:21,222 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=46582.666666666664, ans=0.0 2024-09-22 15:36:25,585 INFO [train.py:1198] (3/4) Epoch 3, batch 2200, loss[loss=0.3042, ctc_loss=0.2292, cr_loss=0.3751, over 17033.00 frames. ], tot_loss[loss=0.336, ctc_loss=0.2523, cr_loss=0.4184, over 3365129.51 frames. 
], batch size: 52, lr: 3.25e-02, grad_scale: 32.0 2024-09-22 15:36:55,288 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.307e+02 1.636e+02 2.012e+02 2.486e+02 4.697e+02, threshold=4.025e+02, percent-clipped=4.0 2024-09-22 15:37:24,600 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=46769.333333333336, ans=0.0 2024-09-22 15:37:50,691 INFO [train.py:1198] (3/4) Epoch 3, batch 2250, loss[loss=0.3501, ctc_loss=0.2632, cr_loss=0.4341, over 16995.00 frames. ], tot_loss[loss=0.335, ctc_loss=0.2515, cr_loss=0.4175, over 3373851.05 frames. ], batch size: 56, lr: 3.24e-02, grad_scale: 32.0 2024-09-22 15:37:53,040 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=4.50 vs. limit=15.0 2024-09-22 15:37:55,925 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=14.01 vs. limit=15.0 2024-09-22 15:38:26,155 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=46956.0, ans=0.125 2024-09-22 15:38:46,722 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=47002.666666666664, ans=0.125 2024-09-22 15:38:59,610 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=47049.333333333336, ans=0.2 2024-09-22 15:38:59,616 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.const_attention_rate, batch_count=47049.333333333336, ans=0.025 2024-09-22 15:39:01,609 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=9.33 vs. limit=12.0 2024-09-22 15:39:10,367 INFO [train.py:1198] (3/4) Epoch 3, batch 2300, loss[loss=0.3388, ctc_loss=0.2501, cr_loss=0.4436, over 17149.00 frames. ], tot_loss[loss=0.3348, ctc_loss=0.2512, cr_loss=0.4183, over 3364908.59 frames. ], batch size: 48, lr: 3.24e-02, grad_scale: 32.0 2024-09-22 15:39:21,276 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=13.71 vs. limit=22.5 2024-09-22 15:39:37,771 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.373e+02 1.712e+02 2.236e+02 2.750e+02 4.925e+02, threshold=4.473e+02, percent-clipped=7.0 2024-09-22 15:40:22,322 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=47282.666666666664, ans=0.0 2024-09-22 15:40:33,163 INFO [train.py:1198] (3/4) Epoch 3, batch 2350, loss[loss=0.3576, ctc_loss=0.2705, cr_loss=0.4355, over 17144.00 frames. ], tot_loss[loss=0.3351, ctc_loss=0.2514, cr_loss=0.4183, over 3361472.00 frames. ], batch size: 48, lr: 3.23e-02, grad_scale: 32.0 2024-09-22 15:40:36,531 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=47329.333333333336, ans=0.0 2024-09-22 15:40:41,956 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.77 vs. limit=15.0 2024-09-22 15:41:12,268 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.80 vs. 
limit=15.0
2024-09-22 15:41:20,965 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=47422.666666666664, ans=0.0
2024-09-22 15:41:58,591 INFO [train.py:1198] (3/4) Epoch 3, batch 2400, loss[loss=0.3107, ctc_loss=0.2305, cr_loss=0.401, over 17079.00 frames. ], tot_loss[loss=0.3359, ctc_loss=0.2523, cr_loss=0.4183, over 3352745.66 frames. ], batch size: 46, lr: 3.23e-02, grad_scale: 32.0
2024-09-22 15:42:25,883 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.420e+02 1.787e+02 2.027e+02 2.376e+02 4.296e+02, threshold=4.054e+02, percent-clipped=0.0
2024-09-22 15:42:38,731 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=47656.0, ans=0.0
2024-09-22 15:42:59,476 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=47702.666666666664, ans=0.0
2024-09-22 15:43:02,735 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.min_abs, batch_count=47702.666666666664, ans=0.5
2024-09-22 15:43:09,259 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=47749.333333333336, ans=0.125
2024-09-22 15:43:21,651 INFO [train.py:1198] (3/4) Epoch 3, batch 2450, loss[loss=0.3334, ctc_loss=0.2513, cr_loss=0.4107, over 17015.00 frames. ], tot_loss[loss=0.338, ctc_loss=0.2541, cr_loss=0.4195, over 3339127.95 frames. ], batch size: 44, lr: 3.22e-02, grad_scale: 32.0
2024-09-22 15:43:29,837 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=47796.0, ans=0.125
2024-09-22 15:43:31,574 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=47796.0, ans=0.125
2024-09-22 15:43:41,244 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=16.11 vs. limit=15.0
2024-09-22 15:43:41,289 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.75 vs. limit=22.5
2024-09-22 15:43:42,948 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=4.27 vs. limit=10.0
2024-09-22 15:43:44,102 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=47842.666666666664, ans=0.0
2024-09-22 15:44:23,841 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=47982.666666666664, ans=0.125
2024-09-22 15:44:41,205 INFO [train.py:1198] (3/4) Epoch 3, batch 2500, loss[loss=0.2947, ctc_loss=0.2222, cr_loss=0.3625, over 17193.00 frames. ], tot_loss[loss=0.3356, ctc_loss=0.2521, cr_loss=0.4177, over 3344411.77 frames. ], batch size: 41, lr: 3.22e-02, grad_scale: 32.0
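The Whitening records compare a measured statistic against a limit: metric quantifies how far a module's output covariance is from isotropic ("white"), and the entries where it exceeds the limit (metric=16.11 vs. limit=15.0 above) mark the activations being pushed back by the whitening constraint. One way such a metric can be computed, sketched under the assumption that it is the ratio of the covariance eigenvalues' second moment to their squared mean, which is exactly 1.0 for perfectly white features and grows with anisotropy; a hypothetical re-derivation, not the scaling.py source:

```python
import torch

def whitening_metric(x, num_groups=1):
    """x: (num_frames, num_channels) activations; returns the mean per-group metric."""
    n, c = x.shape
    x = x.reshape(n, num_groups, c // num_groups).permute(1, 0, 2)
    x = x - x.mean(dim=1, keepdim=True)
    cov = x.transpose(1, 2) @ x / n    # per-group covariance matrices
    eigs = torch.linalg.eigvalsh(cov)  # eigenvalues of each covariance
    return ((eigs ** 2).mean(dim=1) / eigs.mean(dim=1) ** 2).mean()

# white noise scores close to 1.0; strongly correlated channels push it up
print(whitening_metric(torch.randn(1000, 128), num_groups=4))
```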
], batch size: 41, lr: 3.22e-02, grad_scale: 32.0 2024-09-22 15:45:08,320 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.319e+02 1.703e+02 1.872e+02 2.246e+02 3.567e+02, threshold=3.744e+02, percent-clipped=0.0 2024-09-22 15:45:43,485 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=48169.333333333336, ans=0.125 2024-09-22 15:45:49,842 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=48216.0, ans=0.00038782608695652095 2024-09-22 15:46:03,955 INFO [train.py:1198] (3/4) Epoch 3, batch 2550, loss[loss=0.3048, ctc_loss=0.2298, cr_loss=0.3747, over 17246.00 frames. ], tot_loss[loss=0.3339, ctc_loss=0.2507, cr_loss=0.4157, over 3347570.16 frames. ], batch size: 44, lr: 3.21e-02, grad_scale: 32.0 2024-09-22 15:46:33,927 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=48309.333333333336, ans=0.07 2024-09-22 15:46:44,579 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=48356.0, ans=0.0 2024-09-22 15:46:57,342 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=48402.666666666664, ans=0.0003472463768115948 2024-09-22 15:47:31,518 INFO [train.py:1198] (3/4) Epoch 3, batch 2600, loss[loss=0.3565, ctc_loss=0.2639, cr_loss=0.4631, over 17053.00 frames. ], tot_loss[loss=0.331, ctc_loss=0.2481, cr_loss=0.4144, over 3358640.36 frames. ], batch size: 52, lr: 3.21e-02, grad_scale: 32.0 2024-09-22 15:47:33,374 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-22 15:47:35,488 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=6.58 vs. limit=10.0 2024-09-22 15:47:36,524 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=48496.0, ans=0.2 2024-09-22 15:47:44,781 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=48496.0, ans=0.125 2024-09-22 15:47:58,749 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.408e+02 1.709e+02 1.993e+02 2.394e+02 4.094e+02, threshold=3.987e+02, percent-clipped=1.0 2024-09-22 15:48:06,848 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=48589.333333333336, ans=0.00030666666666666516 2024-09-22 15:48:13,194 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=48589.333333333336, ans=0.0 2024-09-22 15:48:18,129 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.const_attention_rate, batch_count=48636.0, ans=0.025 2024-09-22 15:48:35,956 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=48682.666666666664, ans=0.1 2024-09-22 15:48:51,848 INFO [train.py:1198] (3/4) Epoch 3, batch 2650, loss[loss=0.3483, ctc_loss=0.2638, cr_loss=0.4227, over 17306.00 frames. ], tot_loss[loss=0.3311, ctc_loss=0.248, cr_loss=0.4155, over 3363235.19 frames. 
], batch size: 49, lr: 3.20e-02, grad_scale: 32.0 2024-09-22 15:49:19,430 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=48776.0, ans=0.125 2024-09-22 15:49:22,643 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=48822.666666666664, ans=0.125 2024-09-22 15:49:47,622 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=11.34 vs. limit=15.0 2024-09-22 15:49:53,611 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=48869.333333333336, ans=0.125 2024-09-22 15:49:56,913 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer_ff3.min_abs, batch_count=48916.0, ans=0.2 2024-09-22 15:49:58,376 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=48916.0, ans=0.125 2024-09-22 15:49:59,906 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=48916.0, ans=0.0 2024-09-22 15:50:01,510 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=48916.0, ans=0.125 2024-09-22 15:50:14,730 INFO [train.py:1198] (3/4) Epoch 3, batch 2700, loss[loss=0.3821, ctc_loss=0.29, cr_loss=0.4606, over 17021.00 frames. ], tot_loss[loss=0.3316, ctc_loss=0.2485, cr_loss=0.4155, over 3358867.98 frames. ], batch size: 51, lr: 3.20e-02, grad_scale: 32.0 2024-09-22 15:50:23,405 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.58 vs. limit=15.0 2024-09-22 15:50:26,234 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=48962.666666666664, ans=0.1 2024-09-22 15:50:38,721 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=49009.333333333336, ans=0.125 2024-09-22 15:50:41,576 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.331e+02 1.782e+02 2.096e+02 2.443e+02 4.661e+02, threshold=4.192e+02, percent-clipped=1.0 2024-09-22 15:50:42,337 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.67 vs. limit=22.5 2024-09-22 15:50:47,403 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.73 vs. limit=6.0 2024-09-22 15:50:55,518 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=5.82 vs. limit=10.0 2024-09-22 15:51:39,740 INFO [train.py:1198] (3/4) Epoch 3, batch 2750, loss[loss=0.3186, ctc_loss=0.232, cr_loss=0.4332, over 17105.00 frames. ], tot_loss[loss=0.3325, ctc_loss=0.2493, cr_loss=0.4161, over 3351013.46 frames. 
], batch size: 49, lr: 3.19e-02, grad_scale: 32.0 2024-09-22 15:51:44,811 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-09-22 15:51:49,398 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=49196.0, ans=0.125 2024-09-22 15:51:49,494 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=49196.0, ans=0.1 2024-09-22 15:52:00,947 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=49242.666666666664, ans=0.125 2024-09-22 15:52:25,792 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=49289.333333333336, ans=0.0 2024-09-22 15:52:27,905 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.44 vs. limit=15.0 2024-09-22 15:52:28,962 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=49336.0, ans=0.1 2024-09-22 15:52:32,757 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.42 vs. limit=22.5 2024-09-22 15:53:00,642 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=49429.333333333336, ans=0.125 2024-09-22 15:53:01,838 INFO [train.py:1198] (3/4) Epoch 3, batch 2800, loss[loss=0.3584, ctc_loss=0.2716, cr_loss=0.4343, over 16614.00 frames. ], tot_loss[loss=0.334, ctc_loss=0.2503, cr_loss=0.4185, over 3361375.97 frames. ], batch size: 66, lr: 3.19e-02, grad_scale: 32.0 2024-09-22 15:53:15,315 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=49429.333333333336, ans=0.0 2024-09-22 15:53:28,253 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=49476.0, ans=0.125 2024-09-22 15:53:29,476 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.403e+02 1.759e+02 2.001e+02 2.340e+02 4.757e+02, threshold=4.003e+02, percent-clipped=1.0 2024-09-22 15:53:50,347 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=49569.333333333336, ans=0.125 2024-09-22 15:53:56,943 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=49569.333333333336, ans=0.125 2024-09-22 15:54:09,900 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=49616.0, ans=0.2 2024-09-22 15:54:19,470 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=49616.0, ans=8.347826086956521e-05 2024-09-22 15:54:22,391 INFO [train.py:1198] (3/4) Epoch 3, batch 2850, loss[loss=0.3658, ctc_loss=0.2764, cr_loss=0.4472, over 17289.00 frames. ], tot_loss[loss=0.3344, ctc_loss=0.2507, cr_loss=0.4188, over 3356243.66 frames. ], batch size: 46, lr: 3.18e-02, grad_scale: 32.0 2024-09-22 15:54:30,009 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=11.67 vs. 
limit=22.5 2024-09-22 15:54:46,866 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=49709.333333333336, ans=0.1 2024-09-22 15:55:01,441 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=49756.0, ans=0.125 2024-09-22 15:55:12,166 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer_ff3.min_abs, batch_count=49802.666666666664, ans=0.2 2024-09-22 15:55:24,467 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=49802.666666666664, ans=0.125 2024-09-22 15:55:43,658 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=49896.0, ans=0.1 2024-09-22 15:55:45,086 INFO [train.py:1198] (3/4) Epoch 3, batch 2900, loss[loss=0.3115, ctc_loss=0.2337, cr_loss=0.3886, over 16936.00 frames. ], tot_loss[loss=0.3328, ctc_loss=0.2494, cr_loss=0.4168, over 3359511.29 frames. ], batch size: 42, lr: 3.18e-02, grad_scale: 32.0 2024-09-22 15:56:14,387 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.376e+02 1.745e+02 2.067e+02 2.607e+02 4.355e+02, threshold=4.133e+02, percent-clipped=1.0 2024-09-22 15:56:49,330 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=50036.0, ans=0.0 2024-09-22 15:57:09,738 INFO [train.py:1198] (3/4) Epoch 3, batch 2950, loss[loss=0.3234, ctc_loss=0.2468, cr_loss=0.3833, over 17014.00 frames. ], tot_loss[loss=0.3327, ctc_loss=0.2493, cr_loss=0.4167, over 3363330.38 frames. ], batch size: 51, lr: 3.17e-02, grad_scale: 32.0 2024-09-22 15:57:17,396 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=50129.333333333336, ans=0.125 2024-09-22 15:57:57,092 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=50222.666666666664, ans=0.0 2024-09-22 15:58:12,645 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=50269.333333333336, ans=0.0 2024-09-22 15:58:23,821 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.const_attention_rate, batch_count=50316.0, ans=0.025 2024-09-22 15:58:31,292 INFO [train.py:1198] (3/4) Epoch 3, batch 3000, loss[loss=0.3376, ctc_loss=0.2492, cr_loss=0.4421, over 17144.00 frames. ], tot_loss[loss=0.3329, ctc_loss=0.2496, cr_loss=0.4169, over 3354976.73 frames. ], batch size: 48, lr: 3.17e-02, grad_scale: 32.0 2024-09-22 15:58:31,292 INFO [train.py:1221] (3/4) Computing validation loss 2024-09-22 15:58:46,488 INFO [train.py:1230] (3/4) Epoch 3, validation: loss=0.08436, ctc_loss=0.08436, cr_loss=7.957e-15, over 944034.00 frames. 
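
The loss entries above decompose consistently: each logged loss equals ctc_loss plus roughly 0.2 times cr_loss (for Epoch 3, batch 3000 just above: 0.2496 + 0.2 * 0.4169 ≈ 0.3329), while in the validation entries cr_loss collapses to around 1e-15, presumably because the consistency-regularization branch is inactive in eval mode. Below is a minimal sketch of that weighted combination, with the 0.2 scale inferred from the logged values; `combine_losses` and its argument names are hypothetical, not icefall's actual API.

```python
# Minimal sketch of the weighted loss combination suggested by the logged values.
# The scale 0.2 is inferred from entries like:
#   tot_loss[loss=0.3329, ctc_loss=0.2496, cr_loss=0.4169] -> 0.2496 + 0.2 * 0.4169 = 0.3330
# `combine_losses` is a hypothetical helper, not icefall's actual API.
import torch

def combine_losses(ctc_loss: torch.Tensor,
                   cr_loss: torch.Tensor,
                   cr_loss_scale: float = 0.2) -> torch.Tensor:
    """Return the total loss as logged: ctc_loss + cr_loss_scale * cr_loss."""
    return ctc_loss + cr_loss_scale * cr_loss

# Reproducing the Epoch 3, batch 3000 tot_loss entry above:
tot = combine_losses(torch.tensor(0.2496), torch.tensor(0.4169))
print(f"{tot.item():.4f}")  # ~0.3330, matching the logged loss up to rounding
```
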
2024-09-22 15:58:46,488 INFO [train.py:1231] (3/4) Maximum memory allocated so far is 21238MB 2024-09-22 15:59:12,854 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.406e+02 1.636e+02 1.924e+02 2.171e+02 3.615e+02, threshold=3.848e+02, percent-clipped=0.0 2024-09-22 15:59:30,570 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=50456.0, ans=0.125 2024-09-22 15:59:30,997 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=10.18 vs. limit=15.0 2024-09-22 16:00:01,884 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=50549.333333333336, ans=0.0 2024-09-22 16:00:04,590 INFO [train.py:1198] (3/4) Epoch 3, batch 3050, loss[loss=0.4086, ctc_loss=0.3135, cr_loss=0.4751, over 15049.00 frames. ], tot_loss[loss=0.3324, ctc_loss=0.249, cr_loss=0.4166, over 3361909.76 frames. ], batch size: 89, lr: 3.16e-02, grad_scale: 32.0 2024-09-22 16:00:04,965 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=50596.0, ans=0.0 2024-09-22 16:00:15,595 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=50596.0, ans=0.0 2024-09-22 16:00:23,394 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=50642.666666666664, ans=0.125 2024-09-22 16:00:54,668 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=50736.0, ans=0.125 2024-09-22 16:01:04,450 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.78 vs. limit=6.0 2024-09-22 16:01:14,977 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=50782.666666666664, ans=0.2 2024-09-22 16:01:21,092 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=50829.333333333336, ans=0.1 2024-09-22 16:01:21,203 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=50829.333333333336, ans=0.0 2024-09-22 16:01:22,487 INFO [train.py:1198] (3/4) Epoch 3, batch 3100, loss[loss=0.3386, ctc_loss=0.258, cr_loss=0.403, over 16622.00 frames. ], tot_loss[loss=0.332, ctc_loss=0.2486, cr_loss=0.4168, over 3359657.24 frames. 
], batch size: 66, lr: 3.16e-02, grad_scale: 32.0 2024-09-22 16:01:24,215 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=50829.333333333336, ans=0.05 2024-09-22 16:01:30,512 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=50829.333333333336, ans=0.1 2024-09-22 16:01:33,765 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=50829.333333333336, ans=0.1 2024-09-22 16:01:36,920 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.const_attention_rate, batch_count=50876.0, ans=0.025 2024-09-22 16:01:42,900 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=50876.0, ans=0.0 2024-09-22 16:01:46,639 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=5.04 vs. limit=10.0 2024-09-22 16:01:49,070 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.310e+02 1.630e+02 1.881e+02 2.229e+02 3.253e+02, threshold=3.762e+02, percent-clipped=0.0 2024-09-22 16:01:51,955 INFO [scaling.py:1024] (3/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.16 vs. limit=5.0 2024-09-22 16:02:07,422 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=50922.666666666664, ans=0.1 2024-09-22 16:02:34,123 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=51016.0, ans=0.125 2024-09-22 16:02:43,174 INFO [train.py:1198] (3/4) Epoch 3, batch 3150, loss[loss=0.3335, ctc_loss=0.2489, cr_loss=0.423, over 17007.00 frames. ], tot_loss[loss=0.3311, ctc_loss=0.248, cr_loss=0.4158, over 3359812.76 frames. ], batch size: 51, lr: 3.15e-02, grad_scale: 32.0 2024-09-22 16:02:46,540 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=51062.666666666664, ans=0.125 2024-09-22 16:02:56,059 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=51062.666666666664, ans=0.0 2024-09-22 16:03:03,756 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=51109.333333333336, ans=0.1 2024-09-22 16:03:06,925 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=51109.333333333336, ans=0.1 2024-09-22 16:03:57,463 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=51249.333333333336, ans=0.125 2024-09-22 16:04:03,497 INFO [train.py:1198] (3/4) Epoch 3, batch 3200, loss[loss=0.3227, ctc_loss=0.2368, cr_loss=0.4298, over 16955.00 frames. ], tot_loss[loss=0.3307, ctc_loss=0.2474, cr_loss=0.4165, over 3367190.89 frames. 
], batch size: 42, lr: 3.15e-02, grad_scale: 32.0 2024-09-22 16:04:24,207 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=51342.666666666664, ans=0.125 2024-09-22 16:04:25,787 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=51342.666666666664, ans=0.125 2024-09-22 16:04:28,942 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=51342.666666666664, ans=0.125 2024-09-22 16:04:30,052 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.332e+02 1.638e+02 1.865e+02 2.186e+02 5.181e+02, threshold=3.729e+02, percent-clipped=1.0 2024-09-22 16:04:35,447 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.33 vs. limit=22.5 2024-09-22 16:05:12,696 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=3.64 vs. limit=15.0 2024-09-22 16:05:23,512 INFO [train.py:1198] (3/4) Epoch 3, batch 3250, loss[loss=0.363, ctc_loss=0.2661, cr_loss=0.4845, over 17144.00 frames. ], tot_loss[loss=0.3305, ctc_loss=0.2472, cr_loss=0.4167, over 3366503.18 frames. ], batch size: 48, lr: 3.14e-02, grad_scale: 32.0 2024-09-22 16:05:36,299 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=51529.333333333336, ans=0.1 2024-09-22 16:06:39,792 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=51762.666666666664, ans=0.015 2024-09-22 16:06:43,311 INFO [train.py:1198] (3/4) Epoch 3, batch 3300, loss[loss=0.3048, ctc_loss=0.2232, cr_loss=0.4079, over 17338.00 frames. ], tot_loss[loss=0.3322, ctc_loss=0.2486, cr_loss=0.4182, over 3358399.99 frames. ], batch size: 48, lr: 3.14e-02, grad_scale: 32.0 2024-09-22 16:06:45,350 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=51762.666666666664, ans=0.0 2024-09-22 16:06:57,857 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.93 vs. limit=6.0 2024-09-22 16:06:59,411 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.04 vs. limit=15.0 2024-09-22 16:07:07,076 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=51809.333333333336, ans=0.09899494936611666 2024-09-22 16:07:09,972 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.328e+02 1.647e+02 1.953e+02 2.348e+02 4.155e+02, threshold=3.905e+02, percent-clipped=1.0 2024-09-22 16:07:13,465 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=51856.0, ans=0.125 2024-09-22 16:07:46,328 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=4.51 vs. limit=15.0 2024-09-22 16:08:01,373 INFO [train.py:1198] (3/4) Epoch 3, batch 3350, loss[loss=0.3363, ctc_loss=0.2511, cr_loss=0.4261, over 17312.00 frames. ], tot_loss[loss=0.3324, ctc_loss=0.2488, cr_loss=0.418, over 3348283.12 frames. 
], batch size: 51, lr: 3.13e-02, grad_scale: 32.0 2024-09-22 16:08:03,182 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=51996.0, ans=0.125 2024-09-22 16:08:06,287 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=51996.0, ans=0.0 2024-09-22 16:08:10,993 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=51996.0, ans=0.1 2024-09-22 16:08:15,652 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=52042.666666666664, ans=0.0 2024-09-22 16:08:21,785 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=52042.666666666664, ans=0.125 2024-09-22 16:09:19,279 INFO [train.py:1198] (3/4) Epoch 3, batch 3400, loss[loss=0.322, ctc_loss=0.2327, cr_loss=0.4464, over 17238.00 frames. ], tot_loss[loss=0.3315, ctc_loss=0.248, cr_loss=0.4177, over 3352760.49 frames. ], batch size: 44, lr: 3.13e-02, grad_scale: 32.0 2024-09-22 16:09:19,610 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=52229.333333333336, ans=0.125 2024-09-22 16:09:21,031 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=52229.333333333336, ans=0.0 2024-09-22 16:09:30,603 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=52229.333333333336, ans=0.125 2024-09-22 16:09:38,424 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=52276.0, ans=0.0 2024-09-22 16:09:45,958 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.323e+02 1.677e+02 1.939e+02 2.464e+02 4.534e+02, threshold=3.878e+02, percent-clipped=3.0 2024-09-22 16:09:48,230 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2.whitening_limit, batch_count=52276.0, ans=15.0 2024-09-22 16:10:37,706 INFO [train.py:1198] (3/4) Epoch 3, batch 3450, loss[loss=0.346, ctc_loss=0.2598, cr_loss=0.4309, over 16980.00 frames. ], tot_loss[loss=0.3313, ctc_loss=0.2479, cr_loss=0.4171, over 3348209.18 frames. ], batch size: 53, lr: 3.12e-02, grad_scale: 32.0 2024-09-22 16:10:42,935 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-09-22 16:11:01,374 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-22 16:11:21,699 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=52556.0, ans=0.1 2024-09-22 16:11:57,413 INFO [train.py:1198] (3/4) Epoch 3, batch 3500, loss[loss=0.3635, ctc_loss=0.2742, cr_loss=0.4465, over 17070.00 frames. ], tot_loss[loss=0.331, ctc_loss=0.2476, cr_loss=0.4172, over 3353270.20 frames. 
], batch size: 52, lr: 3.12e-02, grad_scale: 32.0 2024-09-22 16:12:24,122 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.446e+02 1.808e+02 2.110e+02 2.675e+02 4.151e+02, threshold=4.220e+02, percent-clipped=2.0 2024-09-22 16:12:49,630 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=52836.0, ans=0.125 2024-09-22 16:13:15,558 INFO [train.py:1198] (3/4) Epoch 3, batch 3550, loss[loss=0.2996, ctc_loss=0.2117, cr_loss=0.4397, over 17168.00 frames. ], tot_loss[loss=0.3302, ctc_loss=0.2469, cr_loss=0.4164, over 3355745.43 frames. ], batch size: 45, lr: 3.11e-02, grad_scale: 32.0 2024-09-22 16:13:49,241 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=53022.666666666664, ans=0.125 2024-09-22 16:14:09,332 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=53069.333333333336, ans=0.035 2024-09-22 16:14:10,906 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=53069.333333333336, ans=0.0 2024-09-22 16:14:20,176 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=53116.0, ans=0.0 2024-09-22 16:14:26,400 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=53116.0, ans=0.125 2024-09-22 16:14:29,684 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=53116.0, ans=0.125 2024-09-22 16:14:35,692 INFO [train.py:1198] (3/4) Epoch 3, batch 3600, loss[loss=0.3267, ctc_loss=0.2433, cr_loss=0.417, over 17213.00 frames. ], tot_loss[loss=0.3315, ctc_loss=0.2481, cr_loss=0.4172, over 3347974.04 frames. ], batch size: 47, lr: 3.11e-02, grad_scale: 32.0 2024-09-22 16:14:48,636 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=53162.666666666664, ans=0.1 2024-09-22 16:15:04,297 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.276e+02 1.804e+02 2.185e+02 2.634e+02 3.942e+02, threshold=4.371e+02, percent-clipped=0.0 2024-09-22 16:15:09,906 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=14.73 vs. limit=15.0 2024-09-22 16:15:40,483 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.const_attention_rate, batch_count=53349.333333333336, ans=0.025 2024-09-22 16:15:48,191 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.const_attention_rate, batch_count=53349.333333333336, ans=0.025 2024-09-22 16:15:48,315 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=53349.333333333336, ans=0.0 2024-09-22 16:15:57,745 INFO [train.py:1198] (3/4) Epoch 3, batch 3650, loss[loss=0.3152, ctc_loss=0.2342, cr_loss=0.4048, over 17030.00 frames. ], tot_loss[loss=0.3311, ctc_loss=0.2476, cr_loss=0.4174, over 3348755.55 frames. 
], batch size: 52, lr: 3.10e-02, grad_scale: 32.0 2024-09-22 16:16:10,521 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=53396.0, ans=0.0 2024-09-22 16:16:46,811 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=53536.0, ans=0.2 2024-09-22 16:16:48,225 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=53536.0, ans=0.125 2024-09-22 16:16:53,844 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.19 vs. limit=6.0 2024-09-22 16:16:57,581 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=53536.0, ans=0.125 2024-09-22 16:17:00,663 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=53582.666666666664, ans=0.035 2024-09-22 16:17:00,668 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=53582.666666666664, ans=0.1 2024-09-22 16:17:16,133 INFO [train.py:1198] (3/4) Epoch 3, batch 3700, loss[loss=0.2727, ctc_loss=0.2082, cr_loss=0.3223, over 17085.00 frames. ], tot_loss[loss=0.3307, ctc_loss=0.2474, cr_loss=0.4167, over 3348904.81 frames. ], batch size: 43, lr: 3.10e-02, grad_scale: 32.0 2024-09-22 16:17:16,345 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=53629.333333333336, ans=0.125 2024-09-22 16:17:30,029 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=53676.0, ans=0.04949747468305833 2024-09-22 16:17:30,161 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=53676.0, ans=0.0 2024-09-22 16:17:40,917 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-22 16:17:42,218 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.264e+02 1.642e+02 1.878e+02 2.249e+02 4.018e+02, threshold=3.757e+02, percent-clipped=0.0 2024-09-22 16:18:06,115 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=53769.333333333336, ans=0.0 2024-09-22 16:18:23,888 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.97 vs. limit=15.0 2024-09-22 16:18:26,672 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=9.66 vs. limit=15.0 2024-09-22 16:18:34,047 INFO [train.py:1198] (3/4) Epoch 3, batch 3750, loss[loss=0.2995, ctc_loss=0.2215, cr_loss=0.3896, over 16976.00 frames. ], tot_loss[loss=0.3298, ctc_loss=0.2466, cr_loss=0.4161, over 3346981.42 frames. 
], batch size: 42, lr: 3.10e-02, grad_scale: 32.0 2024-09-22 16:18:48,230 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=53909.333333333336, ans=0.1 2024-09-22 16:19:05,498 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=53956.0, ans=0.0 2024-09-22 16:19:44,139 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=54049.333333333336, ans=0.125 2024-09-22 16:19:44,250 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=12.36 vs. limit=12.0 2024-09-22 16:19:47,263 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=54049.333333333336, ans=0.125 2024-09-22 16:19:51,721 INFO [train.py:1198] (3/4) Epoch 3, batch 3800, loss[loss=0.3184, ctc_loss=0.236, cr_loss=0.4118, over 17325.00 frames. ], tot_loss[loss=0.331, ctc_loss=0.2478, cr_loss=0.4158, over 3323930.63 frames. ], batch size: 51, lr: 3.09e-02, grad_scale: 32.0 2024-09-22 16:20:07,555 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=54142.666666666664, ans=0.125 2024-09-22 16:20:17,980 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.396e+02 1.642e+02 1.883e+02 2.367e+02 4.025e+02, threshold=3.766e+02, percent-clipped=5.0 2024-09-22 16:20:40,523 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=54236.0, ans=0.2 2024-09-22 16:20:45,159 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=54236.0, ans=0.125 2024-09-22 16:21:10,250 INFO [train.py:1198] (3/4) Epoch 3, batch 3850, loss[loss=0.3647, ctc_loss=0.2744, cr_loss=0.452, over 16872.00 frames. ], tot_loss[loss=0.3327, ctc_loss=0.2496, cr_loss=0.4152, over 3286139.58 frames. ], batch size: 58, lr: 3.09e-02, grad_scale: 64.0 2024-09-22 16:21:21,275 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=54329.333333333336, ans=0.125 2024-09-22 16:21:30,496 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=54376.0, ans=0.0 2024-09-22 16:21:50,567 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=54422.666666666664, ans=0.0 2024-09-22 16:22:11,663 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=54516.0, ans=0.0 2024-09-22 16:23:12,184 INFO [train.py:1198] (3/4) Epoch 4, batch 0, loss[loss=0.3664, ctc_loss=0.2764, cr_loss=0.4503, over 16753.00 frames. ], tot_loss[loss=0.3664, ctc_loss=0.2764, cr_loss=0.4503, over 16753.00 frames. ], batch size: 61, lr: 2.88e-02, grad_scale: 32.0 2024-09-22 16:23:12,185 INFO [train.py:1221] (3/4) Computing validation loss 2024-09-22 16:23:27,754 INFO [train.py:1230] (3/4) Epoch 4, validation: loss=0.08466, ctc_loss=0.08466, cr_loss=9.003e-15, over 944034.00 frames. 
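
The WARNING lines from optim.py:487 track the distribution of recent gradient norms, and in every entry the reported threshold is exactly Clipping_scale times the median (the third of the five quartile values), e.g. 2.0 * 1.883e+02 = 3.766e+02 in the warning just above. A rough sketch of that bookkeeping follows, assuming the quartiles are taken over a window of recent per-batch gradient norms; the function and variable names are illustrative, not ScaledAdam's actual internals.

```python
# Sketch of the grad-norm statistics behind the optim.py WARNING lines.
# Assumption: threshold = clipping_scale * median of recent grad norms, which
# matches every logged entry (e.g. 2.0 * 1.883e+02 = 3.766e+02 above).
# Names below are illustrative, not the actual ScaledAdam implementation.
import numpy as np

def clipping_report(recent_grad_norms: np.ndarray, clipping_scale: float = 2.0) -> str:
    # Five summary points, as printed in the log: min, q1, median, q3, max.
    quartiles = np.percentile(recent_grad_norms, [0, 25, 50, 75, 100])
    threshold = clipping_scale * quartiles[2]              # scale * median
    pct_clipped = 100.0 * np.mean(recent_grad_norms > threshold)
    qs = " ".join(f"{q:.3e}" for q in quartiles)
    return (f"Clipping_scale={clipping_scale}, grad-norm quartiles {qs}, "
            f"threshold={threshold:.3e}, percent-clipped={pct_clipped:.1f}")

# Example with synthetic norms roughly in the logged range (~1e+02 to 5e+02):
rng = np.random.default_rng(0)
print(clipping_report(rng.lognormal(mean=5.3, sigma=0.3, size=400)))
```
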
2024-09-22 16:23:27,755 INFO [train.py:1231] (3/4) Maximum memory allocated so far is 21238MB 2024-09-22 16:23:58,313 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=54590.666666666664, ans=0.09899494936611666 2024-09-22 16:24:00,569 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=13.97 vs. limit=15.0 2024-09-22 16:24:03,234 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=54637.333333333336, ans=0.125 2024-09-22 16:24:06,065 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.466e+02 1.944e+02 2.324e+02 2.751e+02 6.786e+02, threshold=4.649e+02, percent-clipped=3.0 2024-09-22 16:24:08,541 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=8.33 vs. limit=15.0 2024-09-22 16:24:18,805 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=54684.0, ans=0.125 2024-09-22 16:24:20,365 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=54684.0, ans=0.125 2024-09-22 16:24:20,615 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=9.78 vs. limit=15.0 2024-09-22 16:24:33,212 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=54730.666666666664, ans=0.1 2024-09-22 16:24:50,276 INFO [train.py:1198] (3/4) Epoch 4, batch 50, loss[loss=0.3461, ctc_loss=0.2618, cr_loss=0.4219, over 17109.00 frames. ], tot_loss[loss=0.3307, ctc_loss=0.2468, cr_loss=0.4195, over 752713.19 frames. ], batch size: 49, lr: 2.88e-02, grad_scale: 32.0 2024-09-22 16:24:53,661 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=54777.333333333336, ans=0.035 2024-09-22 16:25:12,784 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=54824.0, ans=0.125 2024-09-22 16:25:13,357 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=3.46 vs. limit=12.0 2024-09-22 16:25:33,197 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=54870.666666666664, ans=0.2 2024-09-22 16:25:46,030 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=54917.333333333336, ans=0.0 2024-09-22 16:25:49,294 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=54917.333333333336, ans=0.125 2024-09-22 16:25:52,549 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=54964.0, ans=0.125 2024-09-22 16:25:57,183 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=54964.0, ans=0.1 2024-09-22 16:26:12,373 INFO [train.py:1198] (3/4) Epoch 4, batch 100, loss[loss=0.3551, ctc_loss=0.2694, cr_loss=0.4285, over 14896.00 frames. 
], tot_loss[loss=0.33, ctc_loss=0.2461, cr_loss=0.4196, over 1329907.77 frames. ], batch size: 89, lr: 2.87e-02, grad_scale: 32.0 2024-09-22 16:26:20,258 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=8.11 vs. limit=15.0 2024-09-22 16:26:41,334 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=55057.333333333336, ans=0.125 2024-09-22 16:26:47,536 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=55104.0, ans=0.0 2024-09-22 16:26:50,367 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.359e+02 1.670e+02 1.866e+02 2.190e+02 3.249e+02, threshold=3.731e+02, percent-clipped=0.0 2024-09-22 16:27:08,595 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=55150.666666666664, ans=0.125 2024-09-22 16:27:10,191 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=55150.666666666664, ans=10.0 2024-09-22 16:27:16,764 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=14.92 vs. limit=15.0 2024-09-22 16:27:27,635 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.max_positive, batch_count=55197.333333333336, ans=0.95 2024-09-22 16:27:32,388 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=55197.333333333336, ans=0.0 2024-09-22 16:27:35,226 INFO [train.py:1198] (3/4) Epoch 4, batch 150, loss[loss=0.344, ctc_loss=0.2552, cr_loss=0.4437, over 17346.00 frames. ], tot_loss[loss=0.3295, ctc_loss=0.2457, cr_loss=0.419, over 1771432.16 frames. ], batch size: 48, lr: 2.87e-02, grad_scale: 32.0 2024-09-22 16:27:46,381 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=55244.0, ans=0.0 2024-09-22 16:27:51,275 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=55290.666666666664, ans=0.1 2024-09-22 16:28:04,366 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=55290.666666666664, ans=0.2 2024-09-22 16:28:10,611 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=55337.333333333336, ans=0.0 2024-09-22 16:28:30,961 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=55384.0, ans=0.2 2024-09-22 16:28:33,921 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.min_positive, batch_count=55384.0, ans=0.05 2024-09-22 16:28:44,064 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=23.65 vs. limit=22.5 2024-09-22 16:28:58,936 INFO [train.py:1198] (3/4) Epoch 4, batch 200, loss[loss=0.3199, ctc_loss=0.2367, cr_loss=0.4157, over 17156.00 frames. ], tot_loss[loss=0.3243, ctc_loss=0.2415, cr_loss=0.414, over 2125793.06 frames. 
], batch size: 48, lr: 2.86e-02, grad_scale: 32.0 2024-09-22 16:29:33,619 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.233e+02 1.559e+02 1.682e+02 1.957e+02 2.989e+02, threshold=3.363e+02, percent-clipped=0.0 2024-09-22 16:30:17,537 INFO [train.py:1198] (3/4) Epoch 4, batch 250, loss[loss=0.3454, ctc_loss=0.2601, cr_loss=0.4265, over 17155.00 frames. ], tot_loss[loss=0.3237, ctc_loss=0.2408, cr_loss=0.4145, over 2412032.25 frames. ], batch size: 48, lr: 2.86e-02, grad_scale: 32.0 2024-09-22 16:30:54,650 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten.whitening_limit, batch_count=55804.0, ans=22.5 2024-09-22 16:31:10,274 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=55850.666666666664, ans=0.0 2024-09-22 16:31:43,126 INFO [train.py:1198] (3/4) Epoch 4, batch 300, loss[loss=0.295, ctc_loss=0.215, cr_loss=0.4, over 17289.00 frames. ], tot_loss[loss=0.3242, ctc_loss=0.2412, cr_loss=0.4152, over 2626211.70 frames. ], batch size: 46, lr: 2.86e-02, grad_scale: 32.0 2024-09-22 16:31:51,148 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=55944.0, ans=0.1 2024-09-22 16:32:20,153 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.310e+02 1.659e+02 1.864e+02 2.226e+02 3.223e+02, threshold=3.728e+02, percent-clipped=0.0 2024-09-22 16:32:23,477 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=56037.333333333336, ans=0.2 2024-09-22 16:33:07,043 INFO [scaling.py:1024] (3/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=6.83 vs. limit=8.0 2024-09-22 16:33:07,320 INFO [train.py:1198] (3/4) Epoch 4, batch 350, loss[loss=0.3031, ctc_loss=0.225, cr_loss=0.3904, over 17294.00 frames. ], tot_loss[loss=0.321, ctc_loss=0.2384, cr_loss=0.413, over 2795477.44 frames. ], batch size: 49, lr: 2.85e-02, grad_scale: 32.0 2024-09-22 16:33:07,596 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.const_attention_rate, batch_count=56177.333333333336, ans=0.025 2024-09-22 16:33:07,601 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=56177.333333333336, ans=0.125 2024-09-22 16:33:29,484 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=56224.0, ans=0.125 2024-09-22 16:34:00,018 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.57 vs. limit=15.0 2024-09-22 16:34:09,804 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=2.55 vs. limit=15.0 2024-09-22 16:34:18,991 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=56364.0, ans=0.125 2024-09-22 16:34:29,677 INFO [train.py:1198] (3/4) Epoch 4, batch 400, loss[loss=0.2964, ctc_loss=0.2197, cr_loss=0.3836, over 17276.00 frames. ], tot_loss[loss=0.3196, ctc_loss=0.2372, cr_loss=0.4119, over 2920144.13 frames. 
], batch size: 42, lr: 2.85e-02, grad_scale: 32.0 2024-09-22 16:34:35,399 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.18 vs. limit=15.0 2024-09-22 16:34:36,853 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.78 vs. limit=6.0 2024-09-22 16:34:42,912 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=56410.666666666664, ans=0.125 2024-09-22 16:34:44,673 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=56457.333333333336, ans=0.125 2024-09-22 16:34:46,956 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=9.74 vs. limit=22.5 2024-09-22 16:35:04,965 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.343e+02 1.655e+02 1.851e+02 2.343e+02 4.879e+02, threshold=3.703e+02, percent-clipped=2.0 2024-09-22 16:35:34,765 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=8.60 vs. limit=15.0 2024-09-22 16:35:52,503 INFO [train.py:1198] (3/4) Epoch 4, batch 450, loss[loss=0.3356, ctc_loss=0.2477, cr_loss=0.4393, over 17008.00 frames. ], tot_loss[loss=0.3194, ctc_loss=0.237, cr_loss=0.412, over 3014488.61 frames. ], batch size: 53, lr: 2.84e-02, grad_scale: 32.0 2024-09-22 16:36:16,343 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=56690.666666666664, ans=0.2 2024-09-22 16:36:51,485 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=56784.0, ans=0.0 2024-09-22 16:36:53,886 INFO [scaling.py:1024] (3/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=7.41 vs. limit=8.0 2024-09-22 16:37:04,290 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=56830.666666666664, ans=0.0 2024-09-22 16:37:08,920 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=56830.666666666664, ans=0.125 2024-09-22 16:37:14,901 INFO [train.py:1198] (3/4) Epoch 4, batch 500, loss[loss=0.4226, ctc_loss=0.337, cr_loss=0.4281, over 12046.00 frames. ], tot_loss[loss=0.3198, ctc_loss=0.2374, cr_loss=0.4121, over 3086678.84 frames. 
], batch size: 123, lr: 2.84e-02, grad_scale: 32.0 2024-09-22 16:37:26,446 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=56877.333333333336, ans=0.125 2024-09-22 16:37:41,180 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=56924.0, ans=0.2 2024-09-22 16:37:45,790 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=56970.666666666664, ans=0.95 2024-09-22 16:37:53,150 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.289e+02 1.629e+02 1.949e+02 2.165e+02 3.477e+02, threshold=3.897e+02, percent-clipped=0.0 2024-09-22 16:38:01,301 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=56970.666666666664, ans=0.2 2024-09-22 16:38:06,076 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=57017.333333333336, ans=0.125 2024-09-22 16:38:39,922 INFO [train.py:1198] (3/4) Epoch 4, batch 550, loss[loss=0.3448, ctc_loss=0.2566, cr_loss=0.4413, over 17027.00 frames. ], tot_loss[loss=0.3201, ctc_loss=0.2374, cr_loss=0.4135, over 3157392.62 frames. ], batch size: 52, lr: 2.83e-02, grad_scale: 32.0 2024-09-22 16:38:53,510 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=10.44 vs. limit=22.5 2024-09-22 16:38:57,473 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=57157.333333333336, ans=0.0 2024-09-22 16:39:02,271 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=57157.333333333336, ans=0.1 2024-09-22 16:39:32,707 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=57250.666666666664, ans=0.0 2024-09-22 16:39:32,717 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=57250.666666666664, ans=0.125 2024-09-22 16:39:45,204 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=57297.333333333336, ans=0.1 2024-09-22 16:39:48,426 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.const_attention_rate, batch_count=57297.333333333336, ans=0.025 2024-09-22 16:39:59,407 INFO [train.py:1198] (3/4) Epoch 4, batch 600, loss[loss=0.2636, ctc_loss=0.1925, cr_loss=0.3553, over 16961.00 frames. ], tot_loss[loss=0.321, ctc_loss=0.238, cr_loss=0.4149, over 3210058.43 frames. 
], batch size: 42, lr: 2.83e-02, grad_scale: 32.0 2024-09-22 16:40:04,258 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=57344.0, ans=0.0 2024-09-22 16:40:05,805 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=57344.0, ans=0.0 2024-09-22 16:40:34,388 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.289e+02 1.589e+02 1.770e+02 2.208e+02 4.389e+02, threshold=3.540e+02, percent-clipped=1.0 2024-09-22 16:40:48,258 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=57484.0, ans=0.125 2024-09-22 16:40:53,262 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=57484.0, ans=0.0 2024-09-22 16:41:19,904 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.min_abs, batch_count=57530.666666666664, ans=0.5 2024-09-22 16:41:24,124 INFO [train.py:1198] (3/4) Epoch 4, batch 650, loss[loss=0.3184, ctc_loss=0.2332, cr_loss=0.426, over 17154.00 frames. ], tot_loss[loss=0.3222, ctc_loss=0.2389, cr_loss=0.4163, over 3237337.22 frames. ], batch size: 45, lr: 2.83e-02, grad_scale: 32.0 2024-09-22 16:41:43,516 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=17.87 vs. limit=22.5 2024-09-22 16:41:51,154 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.min_positive, batch_count=57624.0, ans=0.05 2024-09-22 16:42:02,019 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=57670.666666666664, ans=0.2 2024-09-22 16:42:30,693 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=57764.0, ans=0.1 2024-09-22 16:42:37,089 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=57764.0, ans=0.125 2024-09-22 16:42:45,632 INFO [train.py:1198] (3/4) Epoch 4, batch 700, loss[loss=0.334, ctc_loss=0.2496, cr_loss=0.4219, over 17228.00 frames. ], tot_loss[loss=0.3228, ctc_loss=0.2394, cr_loss=0.4169, over 3267397.26 frames. ], batch size: 50, lr: 2.82e-02, grad_scale: 32.0 2024-09-22 16:43:23,512 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.245e+02 1.612e+02 1.882e+02 2.294e+02 3.695e+02, threshold=3.764e+02, percent-clipped=3.0 2024-09-22 16:43:46,457 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.const_attention_rate, batch_count=57950.666666666664, ans=0.025 2024-09-22 16:43:59,032 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=57997.333333333336, ans=0.125 2024-09-22 16:44:02,196 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=57997.333333333336, ans=0.125 2024-09-22 16:44:08,400 INFO [train.py:1198] (3/4) Epoch 4, batch 750, loss[loss=0.3098, ctc_loss=0.2286, cr_loss=0.4058, over 17292.00 frames. ], tot_loss[loss=0.3228, ctc_loss=0.2394, cr_loss=0.4171, over 3285235.18 frames. 
], batch size: 51, lr: 2.82e-02, grad_scale: 32.0 2024-09-22 16:44:16,388 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=58044.0, ans=0.0 2024-09-22 16:44:18,018 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=58044.0, ans=0.0 2024-09-22 16:44:46,324 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=58137.333333333336, ans=0.0 2024-09-22 16:44:47,965 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=58137.333333333336, ans=0.2 2024-09-22 16:44:57,489 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=58184.0, ans=0.0 2024-09-22 16:45:09,203 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.29 vs. limit=22.5 2024-09-22 16:45:27,578 INFO [train.py:1198] (3/4) Epoch 4, batch 800, loss[loss=0.3137, ctc_loss=0.2291, cr_loss=0.4232, over 17143.00 frames. ], tot_loss[loss=0.3205, ctc_loss=0.2377, cr_loss=0.4145, over 3308655.70 frames. ], batch size: 48, lr: 2.81e-02, grad_scale: 32.0 2024-09-22 16:46:07,412 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.352e+02 1.727e+02 1.869e+02 2.216e+02 3.268e+02, threshold=3.738e+02, percent-clipped=0.0 2024-09-22 16:46:16,079 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.48 vs. limit=15.0 2024-09-22 16:46:18,985 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=58417.333333333336, ans=0.2 2024-09-22 16:46:29,833 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=58417.333333333336, ans=0.125 2024-09-22 16:46:51,993 INFO [train.py:1198] (3/4) Epoch 4, batch 850, loss[loss=0.3181, ctc_loss=0.2313, cr_loss=0.4342, over 16942.00 frames. ], tot_loss[loss=0.3211, ctc_loss=0.238, cr_loss=0.4155, over 3319637.79 frames. ], batch size: 42, lr: 2.81e-02, grad_scale: 32.0 2024-09-22 16:47:15,279 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=21.29 vs. limit=22.5 2024-09-22 16:47:24,596 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.83 vs. limit=15.0 2024-09-22 16:47:57,686 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.73 vs. limit=22.5 2024-09-22 16:48:16,719 INFO [train.py:1198] (3/4) Epoch 4, batch 900, loss[loss=0.303, ctc_loss=0.2218, cr_loss=0.4062, over 17014.00 frames. ], tot_loss[loss=0.3202, ctc_loss=0.2372, cr_loss=0.415, over 3328235.10 frames. 
], batch size: 51, lr: 2.81e-02, grad_scale: 32.0 2024-09-22 16:48:33,085 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=58790.666666666664, ans=0.125 2024-09-22 16:48:52,233 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.310e+02 1.664e+02 1.989e+02 2.596e+02 4.339e+02, threshold=3.979e+02, percent-clipped=2.0 2024-09-22 16:49:05,543 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=58884.0, ans=0.125 2024-09-22 16:49:23,270 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=58930.666666666664, ans=0.1 2024-09-22 16:49:36,963 INFO [train.py:1198] (3/4) Epoch 4, batch 950, loss[loss=0.3244, ctc_loss=0.2394, cr_loss=0.4249, over 17150.00 frames. ], tot_loss[loss=0.3207, ctc_loss=0.2376, cr_loss=0.4156, over 3341792.66 frames. ], batch size: 48, lr: 2.80e-02, grad_scale: 32.0 2024-09-22 16:50:05,621 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=59024.0, ans=0.125 2024-09-22 16:50:26,177 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.min_abs, batch_count=59117.333333333336, ans=0.5 2024-09-22 16:50:34,330 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=59117.333333333336, ans=0.125 2024-09-22 16:50:36,490 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=4.61 vs. limit=15.0 2024-09-22 16:50:46,510 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=59164.0, ans=0.125 2024-09-22 16:51:01,451 INFO [train.py:1198] (3/4) Epoch 4, batch 1000, loss[loss=0.3463, ctc_loss=0.2613, cr_loss=0.4252, over 16009.00 frames. ], tot_loss[loss=0.3195, ctc_loss=0.2365, cr_loss=0.415, over 3351034.10 frames. ], batch size: 74, lr: 2.80e-02, grad_scale: 32.0 2024-09-22 16:51:19,190 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=59257.333333333336, ans=0.1 2024-09-22 16:51:36,100 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.266e+02 1.574e+02 1.735e+02 2.105e+02 3.870e+02, threshold=3.470e+02, percent-clipped=0.0 2024-09-22 16:51:38,027 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=59304.0, ans=0.2 2024-09-22 16:51:46,064 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=59304.0, ans=0.2 2024-09-22 16:51:46,121 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=59304.0, ans=0.2 2024-09-22 16:52:20,428 INFO [train.py:1198] (3/4) Epoch 4, batch 1050, loss[loss=0.4004, ctc_loss=0.3174, cr_loss=0.4151, over 12177.00 frames. ], tot_loss[loss=0.3194, ctc_loss=0.2364, cr_loss=0.4152, over 3355400.30 frames. 
], batch size: 124, lr: 2.79e-02, grad_scale: 32.0 2024-09-22 16:52:20,750 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=59444.0, ans=0.2 2024-09-22 16:52:41,001 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=59490.666666666664, ans=0.0 2024-09-22 16:53:23,182 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=59584.0, ans=0.125 2024-09-22 16:53:24,630 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=59584.0, ans=0.0 2024-09-22 16:53:44,833 INFO [train.py:1198] (3/4) Epoch 4, batch 1100, loss[loss=0.4118, ctc_loss=0.3229, cr_loss=0.4442, over 12561.00 frames. ], tot_loss[loss=0.3184, ctc_loss=0.2357, cr_loss=0.4138, over 3355309.63 frames. ], batch size: 123, lr: 2.79e-02, grad_scale: 32.0 2024-09-22 16:54:18,794 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=59770.666666666664, ans=0.125 2024-09-22 16:54:20,014 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.292e+02 1.576e+02 1.850e+02 2.274e+02 3.544e+02, threshold=3.699e+02, percent-clipped=1.0 2024-09-22 16:54:45,152 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=5.41 vs. limit=15.0 2024-09-22 16:55:04,530 INFO [train.py:1198] (3/4) Epoch 4, batch 1150, loss[loss=0.3061, ctc_loss=0.2264, cr_loss=0.3981, over 17009.00 frames. ], tot_loss[loss=0.3165, ctc_loss=0.2344, cr_loss=0.4107, over 3355969.85 frames. ], batch size: 44, lr: 2.78e-02, grad_scale: 32.0 2024-09-22 16:55:12,920 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=59910.666666666664, ans=0.1 2024-09-22 16:56:19,854 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-22 16:56:28,659 INFO [train.py:1198] (3/4) Epoch 4, batch 1200, loss[loss=0.3108, ctc_loss=0.234, cr_loss=0.384, over 17055.00 frames. ], tot_loss[loss=0.3152, ctc_loss=0.2333, cr_loss=0.4095, over 3351434.67 frames. ], batch size: 46, lr: 2.78e-02, grad_scale: 32.0 2024-09-22 16:56:51,762 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.12 vs. limit=15.0 2024-09-22 16:57:03,613 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.315e+02 1.578e+02 1.757e+02 2.030e+02 3.618e+02, threshold=3.514e+02, percent-clipped=0.0 2024-09-22 16:57:05,557 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=60237.333333333336, ans=0.0 2024-09-22 16:57:21,461 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=60284.0, ans=0.0 2024-09-22 16:57:51,022 INFO [train.py:1198] (3/4) Epoch 4, batch 1250, loss[loss=0.3246, ctc_loss=0.242, cr_loss=0.4127, over 17013.00 frames. ], tot_loss[loss=0.3167, ctc_loss=0.2345, cr_loss=0.411, over 3347859.96 frames. 
], batch size: 51, lr: 2.78e-02, grad_scale: 32.0 2024-09-22 16:58:14,299 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=60424.0, ans=0.0 2024-09-22 16:58:36,764 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=60470.666666666664, ans=0.125 2024-09-22 16:58:54,875 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=5.26 vs. limit=15.0 2024-09-22 16:59:00,575 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=60564.0, ans=0.2 2024-09-22 16:59:05,359 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=60564.0, ans=0.125 2024-09-22 16:59:12,921 INFO [train.py:1198] (3/4) Epoch 4, batch 1300, loss[loss=0.2978, ctc_loss=0.2214, cr_loss=0.3818, over 16794.00 frames. ], tot_loss[loss=0.3155, ctc_loss=0.2335, cr_loss=0.4102, over 3357100.59 frames. ], batch size: 61, lr: 2.77e-02, grad_scale: 32.0 2024-09-22 16:59:24,500 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-09-22 16:59:48,140 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.293e+02 1.594e+02 1.780e+02 2.076e+02 4.160e+02, threshold=3.560e+02, percent-clipped=3.0 2024-09-22 17:00:32,402 INFO [train.py:1198] (3/4) Epoch 4, batch 1350, loss[loss=0.3557, ctc_loss=0.2702, cr_loss=0.4278, over 16728.00 frames. ], tot_loss[loss=0.3151, ctc_loss=0.2332, cr_loss=0.4093, over 3358785.72 frames. ], batch size: 61, lr: 2.77e-02, grad_scale: 32.0 2024-09-22 17:00:34,285 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=60844.0, ans=0.5 2024-09-22 17:00:48,020 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=60844.0, ans=0.125 2024-09-22 17:01:03,307 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=60890.666666666664, ans=0.125 2024-09-22 17:01:05,091 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=60890.666666666664, ans=0.2 2024-09-22 17:01:13,038 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=60937.333333333336, ans=0.0 2024-09-22 17:01:13,541 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=9.62 vs. limit=12.0 2024-09-22 17:01:24,743 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=4.01 vs. limit=15.0 2024-09-22 17:01:30,447 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=60984.0, ans=0.0 2024-09-22 17:01:57,167 INFO [train.py:1198] (3/4) Epoch 4, batch 1400, loss[loss=0.411, ctc_loss=0.3252, cr_loss=0.4287, over 11310.00 frames. ], tot_loss[loss=0.3183, ctc_loss=0.2359, cr_loss=0.412, over 3350425.27 frames. 
], batch size: 123, lr: 2.76e-02, grad_scale: 32.0 2024-09-22 17:01:59,168 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=61077.333333333336, ans=0.125 2024-09-22 17:02:24,441 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=61124.0, ans=0.125 2024-09-22 17:02:34,724 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.284e+02 1.619e+02 1.850e+02 2.266e+02 3.949e+02, threshold=3.701e+02, percent-clipped=2.0 2024-09-22 17:02:58,088 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=3.66 vs. limit=15.0 2024-09-22 17:03:21,425 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=16.06 vs. limit=22.5 2024-09-22 17:03:21,429 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=11.28 vs. limit=15.0 2024-09-22 17:03:22,331 INFO [train.py:1198] (3/4) Epoch 4, batch 1450, loss[loss=0.3533, ctc_loss=0.2606, cr_loss=0.4633, over 17344.00 frames. ], tot_loss[loss=0.3192, ctc_loss=0.2366, cr_loss=0.4126, over 3349181.78 frames. ], batch size: 48, lr: 2.76e-02, grad_scale: 32.0 2024-09-22 17:03:31,108 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=21.67 vs. limit=22.5 2024-09-22 17:03:32,438 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.35 vs. limit=15.0 2024-09-22 17:03:38,247 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=61357.333333333336, ans=0.2 2024-09-22 17:03:47,690 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=61357.333333333336, ans=0.0 2024-09-22 17:03:48,174 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.08 vs. limit=22.5 2024-09-22 17:03:52,627 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=61404.0, ans=0.125 2024-09-22 17:04:05,342 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=61404.0, ans=0.125 2024-09-22 17:04:06,808 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=61404.0, ans=0.125 2024-09-22 17:04:13,418 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=61450.666666666664, ans=0.0 2024-09-22 17:04:34,398 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-22 17:04:41,898 INFO [train.py:1198] (3/4) Epoch 4, batch 1500, loss[loss=0.3508, ctc_loss=0.2683, cr_loss=0.4128, over 15066.00 frames. ], tot_loss[loss=0.318, ctc_loss=0.2358, cr_loss=0.4112, over 3346821.12 frames. 
], batch size: 89, lr: 2.76e-02, grad_scale: 32.0 2024-09-22 17:05:17,065 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.343e+02 1.547e+02 1.744e+02 2.028e+02 3.491e+02, threshold=3.489e+02, percent-clipped=0.0 2024-09-22 17:05:38,719 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=61684.0, ans=0.2 2024-09-22 17:05:49,063 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=61730.666666666664, ans=0.2 2024-09-22 17:06:02,114 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.06 vs. limit=15.0 2024-09-22 17:06:06,038 INFO [train.py:1198] (3/4) Epoch 4, batch 1550, loss[loss=0.2874, ctc_loss=0.2127, cr_loss=0.3735, over 17269.00 frames. ], tot_loss[loss=0.3169, ctc_loss=0.2348, cr_loss=0.4109, over 3348946.90 frames. ], batch size: 44, lr: 2.75e-02, grad_scale: 32.0 2024-09-22 17:06:25,478 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-22 17:06:30,136 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=61824.0, ans=0.125 2024-09-22 17:06:34,925 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=61824.0, ans=0.125 2024-09-22 17:06:44,854 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.75 vs. limit=15.0 2024-09-22 17:07:27,835 INFO [train.py:1198] (3/4) Epoch 4, batch 1600, loss[loss=0.3835, ctc_loss=0.2856, cr_loss=0.4896, over 14872.00 frames. ], tot_loss[loss=0.3198, ctc_loss=0.2372, cr_loss=0.413, over 3332240.45 frames. ], batch size: 88, lr: 2.75e-02, grad_scale: 32.0 2024-09-22 17:07:46,897 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=8.52 vs. limit=15.0 2024-09-22 17:07:51,541 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=3.15 vs. limit=15.0 2024-09-22 17:08:05,559 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.293e+02 1.652e+02 1.885e+02 2.249e+02 4.170e+02, threshold=3.770e+02, percent-clipped=2.0 2024-09-22 17:08:08,370 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=14.09 vs. limit=15.0 2024-09-22 17:08:17,072 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=62150.666666666664, ans=0.2 2024-09-22 17:08:37,953 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=62197.333333333336, ans=0.0 2024-09-22 17:08:46,259 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.90 vs. limit=6.0 2024-09-22 17:08:50,128 INFO [train.py:1198] (3/4) Epoch 4, batch 1650, loss[loss=0.337, ctc_loss=0.2544, cr_loss=0.4129, over 17026.00 frames. ], tot_loss[loss=0.3191, ctc_loss=0.2367, cr_loss=0.4124, over 3339699.76 frames. 
], batch size: 52, lr: 2.75e-02, grad_scale: 32.0 2024-09-22 17:09:05,183 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=4.72 vs. limit=12.0 2024-09-22 17:09:23,747 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=62337.333333333336, ans=0.0 2024-09-22 17:09:27,096 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=62337.333333333336, ans=0.0 2024-09-22 17:09:41,804 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.84 vs. limit=10.0 2024-09-22 17:09:57,051 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.const_attention_rate, batch_count=62430.666666666664, ans=0.025 2024-09-22 17:10:09,474 INFO [train.py:1198] (3/4) Epoch 4, batch 1700, loss[loss=0.3409, ctc_loss=0.2543, cr_loss=0.4332, over 16766.00 frames. ], tot_loss[loss=0.3195, ctc_loss=0.2368, cr_loss=0.4134, over 3342858.48 frames. ], batch size: 61, lr: 2.74e-02, grad_scale: 32.0 2024-09-22 17:10:23,847 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=62524.0, ans=0.2 2024-09-22 17:10:29,087 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=62524.0, ans=0.0 2024-09-22 17:10:40,406 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=62524.0, ans=0.0 2024-09-22 17:10:49,657 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.211e+02 1.564e+02 1.856e+02 2.208e+02 3.257e+02, threshold=3.711e+02, percent-clipped=0.0 2024-09-22 17:11:07,179 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=62617.333333333336, ans=0.2 2024-09-22 17:11:15,476 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=62617.333333333336, ans=0.1 2024-09-22 17:11:21,662 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=62664.0, ans=0.1 2024-09-22 17:11:34,081 INFO [train.py:1198] (3/4) Epoch 4, batch 1750, loss[loss=0.3239, ctc_loss=0.2368, cr_loss=0.4357, over 16945.00 frames. ], tot_loss[loss=0.3191, ctc_loss=0.2366, cr_loss=0.4128, over 3346945.00 frames. ], batch size: 42, lr: 2.74e-02, grad_scale: 32.0 2024-09-22 17:11:39,064 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=62710.666666666664, ans=0.0 2024-09-22 17:11:59,091 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=11.77 vs. 
limit=15.0 2024-09-22 17:12:00,010 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=62757.333333333336, ans=0.1 2024-09-22 17:12:06,293 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-22 17:12:09,439 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=62804.0, ans=0.1 2024-09-22 17:12:17,243 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=62804.0, ans=0.0 2024-09-22 17:12:22,808 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=62850.666666666664, ans=0.035 2024-09-22 17:12:27,866 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=62850.666666666664, ans=0.0 2024-09-22 17:12:35,741 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=62850.666666666664, ans=0.125 2024-09-22 17:12:51,164 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=62897.333333333336, ans=0.0 2024-09-22 17:12:58,854 INFO [train.py:1198] (3/4) Epoch 4, batch 1800, loss[loss=0.3364, ctc_loss=0.2536, cr_loss=0.4138, over 17187.00 frames. ], tot_loss[loss=0.3194, ctc_loss=0.2367, cr_loss=0.4136, over 3347809.66 frames. ], batch size: 47, lr: 2.73e-02, grad_scale: 32.0 2024-09-22 17:13:08,889 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=10.92 vs. limit=12.0 2024-09-22 17:13:16,593 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=62990.666666666664, ans=0.125 2024-09-22 17:13:26,256 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=5.49 vs. limit=15.0 2024-09-22 17:13:27,380 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=62990.666666666664, ans=0.125 2024-09-22 17:13:29,377 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=18.14 vs. limit=22.5 2024-09-22 17:13:33,467 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.277e+02 1.628e+02 1.997e+02 2.584e+02 3.622e+02, threshold=3.995e+02, percent-clipped=0.0 2024-09-22 17:13:37,019 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=63037.333333333336, ans=0.125 2024-09-22 17:13:43,907 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.86 vs. limit=15.0 2024-09-22 17:14:17,814 INFO [train.py:1198] (3/4) Epoch 4, batch 1850, loss[loss=0.306, ctc_loss=0.2282, cr_loss=0.3893, over 17340.00 frames. ], tot_loss[loss=0.3187, ctc_loss=0.2359, cr_loss=0.4142, over 3356272.08 frames. 
], batch size: 48, lr: 2.73e-02, grad_scale: 32.0 2024-09-22 17:14:24,252 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=63177.333333333336, ans=0.125 2024-09-22 17:14:46,560 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-22 17:14:49,376 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=63270.666666666664, ans=0.015 2024-09-22 17:14:53,156 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=15.67 vs. limit=22.5 2024-09-22 17:15:29,728 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=63364.0, ans=0.0 2024-09-22 17:15:34,516 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=22.74 vs. limit=22.5 2024-09-22 17:15:41,412 INFO [train.py:1198] (3/4) Epoch 4, batch 1900, loss[loss=0.2708, ctc_loss=0.1986, cr_loss=0.3611, over 17108.00 frames. ], tot_loss[loss=0.3184, ctc_loss=0.2357, cr_loss=0.4133, over 3347720.97 frames. ], batch size: 40, lr: 2.73e-02, grad_scale: 32.0 2024-09-22 17:15:41,659 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.const_attention_rate, batch_count=63410.666666666664, ans=0.025 2024-09-22 17:16:15,204 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=63504.0, ans=0.1 2024-09-22 17:16:16,411 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.329e+02 1.606e+02 1.832e+02 2.161e+02 3.717e+02, threshold=3.664e+02, percent-clipped=0.0 2024-09-22 17:16:16,774 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=63504.0, ans=0.1 2024-09-22 17:16:19,928 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-09-22 17:16:20,004 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=63504.0, ans=0.1 2024-09-22 17:16:24,806 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=63504.0, ans=0.0 2024-09-22 17:16:29,347 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=63550.666666666664, ans=0.125 2024-09-22 17:16:29,459 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=63550.666666666664, ans=0.2 2024-09-22 17:16:52,120 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=63597.333333333336, ans=0.125 2024-09-22 17:17:01,344 INFO [train.py:1198] (3/4) Epoch 4, batch 1950, loss[loss=0.3422, ctc_loss=0.2563, cr_loss=0.4294, over 16828.00 frames. ], tot_loss[loss=0.3174, ctc_loss=0.2348, cr_loss=0.4129, over 3346339.55 frames. 
], batch size: 61, lr: 2.72e-02, grad_scale: 32.0 2024-09-22 17:17:01,732 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=63644.0, ans=0.1 2024-09-22 17:17:05,506 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=7.98 vs. limit=15.0 2024-09-22 17:17:23,750 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=63690.666666666664, ans=0.125 2024-09-22 17:17:23,785 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=63690.666666666664, ans=0.125 2024-09-22 17:17:28,575 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=63690.666666666664, ans=0.1 2024-09-22 17:17:45,190 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=63737.333333333336, ans=0.2 2024-09-22 17:17:48,347 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=63737.333333333336, ans=0.1 2024-09-22 17:17:54,540 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=63784.0, ans=0.1 2024-09-22 17:18:24,966 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=63877.333333333336, ans=0.125 2024-09-22 17:18:26,150 INFO [train.py:1198] (3/4) Epoch 4, batch 2000, loss[loss=0.2723, ctc_loss=0.1962, cr_loss=0.3806, over 17043.00 frames. ], tot_loss[loss=0.3153, ctc_loss=0.233, cr_loss=0.4115, over 3352273.78 frames. ], batch size: 39, lr: 2.72e-02, grad_scale: 64.0 2024-09-22 17:18:41,090 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=63924.0, ans=0.125 2024-09-22 17:19:01,203 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.280e+02 1.569e+02 1.789e+02 2.368e+02 3.802e+02, threshold=3.577e+02, percent-clipped=1.0 2024-09-22 17:19:14,143 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=64017.333333333336, ans=0.0 2024-09-22 17:19:33,355 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=64064.0, ans=0.0 2024-09-22 17:19:41,243 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=64064.0, ans=0.0 2024-09-22 17:19:44,338 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=64110.666666666664, ans=0.0 2024-09-22 17:19:45,618 INFO [train.py:1198] (3/4) Epoch 4, batch 2050, loss[loss=0.3457, ctc_loss=0.2584, cr_loss=0.4363, over 17028.00 frames. ], tot_loss[loss=0.3165, ctc_loss=0.234, cr_loss=0.4129, over 3354770.22 frames. 
], batch size: 53, lr: 2.71e-02, grad_scale: 64.0 2024-09-22 17:19:49,182 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=64110.666666666664, ans=0.125 2024-09-22 17:20:36,486 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=64250.666666666664, ans=0.07 2024-09-22 17:20:45,056 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=5.07 vs. limit=12.0 2024-09-22 17:21:07,691 INFO [train.py:1198] (3/4) Epoch 4, batch 2100, loss[loss=0.3577, ctc_loss=0.2719, cr_loss=0.429, over 15141.00 frames. ], tot_loss[loss=0.3181, ctc_loss=0.2353, cr_loss=0.4137, over 3344866.08 frames. ], batch size: 89, lr: 2.71e-02, grad_scale: 32.0 2024-09-22 17:21:23,833 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=64390.666666666664, ans=0.0 2024-09-22 17:21:44,599 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.356e+02 1.633e+02 1.973e+02 2.304e+02 3.408e+02, threshold=3.946e+02, percent-clipped=0.0 2024-09-22 17:21:44,857 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=64437.333333333336, ans=0.125 2024-09-22 17:21:59,174 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=64484.0, ans=0.1 2024-09-22 17:22:33,088 INFO [train.py:1198] (3/4) Epoch 4, batch 2150, loss[loss=0.3654, ctc_loss=0.2727, cr_loss=0.4635, over 17219.00 frames. ], tot_loss[loss=0.3189, ctc_loss=0.2361, cr_loss=0.4138, over 3331984.05 frames. ], batch size: 55, lr: 2.71e-02, grad_scale: 32.0 2024-09-22 17:22:34,898 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=64577.333333333336, ans=0.0 2024-09-22 17:23:05,148 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=64670.666666666664, ans=0.0 2024-09-22 17:23:16,099 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=64670.666666666664, ans=0.125 2024-09-22 17:23:25,816 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=64717.333333333336, ans=0.07 2024-09-22 17:23:40,248 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=64764.0, ans=0.1 2024-09-22 17:23:45,174 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=14.35 vs. limit=15.0 2024-09-22 17:23:47,967 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=64764.0, ans=0.0 2024-09-22 17:23:52,468 INFO [train.py:1198] (3/4) Epoch 4, batch 2200, loss[loss=0.3104, ctc_loss=0.2291, cr_loss=0.4065, over 17211.00 frames. ], tot_loss[loss=0.32, ctc_loss=0.2371, cr_loss=0.4147, over 3325311.38 frames. 
], batch size: 47, lr: 2.70e-02, grad_scale: 32.0 2024-09-22 17:23:55,937 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=64810.666666666664, ans=0.0 2024-09-22 17:24:24,693 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=64904.0, ans=0.125 2024-09-22 17:24:25,096 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=7.41 vs. limit=15.0 2024-09-22 17:24:27,804 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=64904.0, ans=0.125 2024-09-22 17:24:29,090 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.320e+02 1.691e+02 2.009e+02 2.410e+02 3.639e+02, threshold=4.017e+02, percent-clipped=0.0 2024-09-22 17:24:31,055 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=64904.0, ans=0.2 2024-09-22 17:24:42,773 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.53 vs. limit=22.5 2024-09-22 17:25:11,836 INFO [train.py:1198] (3/4) Epoch 4, batch 2250, loss[loss=0.3329, ctc_loss=0.2463, cr_loss=0.433, over 16924.00 frames. ], tot_loss[loss=0.3196, ctc_loss=0.2368, cr_loss=0.4138, over 3328057.23 frames. ], batch size: 58, lr: 2.70e-02, grad_scale: 32.0 2024-09-22 17:25:12,072 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=65044.0, ans=0.07 2024-09-22 17:25:22,901 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=65044.0, ans=0.1 2024-09-22 17:25:42,260 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.20 vs. limit=15.0 2024-09-22 17:25:45,054 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=65137.333333333336, ans=0.09899494936611666 2024-09-22 17:26:00,722 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=65184.0, ans=0.0 2024-09-22 17:26:18,412 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=65230.666666666664, ans=0.125 2024-09-22 17:26:33,875 INFO [train.py:1198] (3/4) Epoch 4, batch 2300, loss[loss=0.2748, ctc_loss=0.2023, cr_loss=0.3624, over 17263.00 frames. ], tot_loss[loss=0.3183, ctc_loss=0.2357, cr_loss=0.4132, over 3337676.75 frames. ], batch size: 44, lr: 2.70e-02, grad_scale: 32.0 2024-09-22 17:26:35,974 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.61 vs. limit=10.0 2024-09-22 17:26:37,299 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=65277.333333333336, ans=0.0 2024-09-22 17:26:37,982 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=6.95 vs. 
limit=15.0 2024-09-22 17:26:41,940 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=65277.333333333336, ans=0.0 2024-09-22 17:27:13,234 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.343e+02 1.664e+02 1.971e+02 2.314e+02 3.882e+02, threshold=3.942e+02, percent-clipped=0.0 2024-09-22 17:27:17,693 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=13.51 vs. limit=15.0 2024-09-22 17:27:46,653 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=3.96 vs. limit=12.0 2024-09-22 17:27:58,575 INFO [train.py:1198] (3/4) Epoch 4, batch 2350, loss[loss=0.2873, ctc_loss=0.2109, cr_loss=0.382, over 17085.00 frames. ], tot_loss[loss=0.3159, ctc_loss=0.2335, cr_loss=0.4119, over 3348427.52 frames. ], batch size: 40, lr: 2.69e-02, grad_scale: 32.0 2024-09-22 17:27:59,435 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten.whitening_limit, batch_count=65510.666666666664, ans=15.0 2024-09-22 17:28:38,979 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=65604.0, ans=0.0 2024-09-22 17:28:48,464 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=65650.66666666667, ans=0.125 2024-09-22 17:28:59,448 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=65650.66666666667, ans=0.07 2024-09-22 17:29:18,258 INFO [train.py:1198] (3/4) Epoch 4, batch 2400, loss[loss=0.3384, ctc_loss=0.255, cr_loss=0.4165, over 17325.00 frames. ], tot_loss[loss=0.3157, ctc_loss=0.2333, cr_loss=0.4119, over 3352174.71 frames. ], batch size: 52, lr: 2.69e-02, grad_scale: 32.0 2024-09-22 17:29:19,048 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=7.67 vs. limit=15.0 2024-09-22 17:29:22,372 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.03 vs. limit=15.0 2024-09-22 17:29:36,843 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=4.44 vs. limit=12.0 2024-09-22 17:29:44,121 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-09-22 17:29:48,869 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=65837.33333333333, ans=0.125 2024-09-22 17:29:54,855 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.273e+02 1.593e+02 1.794e+02 2.177e+02 3.793e+02, threshold=3.589e+02, percent-clipped=0.0 2024-09-22 17:30:01,516 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=65837.33333333333, ans=0.0 2024-09-22 17:30:12,342 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=9.66 vs. 
limit=15.0 2024-09-22 17:30:27,958 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=65930.66666666667, ans=0.1 2024-09-22 17:30:40,201 INFO [train.py:1198] (3/4) Epoch 4, batch 2450, loss[loss=0.2965, ctc_loss=0.2177, cr_loss=0.3939, over 17166.00 frames. ], tot_loss[loss=0.3155, ctc_loss=0.2331, cr_loss=0.4118, over 3340778.23 frames. ], batch size: 45, lr: 2.68e-02, grad_scale: 32.0 2024-09-22 17:31:12,258 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=66070.66666666667, ans=0.2 2024-09-22 17:31:15,486 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=66070.66666666667, ans=0.125 2024-09-22 17:31:39,517 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.const_attention_rate, batch_count=66117.33333333333, ans=0.025 2024-09-22 17:31:44,816 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten.whitening_limit, batch_count=66164.0, ans=15.0 2024-09-22 17:31:53,167 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=66164.0, ans=0.125 2024-09-22 17:32:02,190 INFO [train.py:1198] (3/4) Epoch 4, batch 2500, loss[loss=0.2993, ctc_loss=0.2264, cr_loss=0.3647, over 17316.00 frames. ], tot_loss[loss=0.3155, ctc_loss=0.2331, cr_loss=0.4123, over 3347009.91 frames. ], batch size: 51, lr: 2.68e-02, grad_scale: 32.0 2024-09-22 17:32:20,924 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=66257.33333333333, ans=0.0 2024-09-22 17:32:41,646 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.315e+02 1.715e+02 1.996e+02 2.438e+02 3.886e+02, threshold=3.992e+02, percent-clipped=3.0 2024-09-22 17:32:43,584 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.const_attention_rate, batch_count=66304.0, ans=0.025 2024-09-22 17:32:51,220 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=66350.66666666667, ans=0.125 2024-09-22 17:32:51,224 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=66350.66666666667, ans=0.2 2024-09-22 17:32:51,237 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=66350.66666666667, ans=0.2 2024-09-22 17:32:56,089 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=66350.66666666667, ans=0.0 2024-09-22 17:32:59,328 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-22 17:33:24,681 INFO [train.py:1198] (3/4) Epoch 4, batch 2550, loss[loss=0.3239, ctc_loss=0.2431, cr_loss=0.4039, over 16784.00 frames. ], tot_loss[loss=0.3146, ctc_loss=0.2323, cr_loss=0.4115, over 3352075.50 frames. 
], batch size: 61, lr: 2.68e-02, grad_scale: 32.0 2024-09-22 17:33:26,620 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=66444.0, ans=0.0 2024-09-22 17:33:30,231 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=14.60 vs. limit=15.0 2024-09-22 17:33:37,907 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=66444.0, ans=0.0 2024-09-22 17:33:40,741 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=66490.66666666667, ans=0.125 2024-09-22 17:33:57,007 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=66537.33333333333, ans=0.04949747468305833 2024-09-22 17:34:06,828 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=9.47 vs. limit=15.0 2024-09-22 17:34:44,463 INFO [train.py:1198] (3/4) Epoch 4, batch 2600, loss[loss=0.3045, ctc_loss=0.2244, cr_loss=0.4007, over 17097.00 frames. ], tot_loss[loss=0.3163, ctc_loss=0.2336, cr_loss=0.4134, over 3349435.06 frames. ], batch size: 49, lr: 2.67e-02, grad_scale: 32.0 2024-09-22 17:35:21,315 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=66770.66666666667, ans=0.1 2024-09-22 17:35:25,830 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.137e+02 1.648e+02 1.831e+02 2.212e+02 5.606e+02, threshold=3.662e+02, percent-clipped=1.0 2024-09-22 17:35:26,816 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.37 vs. limit=6.0 2024-09-22 17:36:08,653 INFO [train.py:1198] (3/4) Epoch 4, batch 2650, loss[loss=0.2812, ctc_loss=0.2008, cr_loss=0.4019, over 17192.00 frames. ], tot_loss[loss=0.3159, ctc_loss=0.2331, cr_loss=0.4139, over 3344092.54 frames. ], batch size: 47, lr: 2.67e-02, grad_scale: 32.0 2024-09-22 17:36:10,432 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=66910.66666666667, ans=0.125 2024-09-22 17:36:12,634 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.71 vs. limit=15.0 2024-09-22 17:37:14,057 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=67050.66666666667, ans=0.125 2024-09-22 17:37:32,621 INFO [train.py:1198] (3/4) Epoch 4, batch 2700, loss[loss=0.2762, ctc_loss=0.1999, cr_loss=0.3818, over 17272.00 frames. ], tot_loss[loss=0.3148, ctc_loss=0.2321, cr_loss=0.4136, over 3356407.13 frames. ], batch size: 42, lr: 2.67e-02, grad_scale: 32.0 2024-09-22 17:38:09,013 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.345e+02 1.678e+02 1.945e+02 2.373e+02 3.767e+02, threshold=3.890e+02, percent-clipped=1.0 2024-09-22 17:38:31,749 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=67284.0, ans=0.0 2024-09-22 17:38:43,684 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.50 vs. 
limit=15.0 2024-09-22 17:38:52,061 INFO [train.py:1198] (3/4) Epoch 4, batch 2750, loss[loss=0.3264, ctc_loss=0.2422, cr_loss=0.4212, over 17070.00 frames. ], tot_loss[loss=0.3155, ctc_loss=0.2326, cr_loss=0.4145, over 3362496.64 frames. ], batch size: 43, lr: 2.66e-02, grad_scale: 32.0 2024-09-22 17:39:04,034 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.72 vs. limit=15.0 2024-09-22 17:39:14,945 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=67424.0, ans=0.125 2024-09-22 17:39:18,077 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=67424.0, ans=0.0 2024-09-22 17:39:52,100 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=5.89 vs. limit=10.0 2024-09-22 17:40:01,583 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=67564.0, ans=0.0 2024-09-22 17:40:17,123 INFO [train.py:1198] (3/4) Epoch 4, batch 2800, loss[loss=0.3595, ctc_loss=0.2672, cr_loss=0.4613, over 16999.00 frames. ], tot_loss[loss=0.3151, ctc_loss=0.2322, cr_loss=0.4142, over 3361481.10 frames. ], batch size: 56, lr: 2.66e-02, grad_scale: 32.0 2024-09-22 17:40:40,151 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.62 vs. limit=22.5 2024-09-22 17:40:45,815 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=67657.33333333333, ans=0.125 2024-09-22 17:40:53,564 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.183e+02 1.604e+02 1.847e+02 2.270e+02 3.677e+02, threshold=3.693e+02, percent-clipped=0.0 2024-09-22 17:41:38,546 INFO [train.py:1198] (3/4) Epoch 4, batch 2850, loss[loss=0.3, ctc_loss=0.2189, cr_loss=0.4059, over 17348.00 frames. ], tot_loss[loss=0.3147, ctc_loss=0.232, cr_loss=0.4136, over 3372711.59 frames. ], batch size: 48, lr: 2.65e-02, grad_scale: 32.0 2024-09-22 17:41:42,096 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=67844.0, ans=0.09899494936611666 2024-09-22 17:41:51,623 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=67844.0, ans=0.125 2024-09-22 17:42:17,920 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=67937.33333333333, ans=0.125 2024-09-22 17:42:37,303 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=67984.0, ans=0.0 2024-09-22 17:42:46,837 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=68030.66666666667, ans=0.2 2024-09-22 17:43:00,659 INFO [train.py:1198] (3/4) Epoch 4, batch 2900, loss[loss=0.312, ctc_loss=0.2259, cr_loss=0.4303, over 17210.00 frames. ], tot_loss[loss=0.3143, ctc_loss=0.2317, cr_loss=0.4129, over 3370336.89 frames. 
], batch size: 47, lr: 2.65e-02, grad_scale: 32.0 2024-09-22 17:43:12,240 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-22 17:43:23,513 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=68124.0, ans=0.1 2024-09-22 17:43:37,709 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.237e+02 1.627e+02 1.915e+02 2.361e+02 4.224e+02, threshold=3.831e+02, percent-clipped=1.0 2024-09-22 17:44:05,372 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=3.71 vs. limit=15.0 2024-09-22 17:44:20,381 INFO [train.py:1198] (3/4) Epoch 4, batch 2950, loss[loss=0.3233, ctc_loss=0.2449, cr_loss=0.3918, over 14980.00 frames. ], tot_loss[loss=0.3126, ctc_loss=0.2304, cr_loss=0.4111, over 3369272.58 frames. ], batch size: 89, lr: 2.65e-02, grad_scale: 32.0 2024-09-22 17:44:27,251 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-09-22 17:44:31,894 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=68310.66666666667, ans=0.0 2024-09-22 17:44:38,487 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=68357.33333333333, ans=0.125 2024-09-22 17:44:58,141 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=68404.0, ans=0.2 2024-09-22 17:45:41,889 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=68497.33333333333, ans=0.125 2024-09-22 17:45:44,887 INFO [train.py:1198] (3/4) Epoch 4, batch 3000, loss[loss=0.3471, ctc_loss=0.2517, cr_loss=0.4773, over 16896.00 frames. ], tot_loss[loss=0.3118, ctc_loss=0.2297, cr_loss=0.4108, over 3370895.67 frames. ], batch size: 58, lr: 2.64e-02, grad_scale: 32.0 2024-09-22 17:45:44,887 INFO [train.py:1221] (3/4) Computing validation loss 2024-09-22 17:45:53,780 INFO [zipformer.py:1858] (3/4) name=encoder.encoders.2.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([5.0924, 4.6891, 4.2761, 4.4744], device='cuda:3') 2024-09-22 17:46:00,411 INFO [train.py:1230] (3/4) Epoch 4, validation: loss=0.07263, ctc_loss=0.07263, cr_loss=7.17e-15, over 944034.00 frames. 2024-09-22 17:46:00,411 INFO [train.py:1231] (3/4) Maximum memory allocated so far is 21238MB 2024-09-22 17:46:08,523 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=68544.0, ans=0.0 2024-09-22 17:46:13,278 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=68544.0, ans=0.0 2024-09-22 17:46:15,204 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=14.83 vs. limit=22.5 2024-09-22 17:46:19,982 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.15 vs. 
limit=15.0 2024-09-22 17:46:21,092 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=68590.66666666667, ans=0.125 2024-09-22 17:46:35,202 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=68637.33333333333, ans=0.0 2024-09-22 17:46:35,272 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-22 17:46:36,350 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.234e+02 1.618e+02 1.808e+02 2.126e+02 4.273e+02, threshold=3.616e+02, percent-clipped=2.0 2024-09-22 17:46:39,824 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=68637.33333333333, ans=0.125 2024-09-22 17:46:50,990 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=68684.0, ans=0.1 2024-09-22 17:47:17,773 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=68777.33333333333, ans=0.125 2024-09-22 17:47:19,131 INFO [train.py:1198] (3/4) Epoch 4, batch 3050, loss[loss=0.313, ctc_loss=0.2313, cr_loss=0.4089, over 17210.00 frames. ], tot_loss[loss=0.3114, ctc_loss=0.2294, cr_loss=0.41, over 3365333.26 frames. ], batch size: 50, lr: 2.64e-02, grad_scale: 32.0 2024-09-22 17:47:26,279 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=14.09 vs. limit=15.0 2024-09-22 17:47:43,859 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=68824.0, ans=0.125 2024-09-22 17:47:55,064 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=68870.66666666667, ans=0.125 2024-09-22 17:47:59,773 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=68870.66666666667, ans=0.1 2024-09-22 17:48:15,629 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=68917.33333333333, ans=0.0 2024-09-22 17:48:42,985 INFO [train.py:1198] (3/4) Epoch 4, batch 3100, loss[loss=0.3235, ctc_loss=0.2314, cr_loss=0.4607, over 17337.00 frames. ], tot_loss[loss=0.3117, ctc_loss=0.2296, cr_loss=0.4104, over 3371695.29 frames. ], batch size: 48, lr: 2.64e-02, grad_scale: 32.0 2024-09-22 17:48:47,051 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.84 vs. 
limit=22.5 2024-09-22 17:48:51,068 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.max_positive, batch_count=69010.66666666667, ans=0.95 2024-09-22 17:48:52,749 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-22 17:49:04,024 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=69057.33333333333, ans=0.125 2024-09-22 17:49:08,545 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=69057.33333333333, ans=0.0 2024-09-22 17:49:14,932 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=69104.0, ans=0.125 2024-09-22 17:49:19,305 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.276e+02 1.562e+02 1.773e+02 2.272e+02 4.016e+02, threshold=3.545e+02, percent-clipped=1.0 2024-09-22 17:49:47,076 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=3.62 vs. limit=15.0 2024-09-22 17:49:52,703 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=69197.33333333333, ans=0.0 2024-09-22 17:50:01,899 INFO [train.py:1198] (3/4) Epoch 4, batch 3150, loss[loss=0.2732, ctc_loss=0.1951, cr_loss=0.3907, over 17032.00 frames. ], tot_loss[loss=0.3112, ctc_loss=0.229, cr_loss=0.4109, over 3366330.19 frames. ], batch size: 39, lr: 2.63e-02, grad_scale: 32.0 2024-09-22 17:50:12,937 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=69244.0, ans=0.125 2024-09-22 17:50:28,411 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=69290.66666666667, ans=0.125 2024-09-22 17:50:43,926 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=69337.33333333333, ans=0.125 2024-09-22 17:50:55,599 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.50 vs. limit=6.0 2024-09-22 17:51:10,597 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=69430.66666666667, ans=0.0 2024-09-22 17:51:19,755 INFO [train.py:1198] (3/4) Epoch 4, batch 3200, loss[loss=0.2515, ctc_loss=0.1796, cr_loss=0.3591, over 17111.00 frames. ], tot_loss[loss=0.3103, ctc_loss=0.2284, cr_loss=0.4098, over 3360041.80 frames. ], batch size: 40, lr: 2.63e-02, grad_scale: 32.0 2024-09-22 17:51:28,304 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.63 vs. 
2024-09-22 17:51:54,528 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=69570.66666666667, ans=0.0 2024-09-22 17:51:55,750 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.313e+02 1.584e+02 1.798e+02 2.186e+02 3.575e+02, threshold=3.596e+02, percent-clipped=1.0 2024-09-22 17:52:03,702 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=69570.66666666667, ans=0.0 2024-09-22 17:52:19,479 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=69617.33333333333, ans=0.125 2024-09-22 17:52:24,123 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=69664.0, ans=0.125 2024-09-22 17:52:24,267 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=69664.0, ans=0.125 2024-09-22 17:52:26,331 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=3.78 vs. limit=15.0 2024-09-22 17:52:26,629 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.40 vs. limit=15.0 2024-09-22 17:52:37,933 INFO [train.py:1198] (3/4) Epoch 4, batch 3250, loss[loss=0.306, ctc_loss=0.2257, cr_loss=0.4015, over 17206.00 frames. ], tot_loss[loss=0.31, ctc_loss=0.2282, cr_loss=0.4093, over 3361374.08 frames. ], batch size: 47, lr: 2.63e-02, grad_scale: 32.0 2024-09-22 17:52:46,176 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=69710.66666666667, ans=0.125 2024-09-22 17:53:28,494 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=69850.66666666667, ans=0.1 2024-09-22 17:53:48,452 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=69897.33333333333, ans=0.0 2024-09-22 17:53:51,922 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=13.33 vs. limit=15.0 2024-09-22 17:53:53,313 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=69897.33333333333, ans=0.2 2024-09-22 17:53:56,011 INFO [train.py:1198] (3/4) Epoch 4, batch 3300, loss[loss=0.2542, ctc_loss=0.1854, cr_loss=0.3439, over 17289.00 frames. ], tot_loss[loss=0.3098, ctc_loss=0.2279, cr_loss=0.4094, over 3361100.07 frames.
], batch size: 46, lr: 2.62e-02, grad_scale: 32.0 2024-09-22 17:54:11,977 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=69990.66666666667, ans=0.125 2024-09-22 17:54:15,237 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=69990.66666666667, ans=0.125 2024-09-22 17:54:32,128 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.316e+02 1.544e+02 1.731e+02 2.121e+02 3.523e+02, threshold=3.462e+02, percent-clipped=0.0 2024-09-22 17:54:35,941 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.07 vs. limit=15.0 2024-09-22 17:55:03,129 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.const_attention_rate, batch_count=70130.66666666667, ans=0.025 2024-09-22 17:55:04,774 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=70130.66666666667, ans=0.0 2024-09-22 17:55:16,376 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=70130.66666666667, ans=0.125 2024-09-22 17:55:19,192 INFO [train.py:1198] (3/4) Epoch 4, batch 3350, loss[loss=0.3257, ctc_loss=0.244, cr_loss=0.4085, over 15829.00 frames. ], tot_loss[loss=0.3109, ctc_loss=0.2288, cr_loss=0.4102, over 3367917.00 frames. ], batch size: 74, lr: 2.62e-02, grad_scale: 32.0 2024-09-22 17:55:39,506 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=70224.0, ans=0.0 2024-09-22 17:55:43,980 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=70224.0, ans=0.95 2024-09-22 17:56:01,107 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=70270.66666666667, ans=0.125 2024-09-22 17:56:02,781 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=70270.66666666667, ans=0.0 2024-09-22 17:56:36,974 INFO [train.py:1198] (3/4) Epoch 4, batch 3400, loss[loss=0.336, ctc_loss=0.2533, cr_loss=0.4131, over 17312.00 frames. ], tot_loss[loss=0.3092, ctc_loss=0.2274, cr_loss=0.4087, over 3375037.00 frames. 
], batch size: 51, lr: 2.62e-02, grad_scale: 32.0 2024-09-22 17:56:41,694 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=70410.66666666667, ans=0.1 2024-09-22 17:56:51,199 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=70457.33333333333, ans=0.125 2024-09-22 17:57:03,523 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=70457.33333333333, ans=0.2 2024-09-22 17:57:12,457 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.333e+02 1.518e+02 1.665e+02 2.014e+02 3.333e+02, threshold=3.330e+02, percent-clipped=0.0 2024-09-22 17:57:25,120 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=70550.66666666667, ans=0.125 2024-09-22 17:57:27,163 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.05 vs. limit=22.5 2024-09-22 17:57:54,210 INFO [train.py:1198] (3/4) Epoch 4, batch 3450, loss[loss=0.3761, ctc_loss=0.287, cr_loss=0.4453, over 16477.00 frames. ], tot_loss[loss=0.3105, ctc_loss=0.2285, cr_loss=0.41, over 3369255.60 frames. ], batch size: 66, lr: 2.61e-02, grad_scale: 32.0 2024-09-22 17:57:57,649 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.const_attention_rate, batch_count=70644.0, ans=0.025 2024-09-22 17:58:09,987 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=70690.66666666667, ans=0.0 2024-09-22 17:58:40,500 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=70737.33333333333, ans=0.125 2024-09-22 17:59:05,161 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=70830.66666666667, ans=0.1 2024-09-22 17:59:09,963 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=70830.66666666667, ans=0.2 2024-09-22 17:59:15,856 INFO [train.py:1198] (3/4) Epoch 4, batch 3500, loss[loss=0.3002, ctc_loss=0.2175, cr_loss=0.4137, over 17102.00 frames. ], tot_loss[loss=0.3098, ctc_loss=0.2279, cr_loss=0.4096, over 3371588.78 frames. 
], batch size: 49, lr: 2.61e-02, grad_scale: 32.0 2024-09-22 17:59:26,841 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=70877.33333333333, ans=0.04949747468305833 2024-09-22 17:59:31,544 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=70924.0, ans=0.125 2024-09-22 17:59:53,260 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.280e+02 1.551e+02 1.744e+02 2.057e+02 3.215e+02, threshold=3.488e+02, percent-clipped=0.0 2024-09-22 18:00:10,738 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=71017.33333333333, ans=0.2 2024-09-22 18:00:12,530 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=71017.33333333333, ans=0.2 2024-09-22 18:00:18,827 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=71064.0, ans=0.07 2024-09-22 18:00:33,941 INFO [train.py:1198] (3/4) Epoch 4, batch 3550, loss[loss=0.3415, ctc_loss=0.2572, cr_loss=0.4215, over 17216.00 frames. ], tot_loss[loss=0.3109, ctc_loss=0.2288, cr_loss=0.4105, over 3367422.50 frames. ], batch size: 50, lr: 2.61e-02, grad_scale: 16.0 2024-09-22 18:00:55,943 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=71157.33333333333, ans=0.0 2024-09-22 18:01:00,600 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.const_attention_rate, batch_count=71157.33333333333, ans=0.025 2024-09-22 18:01:03,596 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=71204.0, ans=0.2 2024-09-22 18:01:31,867 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=4.25 vs. limit=10.0 2024-09-22 18:01:51,388 INFO [train.py:1198] (3/4) Epoch 4, batch 3600, loss[loss=0.2553, ctc_loss=0.187, cr_loss=0.3413, over 17195.00 frames. ], tot_loss[loss=0.3105, ctc_loss=0.2286, cr_loss=0.4095, over 3361394.87 frames. ], batch size: 41, lr: 2.60e-02, grad_scale: 32.0 2024-09-22 18:02:18,117 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.77 vs. limit=22.5 2024-09-22 18:02:28,362 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.219e+02 1.639e+02 2.046e+02 2.808e+02 4.325e+02, threshold=4.093e+02, percent-clipped=8.0 2024-09-22 18:02:36,565 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=71484.0, ans=0.2 2024-09-22 18:02:50,417 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=71484.0, ans=0.125 2024-09-22 18:03:03,087 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=71530.66666666667, ans=0.05 2024-09-22 18:03:09,094 INFO [train.py:1198] (3/4) Epoch 4, batch 3650, loss[loss=0.3274, ctc_loss=0.2389, cr_loss=0.4424, over 17054.00 frames. ], tot_loss[loss=0.3114, ctc_loss=0.2293, cr_loss=0.4104, over 3355935.28 frames. 
], batch size: 52, lr: 2.60e-02, grad_scale: 32.0 2024-09-22 18:03:09,383 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=71577.33333333333, ans=0.0 2024-09-22 18:03:35,811 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=71624.0, ans=0.125 2024-09-22 18:03:38,871 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=71670.66666666667, ans=0.05 2024-09-22 18:03:43,458 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=71670.66666666667, ans=0.125 2024-09-22 18:03:56,104 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=5.98 vs. limit=15.0 2024-09-22 18:04:09,230 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=71717.33333333333, ans=0.2 2024-09-22 18:04:13,898 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=71764.0, ans=0.07 2024-09-22 18:04:28,662 INFO [train.py:1198] (3/4) Epoch 4, batch 3700, loss[loss=0.3125, ctc_loss=0.2343, cr_loss=0.3912, over 17353.00 frames. ], tot_loss[loss=0.3115, ctc_loss=0.2294, cr_loss=0.4102, over 3356397.30 frames. ], batch size: 48, lr: 2.60e-02, grad_scale: 32.0 2024-09-22 18:04:47,331 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=71857.33333333333, ans=0.0 2024-09-22 18:04:47,332 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=71857.33333333333, ans=0.0 2024-09-22 18:05:07,669 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.267e+02 1.579e+02 1.797e+02 2.044e+02 5.255e+02, threshold=3.594e+02, percent-clipped=1.0 2024-09-22 18:05:11,096 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=71904.0, ans=0.1 2024-09-22 18:05:12,705 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=71904.0, ans=0.05 2024-09-22 18:05:26,985 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=71950.66666666667, ans=0.0 2024-09-22 18:05:36,667 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.63 vs. limit=10.0 2024-09-22 18:05:48,775 INFO [train.py:1198] (3/4) Epoch 4, batch 3750, loss[loss=0.2746, ctc_loss=0.1977, cr_loss=0.3846, over 17201.00 frames. ], tot_loss[loss=0.3111, ctc_loss=0.2291, cr_loss=0.41, over 3357757.19 frames. ], batch size: 41, lr: 2.59e-02, grad_scale: 32.0 2024-09-22 18:06:10,940 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=72090.66666666667, ans=0.2 2024-09-22 18:07:06,331 INFO [train.py:1198] (3/4) Epoch 4, batch 3800, loss[loss=0.3684, ctc_loss=0.2767, cr_loss=0.4587, over 14848.00 frames. ], tot_loss[loss=0.312, ctc_loss=0.23, cr_loss=0.4097, over 3331660.42 frames. 
], batch size: 89, lr: 2.59e-02, grad_scale: 32.0 2024-09-22 18:07:36,663 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.const_attention_rate, batch_count=72370.66666666667, ans=0.025 2024-09-22 18:07:36,723 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=72370.66666666667, ans=0.2 2024-09-22 18:07:44,153 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.309e+02 1.589e+02 1.725e+02 2.068e+02 4.482e+02, threshold=3.450e+02, percent-clipped=2.0 2024-09-22 18:07:51,274 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=72370.66666666667, ans=0.125 2024-09-22 18:07:51,344 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=72370.66666666667, ans=0.1 2024-09-22 18:07:55,914 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=72417.33333333333, ans=0.125 2024-09-22 18:08:12,123 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=6.70 vs. limit=15.0 2024-09-22 18:08:16,073 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=72464.0, ans=0.0 2024-09-22 18:08:25,286 INFO [train.py:1198] (3/4) Epoch 4, batch 3850, loss[loss=0.2674, ctc_loss=0.1903, cr_loss=0.3851, over 16756.00 frames. ], tot_loss[loss=0.3144, ctc_loss=0.2324, cr_loss=0.4099, over 3268975.57 frames. ], batch size: 37, lr: 2.59e-02, grad_scale: 32.0 2024-09-22 18:08:42,613 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=72557.33333333333, ans=0.125 2024-09-22 18:08:44,156 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=72557.33333333333, ans=0.125 2024-09-22 18:08:48,970 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=72557.33333333333, ans=0.125 2024-09-22 18:08:53,467 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=72557.33333333333, ans=0.0 2024-09-22 18:08:58,770 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=11.55 vs. limit=15.0 2024-09-22 18:09:10,880 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=72650.66666666667, ans=0.125 2024-09-22 18:09:15,434 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=72650.66666666667, ans=0.125 2024-09-22 18:09:21,347 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=72650.66666666667, ans=0.0 2024-09-22 18:09:24,210 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=72650.66666666667, ans=0.015 2024-09-22 18:10:26,644 INFO [train.py:1198] (3/4) Epoch 5, batch 0, loss[loss=0.3252, ctc_loss=0.2436, cr_loss=0.4079, over 17090.00 frames. ], tot_loss[loss=0.3252, ctc_loss=0.2436, cr_loss=0.4079, over 17090.00 frames. 
], batch size: 43, lr: 2.40e-02, grad_scale: 32.0 2024-09-22 18:10:26,644 INFO [train.py:1221] (3/4) Computing validation loss 2024-09-22 18:10:42,116 INFO [train.py:1230] (3/4) Epoch 5, validation: loss=0.07551, ctc_loss=0.07551, cr_loss=6.915e-15, over 944034.00 frames. 2024-09-22 18:10:42,117 INFO [train.py:1231] (3/4) Maximum memory allocated so far is 21238MB 2024-09-22 18:11:08,386 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=72772.0, ans=0.125 2024-09-22 18:11:25,839 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=72818.66666666667, ans=0.2 2024-09-22 18:11:27,097 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.296e+02 1.660e+02 1.848e+02 2.232e+02 4.613e+02, threshold=3.696e+02, percent-clipped=4.0 2024-09-22 18:12:02,273 INFO [train.py:1198] (3/4) Epoch 5, batch 50, loss[loss=0.2955, ctc_loss=0.2173, cr_loss=0.3909, over 17294.00 frames. ], tot_loss[loss=0.3101, ctc_loss=0.2274, cr_loss=0.4139, over 761192.70 frames. ], batch size: 46, lr: 2.40e-02, grad_scale: 32.0 2024-09-22 18:12:31,081 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=73005.33333333333, ans=10.0 2024-09-22 18:12:37,575 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=73052.0, ans=0.125 2024-09-22 18:13:01,289 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=73098.66666666667, ans=0.1 2024-09-22 18:13:01,528 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten.whitening_limit, batch_count=73098.66666666667, ans=22.5 2024-09-22 18:13:23,299 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=73145.33333333333, ans=0.1 2024-09-22 18:13:27,568 INFO [train.py:1198] (3/4) Epoch 5, batch 100, loss[loss=0.2785, ctc_loss=0.2038, cr_loss=0.3734, over 16243.00 frames. ], tot_loss[loss=0.3067, ctc_loss=0.2249, cr_loss=0.4092, over 1344187.51 frames. ], batch size: 36, lr: 2.40e-02, grad_scale: 32.0 2024-09-22 18:13:33,218 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten.whitening_limit, batch_count=73192.0, ans=15.0 2024-09-22 18:13:48,447 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=73238.66666666667, ans=0.0 2024-09-22 18:14:06,848 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=14.38 vs. limit=22.5 2024-09-22 18:14:12,042 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.270e+02 1.507e+02 1.769e+02 2.148e+02 4.396e+02, threshold=3.538e+02, percent-clipped=1.0 2024-09-22 18:14:46,988 INFO [train.py:1198] (3/4) Epoch 5, batch 150, loss[loss=0.2559, ctc_loss=0.1819, cr_loss=0.37, over 17250.00 frames. ], tot_loss[loss=0.3033, ctc_loss=0.2218, cr_loss=0.4077, over 1800562.11 frames. 
], batch size: 44, lr: 2.40e-02, grad_scale: 32.0 2024-09-22 18:14:55,541 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=73425.33333333333, ans=0.2 2024-09-22 18:14:58,625 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=73425.33333333333, ans=0.125 2024-09-22 18:15:30,171 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=73518.66666666667, ans=0.125 2024-09-22 18:15:31,876 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=73518.66666666667, ans=0.05 2024-09-22 18:15:36,983 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=9.76 vs. limit=15.0 2024-09-22 18:15:42,810 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=73565.33333333333, ans=0.125 2024-09-22 18:15:43,349 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.99 vs. limit=6.0 2024-09-22 18:16:12,920 INFO [train.py:1198] (3/4) Epoch 5, batch 200, loss[loss=0.3262, ctc_loss=0.2397, cr_loss=0.4327, over 17010.00 frames. ], tot_loss[loss=0.3044, ctc_loss=0.2227, cr_loss=0.4085, over 2139923.16 frames. ], batch size: 51, lr: 2.39e-02, grad_scale: 32.0 2024-09-22 18:16:42,722 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.90 vs. limit=6.0 2024-09-22 18:16:57,602 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.221e+02 1.454e+02 1.719e+02 2.223e+02 3.373e+02, threshold=3.438e+02, percent-clipped=0.0 2024-09-22 18:17:10,527 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=73798.66666666667, ans=0.0 2024-09-22 18:17:24,486 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=73845.33333333333, ans=0.0 2024-09-22 18:17:26,431 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=16.73 vs. limit=22.5 2024-09-22 18:17:32,287 INFO [train.py:1198] (3/4) Epoch 5, batch 250, loss[loss=0.2692, ctc_loss=0.1934, cr_loss=0.3789, over 17021.00 frames. ], tot_loss[loss=0.3035, ctc_loss=0.2218, cr_loss=0.4083, over 2409874.74 frames. ], batch size: 44, lr: 2.39e-02, grad_scale: 32.0 2024-09-22 18:17:47,031 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=73938.66666666667, ans=0.0 2024-09-22 18:17:58,336 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.44 vs. limit=22.5 2024-09-22 18:18:04,120 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=73938.66666666667, ans=0.125 2024-09-22 18:18:21,853 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=6.70 vs. 
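limit=15.0

The Whitening records (scaling.py:1024) are printed when a layer's whitening diagnostic approaches or exceeds its scheduled limit; the metric measures how far the activation covariance of each channel group is from a multiple of the identity, so perfectly "white" features give 1.0 and the value grows with the spread of the covariance eigenvalues. One standard way to express such a measure is E[lambda^2] / E[lambda]^2 over the eigenvalues, sketched below; this is a hedged reconstruction of the idea, not the exact scaling.py computation.

    import torch

    def whitening_metric(x: torch.Tensor, num_groups: int = 1) -> float:
        # x: (num_frames, num_channels) activations. Returns >= 1.0,
        # with equality iff each group's covariance is a multiple of identity.
        num_frames, num_channels = x.shape
        assert num_channels % num_groups == 0
        x = x.reshape(num_frames, num_groups, num_channels // num_groups)
        x = x - x.mean(dim=0, keepdim=True)
        metrics = []
        for g in range(num_groups):
            cov = x[:, g, :].t() @ x[:, g, :] / num_frames  # (d, d) covariance
            d = cov.shape[0]
            # trace(C @ C) * d / trace(C)^2 == E[lambda^2] / E[lambda]^2
            metrics.append(((cov @ cov).trace() * d / cov.trace() ** 2).item())
        return sum(metrics) / len(metrics)

Under this reading, the record just above (num_groups=1, num_channels=256, metric=6.70 vs. limit=15.0) would correspond to whitening_metric(x) evaluated on that module's 256-channel output.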
2024-09-22 18:18:57,278 INFO [train.py:1198] (3/4) Epoch 5, batch 300, loss[loss=0.3336, ctc_loss=0.2473, cr_loss=0.4315, over 17029.00 frames. ], tot_loss[loss=0.3059, ctc_loss=0.224, cr_loss=0.4097, over 2614275.23 frames. ], batch size: 52, lr: 2.39e-02, grad_scale: 32.0 2024-09-22 18:19:07,121 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=74125.33333333333, ans=10.0 2024-09-22 18:19:15,017 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=74172.0, ans=0.125 2024-09-22 18:19:34,040 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=74218.66666666667, ans=0.0 2024-09-22 18:19:41,771 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.262e+02 1.584e+02 1.929e+02 2.265e+02 4.720e+02, threshold=3.859e+02, percent-clipped=2.0 2024-09-22 18:20:19,152 INFO [train.py:1198] (3/4) Epoch 5, batch 350, loss[loss=0.2982, ctc_loss=0.2158, cr_loss=0.412, over 17003.00 frames. ], tot_loss[loss=0.3069, ctc_loss=0.2247, cr_loss=0.4108, over 2772561.86 frames. ], batch size: 44, lr: 2.38e-02, grad_scale: 32.0 2024-09-22 18:20:21,141 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=74358.66666666667, ans=0.1 2024-09-22 18:20:49,665 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.52 vs. limit=15.0 2024-09-22 18:21:03,685 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=74452.0, ans=0.125 2024-09-22 18:21:13,485 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=14.32 vs. limit=22.5 2024-09-22 18:21:21,217 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=74498.66666666667, ans=0.1 2024-09-22 18:21:25,833 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=74545.33333333333, ans=0.125 2024-09-22 18:21:34,535 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=8.94 vs. limit=15.0 2024-09-22 18:21:40,243 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=74592.0, ans=0.125 2024-09-22 18:21:41,491 INFO [train.py:1198] (3/4) Epoch 5, batch 400, loss[loss=0.3542, ctc_loss=0.276, cr_loss=0.3907, over 11794.00 frames. ], tot_loss[loss=0.3046, ctc_loss=0.2231, cr_loss=0.4078, over 2905258.77 frames. ], batch size: 123, lr: 2.38e-02, grad_scale: 32.0 2024-09-22 18:21:46,519 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=74592.0, ans=0.1 2024-09-22 18:21:50,131 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=10.77 vs.
limit=15.0 2024-09-22 18:22:02,325 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=74638.66666666667, ans=0.5 2024-09-22 18:22:18,857 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=74685.33333333333, ans=0.0 2024-09-22 18:22:28,037 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.233e+02 1.516e+02 1.701e+02 1.953e+02 3.306e+02, threshold=3.402e+02, percent-clipped=0.0 2024-09-22 18:22:41,063 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=74732.0, ans=0.2 2024-09-22 18:23:05,461 INFO [train.py:1198] (3/4) Epoch 5, batch 450, loss[loss=0.2998, ctc_loss=0.2196, cr_loss=0.4011, over 17288.00 frames. ], tot_loss[loss=0.3063, ctc_loss=0.2245, cr_loss=0.409, over 3001663.31 frames. ], batch size: 49, lr: 2.38e-02, grad_scale: 32.0 2024-09-22 18:23:05,843 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=74825.33333333333, ans=0.1 2024-09-22 18:23:09,303 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=15.22 vs. limit=22.5 2024-09-22 18:23:12,065 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=74825.33333333333, ans=0.2 2024-09-22 18:23:18,023 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=74825.33333333333, ans=0.1 2024-09-22 18:23:24,524 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=74872.0, ans=0.0 2024-09-22 18:23:27,693 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=74872.0, ans=0.125 2024-09-22 18:24:16,569 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=75012.0, ans=0.2 2024-09-22 18:24:27,403 INFO [train.py:1198] (3/4) Epoch 5, batch 500, loss[loss=0.3449, ctc_loss=0.2535, cr_loss=0.4572, over 16997.00 frames. ], tot_loss[loss=0.3067, ctc_loss=0.2247, cr_loss=0.4098, over 3081781.41 frames. ], batch size: 53, lr: 2.37e-02, grad_scale: 32.0 2024-09-22 18:24:52,709 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=75105.33333333333, ans=0.0 2024-09-22 18:25:01,528 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=9.51 vs. limit=22.5 2024-09-22 18:25:13,994 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.212e+02 1.528e+02 1.760e+02 2.096e+02 4.266e+02, threshold=3.519e+02, percent-clipped=4.0 2024-09-22 18:25:31,634 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=75245.33333333333, ans=0.1 2024-09-22 18:25:46,531 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=8.61 vs. limit=10.0 2024-09-22 18:25:51,522 INFO [train.py:1198] (3/4) Epoch 5, batch 550, loss[loss=0.2453, ctc_loss=0.1761, cr_loss=0.3456, over 16293.00 frames. 
], tot_loss[loss=0.3061, ctc_loss=0.2242, cr_loss=0.4095, over 3149875.46 frames. ], batch size: 36, lr: 2.37e-02, grad_scale: 32.0 2024-09-22 18:26:17,098 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=75338.66666666667, ans=0.1 2024-09-22 18:26:20,221 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=75338.66666666667, ans=0.125 2024-09-22 18:26:46,832 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=75432.0, ans=0.0 2024-09-22 18:27:05,126 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.77 vs. limit=15.0 2024-09-22 18:27:06,212 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=7.85 vs. limit=15.0 2024-09-22 18:27:10,589 INFO [train.py:1198] (3/4) Epoch 5, batch 600, loss[loss=0.2795, ctc_loss=0.2003, cr_loss=0.3957, over 17155.00 frames. ], tot_loss[loss=0.3049, ctc_loss=0.2233, cr_loss=0.4083, over 3197810.71 frames. ], batch size: 45, lr: 2.37e-02, grad_scale: 32.0 2024-09-22 18:27:34,445 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=75572.0, ans=0.0 2024-09-22 18:27:39,520 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=75572.0, ans=0.0 2024-09-22 18:27:53,155 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=75618.66666666667, ans=0.1 2024-09-22 18:27:57,743 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.285e+02 1.538e+02 1.800e+02 2.212e+02 3.356e+02, threshold=3.600e+02, percent-clipped=0.0 2024-09-22 18:28:02,785 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=75665.33333333333, ans=0.09899494936611666 2024-09-22 18:28:23,161 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=4.05 vs. limit=10.0 2024-09-22 18:28:35,278 INFO [train.py:1198] (3/4) Epoch 5, batch 650, loss[loss=0.3432, ctc_loss=0.2552, cr_loss=0.4402, over 16868.00 frames. ], tot_loss[loss=0.3037, ctc_loss=0.2221, cr_loss=0.4078, over 3240275.36 frames. ], batch size: 58, lr: 2.36e-02, grad_scale: 32.0 2024-09-22 18:28:41,009 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=14.47 vs. 
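limit=22.5

Throughout these records the reported loss is consistent with loss = ctc_loss + 0.2 × cr_loss; for the batch-600 line above, 0.2233 + 0.2 × 0.4083 ≈ 0.3049. The validation records earlier in the log print cr_loss on the order of 1e-15, i.e. numerically zero, so the consistency term is effectively active only in training. Below is a hedged sketch of a consistency-regularized CTC objective of that shape, using a symmetric KL divergence between the frame posteriors of two differently masked views of each batch; the function and its signature are assumptions, not the actual train.py code.

    import torch.nn.functional as F

    def cr_ctc_loss(log_probs_a, log_probs_b, targets, input_lens, target_lens,
                    cr_loss_scale: float = 0.2):
        # log_probs_{a,b}: (T, N, C) log-posteriors from two differently
        # time-masked copies of the same batch (hypothetical formulation).
        ctc_a = F.ctc_loss(log_probs_a, targets, input_lens, target_lens,
                           reduction="sum", zero_infinity=True)
        ctc_b = F.ctc_loss(log_probs_b, targets, input_lens, target_lens,
                           reduction="sum", zero_infinity=True)
        ctc_loss = 0.5 * (ctc_a + ctc_b)
        # Consistency regularization: each view's posteriors are pulled
        # toward the other view's (detached), symmetrized over both directions.
        cr_ab = F.kl_div(log_probs_a, log_probs_b.detach(),
                         log_target=True, reduction="sum")
        cr_ba = F.kl_div(log_probs_b, log_probs_a.detach(),
                         log_target=True, reduction="sum")
        cr_loss = 0.5 * (cr_ab + cr_ba)
        return ctc_loss + cr_loss_scale * cr_loss, ctc_loss, cr_loss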
2024-09-22 18:28:46,808 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=75758.66666666667, ans=0.0 2024-09-22 18:28:54,699 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=75805.33333333333, ans=0.1 2024-09-22 18:29:14,936 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=75852.0, ans=0.0 2024-09-22 18:29:15,091 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=75852.0, ans=0.2 2024-09-22 18:29:54,261 INFO [train.py:1198] (3/4) Epoch 5, batch 700, loss[loss=0.3074, ctc_loss=0.2262, cr_loss=0.406, over 17348.00 frames. ], tot_loss[loss=0.3052, ctc_loss=0.2233, cr_loss=0.4094, over 3265717.15 frames. ], batch size: 48, lr: 2.36e-02, grad_scale: 32.0 2024-09-22 18:30:06,986 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=75992.0, ans=0.2 2024-09-22 18:30:19,057 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=76038.66666666667, ans=0.0 2024-09-22 18:30:43,603 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.226e+02 1.521e+02 1.705e+02 2.105e+02 2.881e+02, threshold=3.410e+02, percent-clipped=0.0 2024-09-22 18:31:18,064 INFO [train.py:1198] (3/4) Epoch 5, batch 750, loss[loss=0.2736, ctc_loss=0.1973, cr_loss=0.3815, over 16279.00 frames. ], tot_loss[loss=0.3032, ctc_loss=0.2216, cr_loss=0.4081, over 3289998.37 frames. ], batch size: 36, lr: 2.36e-02, grad_scale: 32.0 2024-09-22 18:31:43,637 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=76272.0, ans=0.2 2024-09-22 18:31:49,943 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=76318.66666666667, ans=0.2 2024-09-22 18:32:28,273 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=10.88 vs. limit=12.0 2024-09-22 18:32:37,282 INFO [train.py:1198] (3/4) Epoch 5, batch 800, loss[loss=0.322, ctc_loss=0.2385, cr_loss=0.4176, over 16911.00 frames. ], tot_loss[loss=0.3043, ctc_loss=0.2225, cr_loss=0.409, over 3301226.64 frames.
], batch size: 58, lr: 2.36e-02, grad_scale: 32.0 2024-09-22 18:32:37,667 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=76458.66666666667, ans=0.125 2024-09-22 18:33:07,430 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=76505.33333333333, ans=0.1 2024-09-22 18:33:16,197 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=76552.0, ans=0.125 2024-09-22 18:33:20,774 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=76552.0, ans=0.125 2024-09-22 18:33:22,365 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=76552.0, ans=0.0 2024-09-22 18:33:26,741 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.316e+02 1.675e+02 1.893e+02 2.259e+02 3.192e+02, threshold=3.786e+02, percent-clipped=0.0 2024-09-22 18:33:48,314 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=18.41 vs. limit=22.5 2024-09-22 18:34:00,488 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=76692.0, ans=0.125 2024-09-22 18:34:01,758 INFO [train.py:1198] (3/4) Epoch 5, batch 850, loss[loss=0.3274, ctc_loss=0.2364, cr_loss=0.4548, over 17292.00 frames. ], tot_loss[loss=0.304, ctc_loss=0.2224, cr_loss=0.4084, over 3311831.72 frames. ], batch size: 49, lr: 2.35e-02, grad_scale: 32.0 2024-09-22 18:34:32,295 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=76785.33333333333, ans=0.125 2024-09-22 18:34:48,084 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=76832.0, ans=0.0 2024-09-22 18:35:23,903 INFO [train.py:1198] (3/4) Epoch 5, batch 900, loss[loss=0.256, ctc_loss=0.1868, cr_loss=0.3456, over 16341.00 frames. ], tot_loss[loss=0.3041, ctc_loss=0.2225, cr_loss=0.4081, over 3318648.53 frames. 
], batch size: 36, lr: 2.35e-02, grad_scale: 32.0 2024-09-22 18:35:32,263 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=76925.33333333333, ans=0.125 2024-09-22 18:35:39,383 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=76925.33333333333, ans=0.2 2024-09-22 18:35:42,621 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=76972.0, ans=0.125 2024-09-22 18:36:10,876 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.238e+02 1.424e+02 1.574e+02 1.806e+02 2.908e+02, threshold=3.147e+02, percent-clipped=0.0 2024-09-22 18:36:16,101 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=77065.33333333333, ans=0.09899494936611666 2024-09-22 18:36:35,330 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=77112.0, ans=0.2 2024-09-22 18:36:39,959 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=77112.0, ans=0.1 2024-09-22 18:36:39,969 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=77112.0, ans=0.125 2024-09-22 18:36:46,129 INFO [train.py:1198] (3/4) Epoch 5, batch 950, loss[loss=0.3285, ctc_loss=0.2416, cr_loss=0.4346, over 16980.00 frames. ], tot_loss[loss=0.3041, ctc_loss=0.2225, cr_loss=0.4084, over 3331816.01 frames. ], batch size: 53, lr: 2.35e-02, grad_scale: 32.0 2024-09-22 18:37:00,658 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=77205.33333333333, ans=0.125 2024-09-22 18:37:05,197 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=77205.33333333333, ans=0.0 2024-09-22 18:37:17,794 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=77252.0, ans=0.125 2024-09-22 18:37:50,155 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=10.79 vs. limit=15.0 2024-09-22 18:37:57,660 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=77345.33333333333, ans=0.125 2024-09-22 18:38:06,326 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=77345.33333333333, ans=0.1 2024-09-22 18:38:10,583 INFO [train.py:1198] (3/4) Epoch 5, batch 1000, loss[loss=0.3316, ctc_loss=0.247, cr_loss=0.4231, over 17254.00 frames. ], tot_loss[loss=0.3023, ctc_loss=0.2212, cr_loss=0.4056, over 3337660.29 frames. 
], batch size: 55, lr: 2.34e-02, grad_scale: 32.0 2024-09-22 18:38:20,407 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=77392.0, ans=0.0 2024-09-22 18:38:26,478 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=77438.66666666667, ans=0.2 2024-09-22 18:38:41,782 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=13.85 vs. limit=15.0 2024-09-22 18:38:42,672 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=77485.33333333333, ans=0.1 2024-09-22 18:38:55,020 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.192e+02 1.650e+02 1.785e+02 2.141e+02 3.125e+02, threshold=3.569e+02, percent-clipped=0.0 2024-09-22 18:39:00,884 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=5.98 vs. limit=15.0 2024-09-22 18:39:14,261 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=77578.66666666667, ans=0.125 2024-09-22 18:39:25,590 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=77578.66666666667, ans=0.125 2024-09-22 18:39:30,031 INFO [train.py:1198] (3/4) Epoch 5, batch 1050, loss[loss=0.2896, ctc_loss=0.2155, cr_loss=0.3705, over 17016.00 frames. ], tot_loss[loss=0.3032, ctc_loss=0.222, cr_loss=0.4062, over 3336643.69 frames. ], batch size: 44, lr: 2.34e-02, grad_scale: 32.0 2024-09-22 18:39:34,966 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=77625.33333333333, ans=0.0 2024-09-22 18:39:39,869 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=77625.33333333333, ans=0.125 2024-09-22 18:39:49,421 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=77672.0, ans=0.125 2024-09-22 18:40:54,539 INFO [train.py:1198] (3/4) Epoch 5, batch 1100, loss[loss=0.262, ctc_loss=0.1874, cr_loss=0.373, over 17037.00 frames. ], tot_loss[loss=0.3018, ctc_loss=0.2207, cr_loss=0.4055, over 3349322.36 frames. 
], batch size: 44, lr: 2.34e-02, grad_scale: 16.0 2024-09-22 18:41:36,023 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.max_positive, batch_count=77952.0, ans=0.95 2024-09-22 18:41:40,428 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.256e+02 1.520e+02 1.761e+02 2.071e+02 3.558e+02, threshold=3.523e+02, percent-clipped=0.0 2024-09-22 18:41:43,936 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=77998.66666666667, ans=0.95 2024-09-22 18:41:48,793 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=77998.66666666667, ans=0.0 2024-09-22 18:42:00,018 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=78045.33333333333, ans=0.125 2024-09-22 18:42:06,392 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=78045.33333333333, ans=0.05 2024-09-22 18:42:09,385 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=78045.33333333333, ans=0.125 2024-09-22 18:42:13,888 INFO [train.py:1198] (3/4) Epoch 5, batch 1150, loss[loss=0.299, ctc_loss=0.218, cr_loss=0.4052, over 17312.00 frames. ], tot_loss[loss=0.3009, ctc_loss=0.2197, cr_loss=0.4058, over 3362350.66 frames. ], batch size: 46, lr: 2.34e-02, grad_scale: 16.0 2024-09-22 18:43:02,210 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=4.40 vs. limit=10.0 2024-09-22 18:43:03,905 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.17 vs. limit=10.0 2024-09-22 18:43:14,336 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=78232.0, ans=0.2 2024-09-22 18:43:37,776 INFO [train.py:1198] (3/4) Epoch 5, batch 1200, loss[loss=0.2957, ctc_loss=0.2135, cr_loss=0.4109, over 17354.00 frames. ], tot_loss[loss=0.3007, ctc_loss=0.2195, cr_loss=0.4058, over 3359198.75 frames. ], batch size: 48, lr: 2.33e-02, grad_scale: 32.0 2024-09-22 18:43:38,013 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=78325.33333333333, ans=0.125 2024-09-22 18:44:14,977 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=78418.66666666667, ans=0.125 2024-09-22 18:44:16,555 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=78418.66666666667, ans=0.09899494936611666 2024-09-22 18:44:20,460 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.70 vs. 
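limit=22.5

The lr: field decays smoothly with the batch index within an epoch (2.34e-02 down to 2.33e-02 across the records above) and also steps down between epochs (2.59e-02 near the end of epoch 4 versus 2.40e-02 at the start of epoch 5, earlier in the log). That is the behaviour of a schedule with separate batch- and epoch-dependent decay factors, for example an Eden-style rule as sketched below; the exponents and constants are illustrative assumptions and are not claimed to reproduce the printed values.

    def eden_style_lr(base_lr: float, batch: int, epoch: float,
                      lr_batches: float = 7500.0, lr_epochs: float = 3.5) -> float:
        # Smooth power-law decay in both the global batch index and the epoch;
        # all constants here are illustrative, not the run's configuration.
        batch_factor = ((batch ** 2 + lr_batches ** 2) / lr_batches ** 2) ** -0.25
        epoch_factor = ((epoch ** 2 + lr_epochs ** 2) / lr_epochs ** 2) ** -0.25
        return base_lr * batch_factor * epoch_factor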
2024-09-22 18:44:24,263 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.229e+02 1.563e+02 1.694e+02 1.943e+02 2.938e+02, threshold=3.387e+02, percent-clipped=0.0 2024-09-22 18:44:34,051 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-22 18:44:46,830 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=78512.0, ans=0.0 2024-09-22 18:44:57,614 INFO [train.py:1198] (3/4) Epoch 5, batch 1250, loss[loss=0.2755, ctc_loss=0.1991, cr_loss=0.3819, over 17293.00 frames. ], tot_loss[loss=0.3017, ctc_loss=0.2203, cr_loss=0.4069, over 3354020.76 frames. ], batch size: 46, lr: 2.33e-02, grad_scale: 32.0 2024-09-22 18:45:27,561 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=78605.33333333333, ans=0.2 2024-09-22 18:45:27,921 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.21 vs. limit=6.0 2024-09-22 18:45:30,726 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=78652.0, ans=0.125 2024-09-22 18:45:34,695 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=78652.0, ans=0.125 2024-09-22 18:45:56,911 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=78698.66666666667, ans=0.125 2024-09-22 18:46:06,430 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=78745.33333333333, ans=0.0 2024-09-22 18:46:07,873 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=78745.33333333333, ans=0.125 2024-09-22 18:46:11,559 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=3.62 vs. limit=15.0 2024-09-22 18:46:22,175 INFO [train.py:1198] (3/4) Epoch 5, batch 1300, loss[loss=0.2703, ctc_loss=0.1942, cr_loss=0.3806, over 17051.00 frames. ], tot_loss[loss=0.3018, ctc_loss=0.2203, cr_loss=0.4071, over 3361863.38 frames. ], batch size: 39, lr: 2.33e-02, grad_scale: 32.0 2024-09-22 18:46:35,178 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=78792.0, ans=0.0 2024-09-22 18:46:36,232 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=12.01 vs. limit=15.0 2024-09-22 18:46:41,600 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=78838.66666666667, ans=0.125 2024-09-22 18:46:41,731 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=78838.66666666667, ans=0.0 2024-09-22 18:47:08,165 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.258e+02 1.495e+02 1.790e+02 2.182e+02 4.439e+02, threshold=3.579e+02, percent-clipped=1.0 2024-09-22 18:47:43,986 INFO [train.py:1198] (3/4) Epoch 5, batch 1350, loss[loss=0.2825, ctc_loss=0.2056, cr_loss=0.3847, over 17039.00 frames. ], tot_loss[loss=0.3012, ctc_loss=0.2199, cr_loss=0.4068, over 3369896.38 frames.
], batch size: 39, lr: 2.32e-02, grad_scale: 32.0 2024-09-22 18:47:54,667 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=79025.33333333333, ans=0.125 2024-09-22 18:47:56,305 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=79025.33333333333, ans=0.0 2024-09-22 18:48:07,287 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=79072.0, ans=0.5 2024-09-22 18:48:08,820 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=79072.0, ans=0.125 2024-09-22 18:48:18,179 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=79118.66666666667, ans=0.0 2024-09-22 18:48:24,982 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.32 vs. limit=6.0 2024-09-22 18:48:56,724 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=79212.0, ans=0.0 2024-09-22 18:49:05,832 INFO [train.py:1198] (3/4) Epoch 5, batch 1400, loss[loss=0.3479, ctc_loss=0.2601, cr_loss=0.4388, over 14936.00 frames. ], tot_loss[loss=0.3021, ctc_loss=0.2206, cr_loss=0.4073, over 3361734.56 frames. ], batch size: 89, lr: 2.32e-02, grad_scale: 32.0 2024-09-22 18:49:26,455 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=79305.33333333333, ans=0.125 2024-09-22 18:49:26,947 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=14.58 vs. limit=15.0 2024-09-22 18:49:38,068 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=79352.0, ans=0.125 2024-09-22 18:49:46,145 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=79352.0, ans=0.125 2024-09-22 18:49:53,211 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=79352.0, ans=0.1 2024-09-22 18:49:54,449 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.244e+02 1.617e+02 1.825e+02 2.247e+02 3.972e+02, threshold=3.649e+02, percent-clipped=1.0 2024-09-22 18:50:15,230 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=79445.33333333333, ans=0.0 2024-09-22 18:50:30,041 INFO [train.py:1198] (3/4) Epoch 5, batch 1450, loss[loss=0.3729, ctc_loss=0.2852, cr_loss=0.4387, over 11448.00 frames. ], tot_loss[loss=0.3021, ctc_loss=0.2205, cr_loss=0.4079, over 3362081.89 frames. ], batch size: 123, lr: 2.32e-02, grad_scale: 32.0 2024-09-22 18:50:31,139 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.66 vs. limit=15.0 2024-09-22 18:50:34,361 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.87 vs. 
limit=15.0 2024-09-22 18:50:49,646 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=79538.66666666667, ans=0.125 2024-09-22 18:51:18,611 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.05 vs. limit=6.0 2024-09-22 18:51:32,461 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=79678.66666666667, ans=0.125 2024-09-22 18:51:49,953 INFO [train.py:1198] (3/4) Epoch 5, batch 1500, loss[loss=0.3119, ctc_loss=0.2278, cr_loss=0.4201, over 17061.00 frames. ], tot_loss[loss=0.3015, ctc_loss=0.2201, cr_loss=0.407, over 3363813.96 frames. ], batch size: 46, lr: 2.32e-02, grad_scale: 32.0 2024-09-22 18:52:01,368 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=79725.33333333333, ans=0.125 2024-09-22 18:52:12,686 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.76 vs. limit=15.0 2024-09-22 18:52:21,159 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=79772.0, ans=0.05 2024-09-22 18:52:40,769 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.149e+02 1.573e+02 1.905e+02 2.453e+02 3.573e+02, threshold=3.810e+02, percent-clipped=0.0 2024-09-22 18:52:45,955 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.const_attention_rate, batch_count=79865.33333333333, ans=0.025 2024-09-22 18:53:14,049 INFO [train.py:1198] (3/4) Epoch 5, batch 1550, loss[loss=0.2845, ctc_loss=0.2039, cr_loss=0.4031, over 17192.00 frames. ], tot_loss[loss=0.3013, ctc_loss=0.2198, cr_loss=0.4073, over 3366247.79 frames. ], batch size: 41, lr: 2.31e-02, grad_scale: 32.0 2024-09-22 18:53:49,333 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=80052.0, ans=0.0 2024-09-22 18:54:16,693 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=80145.33333333333, ans=0.125 2024-09-22 18:54:19,690 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=80145.33333333333, ans=0.1 2024-09-22 18:54:25,951 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=80145.33333333333, ans=0.125 2024-09-22 18:54:33,901 INFO [train.py:1198] (3/4) Epoch 5, batch 1600, loss[loss=0.2674, ctc_loss=0.1893, cr_loss=0.3906, over 17192.00 frames. ], tot_loss[loss=0.3016, ctc_loss=0.22, cr_loss=0.4081, over 3366700.36 frames. ], batch size: 41, lr: 2.31e-02, grad_scale: 32.0 2024-09-22 18:54:53,893 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=80238.66666666667, ans=0.1 2024-09-22 18:54:56,970 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=80238.66666666667, ans=0.0 2024-09-22 18:55:18,128 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=5.60 vs. 
limit=15.0 2024-09-22 18:55:18,354 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=8.91 vs. limit=15.0 2024-09-22 18:55:22,232 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.274e+02 1.455e+02 1.591e+02 1.855e+02 3.051e+02, threshold=3.183e+02, percent-clipped=0.0 2024-09-22 18:55:31,320 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=80332.0, ans=0.125 2024-09-22 18:55:58,318 INFO [train.py:1198] (3/4) Epoch 5, batch 1650, loss[loss=0.348, ctc_loss=0.2576, cr_loss=0.4523, over 17027.00 frames. ], tot_loss[loss=0.3029, ctc_loss=0.2212, cr_loss=0.4084, over 3354445.01 frames. ], batch size: 53, lr: 2.31e-02, grad_scale: 32.0 2024-09-22 18:56:11,388 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=80425.33333333333, ans=0.125 2024-09-22 18:56:47,677 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=80565.33333333333, ans=0.125 2024-09-22 18:57:01,792 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-22 18:57:03,435 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=80612.0, ans=0.0 2024-09-22 18:57:05,044 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.max_abs, batch_count=80612.0, ans=10.0 2024-09-22 18:57:05,151 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=80612.0, ans=0.125 2024-09-22 18:57:18,685 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=80658.66666666667, ans=0.0 2024-09-22 18:57:19,846 INFO [train.py:1198] (3/4) Epoch 5, batch 1700, loss[loss=0.3229, ctc_loss=0.2388, cr_loss=0.4208, over 16491.00 frames. ], tot_loss[loss=0.3027, ctc_loss=0.2209, cr_loss=0.4088, over 3362441.61 frames. ], batch size: 66, lr: 2.30e-02, grad_scale: 32.0 2024-09-22 18:57:21,797 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=80658.66666666667, ans=0.125 2024-09-22 18:57:34,547 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=80705.33333333333, ans=0.0 2024-09-22 18:57:49,661 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=80705.33333333333, ans=0.0 2024-09-22 18:58:08,259 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.290e+02 1.556e+02 1.832e+02 2.128e+02 4.427e+02, threshold=3.664e+02, percent-clipped=3.0 2024-09-22 18:58:30,937 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=80845.33333333333, ans=0.125 2024-09-22 18:58:34,352 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=80845.33333333333, ans=0.125 2024-09-22 18:58:41,928 INFO [train.py:1198] (3/4) Epoch 5, batch 1750, loss[loss=0.276, ctc_loss=0.2013, cr_loss=0.3737, over 16928.00 frames. ], tot_loss[loss=0.301, ctc_loss=0.2194, cr_loss=0.4078, over 3373889.02 frames. 
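
Note: the per-batch summaries above also show how the total objective is assembled. A fixed consistency-regularization weight of 0.2 fits every summary in this log, i.e. loss = ctc_loss + 0.2 * cr_loss. A quick arithmetic check in Python against the "Epoch 5, batch 1750" entry just above:

    # loss = ctc_loss + cr_loss_scale * cr_loss, checked on batch 1750:
    ctc_loss, cr_loss, cr_loss_scale = 0.2013, 0.3737, 0.2
    print(round(ctc_loss + cr_loss_scale * cr_loss, 3))   # 0.276, as logged
    # the running totals obey the same identity:
    print(round(0.2194 + 0.2 * 0.4078, 3))                # 0.301, as logged
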
], batch size: 42, lr: 2.30e-02, grad_scale: 32.0 2024-09-22 18:58:42,290 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=80892.0, ans=0.0 2024-09-22 18:58:42,487 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten.whitening_limit, batch_count=80892.0, ans=22.5 2024-09-22 18:58:54,941 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=80892.0, ans=0.2 2024-09-22 18:59:56,094 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=81078.66666666667, ans=0.125 2024-09-22 19:00:03,800 INFO [train.py:1198] (3/4) Epoch 5, batch 1800, loss[loss=0.2326, ctc_loss=0.1637, cr_loss=0.3448, over 16690.00 frames. ], tot_loss[loss=0.3001, ctc_loss=0.2186, cr_loss=0.4073, over 3372706.27 frames. ], batch size: 37, lr: 2.30e-02, grad_scale: 32.0 2024-09-22 19:00:38,117 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=81218.66666666667, ans=0.125 2024-09-22 19:00:46,187 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-22 19:00:52,231 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.234e+02 1.571e+02 1.779e+02 2.119e+02 4.116e+02, threshold=3.559e+02, percent-clipped=1.0 2024-09-22 19:01:00,640 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=81265.33333333333, ans=0.2 2024-09-22 19:01:06,920 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=81265.33333333333, ans=0.0 2024-09-22 19:01:25,612 INFO [train.py:1198] (3/4) Epoch 5, batch 1850, loss[loss=0.3004, ctc_loss=0.2213, cr_loss=0.3956, over 16606.00 frames. ], tot_loss[loss=0.3005, ctc_loss=0.219, cr_loss=0.4071, over 3366462.46 frames. ], batch size: 66, lr: 2.30e-02, grad_scale: 32.0 2024-09-22 19:01:27,508 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=81358.66666666667, ans=0.1 2024-09-22 19:01:27,541 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=81358.66666666667, ans=0.125 2024-09-22 19:01:27,595 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=81358.66666666667, ans=0.125 2024-09-22 19:01:29,113 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=81358.66666666667, ans=0.95 2024-09-22 19:02:09,687 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=6.49 vs. limit=12.0 2024-09-22 19:02:24,155 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=81498.66666666667, ans=0.0 2024-09-22 19:02:50,455 INFO [train.py:1198] (3/4) Epoch 5, batch 1900, loss[loss=0.2924, ctc_loss=0.2143, cr_loss=0.3904, over 17295.00 frames. ], tot_loss[loss=0.2995, ctc_loss=0.2184, cr_loss=0.4056, over 3360276.52 frames. 
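
Note: the scaling.py:214 lines that dominate this log record ScheduledFloat values: per-module hyper-parameters (skip rates, dropout probabilities, balancer bounds) that are not constants but functions of batch_count, interpolated piecewise-linearly between breakpoints. A minimal sketch of that evaluation; the breakpoints below are illustrative, not the recipe's actual schedules:

    import bisect

    class PiecewiseLinearSchedule:
        """Map batch_count to a float by linear interpolation between
        (batch_count, value) breakpoints, clamping outside the range."""

        def __init__(self, *points):
            self.xs = [p[0] for p in points]
            self.ys = [p[1] for p in points]

        def __call__(self, batch_count):
            if batch_count <= self.xs[0]:
                return self.ys[0]
            if batch_count >= self.xs[-1]:
                return self.ys[-1]
            i = bisect.bisect_right(self.xs, batch_count) - 1
            x0, x1 = self.xs[i], self.xs[i + 1]
            y0, y1 = self.ys[i], self.ys[i + 1]
            return y0 + (y1 - y0) * (batch_count - x0) / (x1 - x0)

    # An illustrative skip-rate that decays to zero over training:
    skip_rate = PiecewiseLinearSchedule((0.0, 0.5), (4000.0, 0.05), (16000.0, 0.0))
    print(skip_rate(79118.67))   # 0.0, in line with the late-training skip_rate entries
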
], batch size: 49, lr: 2.29e-02, grad_scale: 32.0 2024-09-22 19:02:53,749 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.const_attention_rate, batch_count=81592.0, ans=0.025 2024-09-22 19:02:53,801 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=81592.0, ans=0.0 2024-09-22 19:03:23,949 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=81685.33333333333, ans=0.125 2024-09-22 19:03:36,543 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.294e+02 1.506e+02 1.844e+02 2.270e+02 3.771e+02, threshold=3.688e+02, percent-clipped=2.0 2024-09-22 19:03:49,728 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=81732.0, ans=0.125 2024-09-22 19:04:03,853 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=81778.66666666667, ans=0.125 2024-09-22 19:04:03,861 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=81778.66666666667, ans=0.1 2024-09-22 19:04:10,004 INFO [train.py:1198] (3/4) Epoch 5, batch 1950, loss[loss=0.3044, ctc_loss=0.2257, cr_loss=0.3936, over 16011.00 frames. ], tot_loss[loss=0.2994, ctc_loss=0.2184, cr_loss=0.4052, over 3362826.74 frames. ], batch size: 74, lr: 2.29e-02, grad_scale: 32.0 2024-09-22 19:04:23,205 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.const_attention_rate, batch_count=81825.33333333333, ans=0.025 2024-09-22 19:04:26,555 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=81872.0, ans=0.2 2024-09-22 19:04:47,988 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=81918.66666666667, ans=0.125 2024-09-22 19:05:00,948 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=81965.33333333333, ans=0.1 2024-09-22 19:05:30,483 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=82012.0, ans=0.2 2024-09-22 19:05:34,771 INFO [train.py:1198] (3/4) Epoch 5, batch 2000, loss[loss=0.2657, ctc_loss=0.1909, cr_loss=0.3736, over 17162.00 frames. ], tot_loss[loss=0.2989, ctc_loss=0.2179, cr_loss=0.4049, over 3363234.43 frames. 
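
Note: in each train.py:1198 summary, loss[...] is the current batch and tot_loss[...] an aggregate of recent batches, each weighted by how many frames it contained, hence the separate "over N frames" counts. The roughly constant counts near 3.36e6 frames mid-epoch suggest the tracker windows or decays old batches, so the following sketch is only the core idea, a plain frame-weighted mean:

    class FrameWeightedAverage:
        """Accumulate per-batch losses weighted by frame counts."""

        def __init__(self):
            self.loss_sum = 0.0
            self.frames = 0.0

        def update(self, loss, num_frames):
            self.loss_sum += loss * num_frames
            self.frames += num_frames

        @property
        def value(self):
            return self.loss_sum / max(self.frames, 1.0)

    tracker = FrameWeightedAverage()
    tracker.update(0.2657, 17162.0)   # e.g. the batch 2000 loss just above
    print(tracker.value, tracker.frames)
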
], batch size: 41, lr: 2.29e-02, grad_scale: 32.0 2024-09-22 19:05:45,320 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten.whitening_limit, batch_count=82058.66666666667, ans=15.0 2024-09-22 19:06:21,338 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.273e+02 1.502e+02 1.753e+02 2.170e+02 3.000e+02, threshold=3.507e+02, percent-clipped=0.0 2024-09-22 19:06:39,063 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=82245.33333333333, ans=0.125 2024-09-22 19:06:42,074 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=82245.33333333333, ans=0.0 2024-09-22 19:06:48,544 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=82245.33333333333, ans=0.0 2024-09-22 19:06:54,494 INFO [train.py:1198] (3/4) Epoch 5, batch 2050, loss[loss=0.2989, ctc_loss=0.2234, cr_loss=0.3773, over 17108.00 frames. ], tot_loss[loss=0.3, ctc_loss=0.219, cr_loss=0.4052, over 3350718.96 frames. ], batch size: 40, lr: 2.28e-02, grad_scale: 32.0 2024-09-22 19:07:02,921 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.08 vs. limit=6.0 2024-09-22 19:07:05,795 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=82292.0, ans=0.125 2024-09-22 19:07:13,145 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=82338.66666666667, ans=0.125 2024-09-22 19:07:14,538 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=82338.66666666667, ans=0.125 2024-09-22 19:07:41,617 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=5.05 vs. limit=12.0 2024-09-22 19:07:45,901 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=82432.0, ans=0.05 2024-09-22 19:07:56,802 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=82432.0, ans=0.2 2024-09-22 19:08:18,628 INFO [train.py:1198] (3/4) Epoch 5, batch 2100, loss[loss=0.3066, ctc_loss=0.2246, cr_loss=0.4099, over 17295.00 frames. ], tot_loss[loss=0.2994, ctc_loss=0.2183, cr_loss=0.4054, over 3356071.62 frames. ], batch size: 49, lr: 2.28e-02, grad_scale: 32.0 2024-09-22 19:08:36,192 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=82572.0, ans=0.125 2024-09-22 19:08:38,378 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.71 vs. limit=15.0 2024-09-22 19:08:45,907 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=82572.0, ans=0.125 2024-09-22 19:09:04,411 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.206e+02 1.583e+02 1.843e+02 2.179e+02 3.617e+02, threshold=3.686e+02, percent-clipped=2.0 2024-09-22 19:09:40,265 INFO [train.py:1198] (3/4) Epoch 5, batch 2150, loss[loss=0.314, ctc_loss=0.2296, cr_loss=0.422, over 17095.00 frames. 
], tot_loss[loss=0.299, ctc_loss=0.2178, cr_loss=0.4058, over 3359180.53 frames. ], batch size: 49, lr: 2.28e-02, grad_scale: 32.0 2024-09-22 19:10:07,561 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=82805.33333333333, ans=0.125 2024-09-22 19:10:25,934 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=82852.0, ans=0.125 2024-09-22 19:10:34,288 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten.whitening_limit, batch_count=82898.66666666667, ans=15.0 2024-09-22 19:10:48,236 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=82945.33333333333, ans=0.0 2024-09-22 19:11:02,150 INFO [train.py:1198] (3/4) Epoch 5, batch 2200, loss[loss=0.2761, ctc_loss=0.2029, cr_loss=0.3661, over 17244.00 frames. ], tot_loss[loss=0.2999, ctc_loss=0.2185, cr_loss=0.407, over 3360096.98 frames. ], batch size: 44, lr: 2.28e-02, grad_scale: 32.0 2024-09-22 19:11:04,204 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=82992.0, ans=0.125 2024-09-22 19:11:34,637 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=83085.33333333333, ans=0.125 2024-09-22 19:11:48,592 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.270e+02 1.603e+02 1.776e+02 2.386e+02 3.569e+02, threshold=3.552e+02, percent-clipped=0.0 2024-09-22 19:11:48,823 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=83132.0, ans=0.0 2024-09-22 19:11:48,831 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=83132.0, ans=0.125 2024-09-22 19:11:56,956 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=83132.0, ans=0.1 2024-09-22 19:12:18,410 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=83178.66666666667, ans=0.0 2024-09-22 19:12:24,735 INFO [train.py:1198] (3/4) Epoch 5, batch 2250, loss[loss=0.3036, ctc_loss=0.2221, cr_loss=0.4078, over 16979.00 frames. ], tot_loss[loss=0.3008, ctc_loss=0.2193, cr_loss=0.4075, over 3339995.84 frames. ], batch size: 56, lr: 2.27e-02, grad_scale: 32.0 2024-09-22 19:13:46,567 INFO [train.py:1198] (3/4) Epoch 5, batch 2300, loss[loss=0.3053, ctc_loss=0.2281, cr_loss=0.3857, over 17218.00 frames. ], tot_loss[loss=0.3009, ctc_loss=0.2193, cr_loss=0.4078, over 3343030.17 frames. 
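
Note: the optim.py:487 WARNINGs summarize the distribution of recent gradient norms as five numbers (min, 25%, median, 75%, max) plus the clipping threshold. In every entry above the threshold is twice the median (e.g. 3.552e+02 vs a median of 1.776e+02 just above), i.e. threshold = Clipping_scale * median with Clipping_scale=2.0. A sketch of how such diagnostics can be produced; icefall's ScaledAdam applies its actual rule per parameter group inside the optimizer, so treat this as an illustration only:

    import statistics
    from collections import deque

    class GradNormClipper:
        """Clip against clipping_scale * median of recent gradient norms
        and report quartile diagnostics in the style logged above."""

        def __init__(self, clipping_scale=2.0, window=1024):
            self.scale = clipping_scale
            self.norms = deque(maxlen=window)
            self.seen = 0
            self.clipped = 0

        def step(self, grad_norm):
            self.norms.append(grad_norm)
            self.seen += 1
            threshold = self.scale * statistics.median(self.norms)
            if grad_norm > threshold:
                self.clipped += 1
            if len(self.norms) >= 4:
                q1, med, q3 = statistics.quantiles(self.norms, n=4)
                print(f"grad-norm quartiles {min(self.norms):.3e} {q1:.3e} "
                      f"{med:.3e} {q3:.3e} {max(self.norms):.3e}, "
                      f"threshold={threshold:.3e}, "
                      f"percent-clipped={100 * self.clipped / self.seen:.1f}")
            return min(grad_norm, threshold)
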
], batch size: 50, lr: 2.27e-02, grad_scale: 32.0 2024-09-22 19:13:46,905 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=83458.66666666667, ans=0.0 2024-09-22 19:13:48,507 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=83458.66666666667, ans=0.125 2024-09-22 19:13:51,585 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=83458.66666666667, ans=0.125 2024-09-22 19:14:13,835 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=83505.33333333333, ans=0.2 2024-09-22 19:14:34,650 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.240e+02 1.515e+02 1.751e+02 2.046e+02 3.052e+02, threshold=3.503e+02, percent-clipped=0.0 2024-09-22 19:14:44,544 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=83598.66666666667, ans=0.125 2024-09-22 19:15:07,864 INFO [train.py:1198] (3/4) Epoch 5, batch 2350, loss[loss=0.3198, ctc_loss=0.2357, cr_loss=0.4204, over 17073.00 frames. ], tot_loss[loss=0.3013, ctc_loss=0.2198, cr_loss=0.4078, over 3350937.62 frames. ], batch size: 46, lr: 2.27e-02, grad_scale: 32.0 2024-09-22 19:15:38,033 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=83738.66666666667, ans=0.0 2024-09-22 19:16:08,245 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=83832.0, ans=0.0 2024-09-22 19:16:13,529 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=4.33 vs. limit=12.0 2024-09-22 19:16:18,176 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=7.41 vs. limit=12.0 2024-09-22 19:16:30,651 INFO [train.py:1198] (3/4) Epoch 5, batch 2400, loss[loss=0.3303, ctc_loss=0.2434, cr_loss=0.4343, over 16577.00 frames. ], tot_loss[loss=0.3017, ctc_loss=0.22, cr_loss=0.4084, over 3358843.93 frames. ], batch size: 66, lr: 2.27e-02, grad_scale: 32.0 2024-09-22 19:17:16,980 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=84018.66666666667, ans=0.125 2024-09-22 19:17:19,652 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.205e+02 1.495e+02 1.658e+02 1.967e+02 2.763e+02, threshold=3.315e+02, percent-clipped=0.0 2024-09-22 19:17:20,017 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=84065.33333333333, ans=0.125 2024-09-22 19:17:23,249 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=84065.33333333333, ans=0.1 2024-09-22 19:17:23,651 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.39 vs. 
limit=15.0 2024-09-22 19:17:33,552 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=84065.33333333333, ans=0.025 2024-09-22 19:17:33,627 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=84065.33333333333, ans=0.125 2024-09-22 19:17:49,545 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=84112.0, ans=0.125 2024-09-22 19:17:51,532 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.55 vs. limit=15.0 2024-09-22 19:17:54,155 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=84158.66666666667, ans=0.1 2024-09-22 19:17:55,427 INFO [train.py:1198] (3/4) Epoch 5, batch 2450, loss[loss=0.3315, ctc_loss=0.2415, cr_loss=0.4496, over 16900.00 frames. ], tot_loss[loss=0.3007, ctc_loss=0.2191, cr_loss=0.408, over 3362021.07 frames. ], batch size: 58, lr: 2.26e-02, grad_scale: 32.0 2024-09-22 19:18:11,723 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=84205.33333333333, ans=0.0 2024-09-22 19:18:24,732 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=4.80 vs. limit=12.0 2024-09-22 19:18:32,571 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.min_abs, batch_count=84252.0, ans=0.5 2024-09-22 19:18:43,571 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=84298.66666666667, ans=0.125 2024-09-22 19:19:01,308 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=84345.33333333333, ans=0.125 2024-09-22 19:19:15,416 INFO [train.py:1198] (3/4) Epoch 5, batch 2500, loss[loss=0.3209, ctc_loss=0.239, cr_loss=0.4099, over 17363.00 frames. ], tot_loss[loss=0.2987, ctc_loss=0.2175, cr_loss=0.4056, over 3359211.48 frames. ], batch size: 48, lr: 2.26e-02, grad_scale: 32.0 2024-09-22 19:19:37,617 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=84438.66666666667, ans=0.1 2024-09-22 19:20:04,285 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=9.72 vs. limit=10.0 2024-09-22 19:20:04,526 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.268e+02 1.488e+02 1.690e+02 1.912e+02 3.424e+02, threshold=3.381e+02, percent-clipped=1.0 2024-09-22 19:20:06,383 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=84532.0, ans=0.125 2024-09-22 19:20:20,569 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-09-22 19:20:23,719 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=84578.66666666667, ans=0.125 2024-09-22 19:20:40,873 INFO [train.py:1198] (3/4) Epoch 5, batch 2550, loss[loss=0.2881, ctc_loss=0.207, cr_loss=0.4052, over 17134.00 frames. 
], tot_loss[loss=0.2984, ctc_loss=0.2172, cr_loss=0.406, over 3362094.53 frames. ], batch size: 48, lr: 2.26e-02, grad_scale: 32.0 2024-09-22 19:20:41,218 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=84625.33333333333, ans=0.1 2024-09-22 19:20:46,493 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.89 vs. limit=15.0 2024-09-22 19:20:52,358 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=84625.33333333333, ans=0.2 2024-09-22 19:21:14,835 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-22 19:21:16,950 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=7.36 vs. limit=15.0 2024-09-22 19:21:18,597 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.07 vs. limit=22.5 2024-09-22 19:22:03,011 INFO [train.py:1198] (3/4) Epoch 5, batch 2600, loss[loss=0.3627, ctc_loss=0.2726, cr_loss=0.4504, over 15012.00 frames. ], tot_loss[loss=0.2971, ctc_loss=0.2161, cr_loss=0.405, over 3357746.62 frames. ], batch size: 89, lr: 2.25e-02, grad_scale: 32.0 2024-09-22 19:22:11,365 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=84858.66666666667, ans=0.125 2024-09-22 19:22:12,872 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=84858.66666666667, ans=0.125 2024-09-22 19:22:16,465 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=10.06 vs. limit=15.0 2024-09-22 19:22:18,346 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=19.52 vs. limit=22.5 2024-09-22 19:22:19,225 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=84905.33333333333, ans=0.125 2024-09-22 19:22:19,666 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.96 vs. limit=6.0 2024-09-22 19:22:23,911 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=84905.33333333333, ans=0.1 2024-09-22 19:22:40,867 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=84952.0, ans=0.125 2024-09-22 19:22:51,551 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.221e+02 1.583e+02 1.835e+02 2.132e+02 3.981e+02, threshold=3.669e+02, percent-clipped=1.0 2024-09-22 19:23:11,350 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.39 vs. limit=15.0 2024-09-22 19:23:24,865 INFO [train.py:1198] (3/4) Epoch 5, batch 2650, loss[loss=0.3491, ctc_loss=0.2587, cr_loss=0.4516, over 16211.00 frames. ], tot_loss[loss=0.2983, ctc_loss=0.217, cr_loss=0.4069, over 3363172.37 frames. 
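
Note: the scaling.py:1024 Whitening lines monitor how far a module's activations are from "white", i.e. how far each channel group's covariance is from a multiple of the identity; the metric is compared against a per-module limit (itself scheduled, as the whitening_limit entries show), beyond which the module pushes activations back toward whiteness. A sketch of such a metric in the spirit of icefall's scaling.py, not a verbatim copy:

    import torch

    def whitening_metric(x, num_groups):
        """>= 1.0, and equal to 1.0 exactly when each group's covariance
        is a multiple of the identity. x: (..., num_channels)."""
        x = x.reshape(-1, x.shape[-1])
        num_frames, num_channels = x.shape
        cpg = num_channels // num_groups              # channels per group
        x = x.reshape(num_frames, num_groups, cpg).transpose(0, 1)
        x = x - x.mean(dim=1, keepdim=True)           # centered covariance
        cov = torch.matmul(x.transpose(1, 2), x)      # (groups, cpg, cpg)
        mean_diag = cov.diagonal(dim1=1, dim2=2).mean()
        mean_sq = (cov ** 2).sum() / (num_groups * cpg)
        return mean_sq / (mean_diag ** 2 + 1e-20)

    x = torch.randn(4000, 256)
    print(whitening_metric(x, num_groups=1).item())   # ~1.06 for white noise
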
], batch size: 74, lr: 2.25e-02, grad_scale: 32.0 2024-09-22 19:23:29,936 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=85092.0, ans=0.125 2024-09-22 19:23:34,914 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=85092.0, ans=0.1 2024-09-22 19:23:49,963 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.02 vs. limit=6.0 2024-09-22 19:23:52,157 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=85138.66666666667, ans=0.0 2024-09-22 19:23:53,725 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=85138.66666666667, ans=10.0 2024-09-22 19:24:04,622 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=85185.33333333333, ans=0.2 2024-09-22 19:24:46,096 INFO [train.py:1198] (3/4) Epoch 5, batch 2700, loss[loss=0.3162, ctc_loss=0.2372, cr_loss=0.395, over 16820.00 frames. ], tot_loss[loss=0.2974, ctc_loss=0.2164, cr_loss=0.4054, over 3360906.19 frames. ], batch size: 58, lr: 2.25e-02, grad_scale: 32.0 2024-09-22 19:24:47,063 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.86 vs. limit=15.0 2024-09-22 19:25:02,647 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=16.93 vs. limit=22.5 2024-09-22 19:25:05,741 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=5.31 vs. limit=12.0 2024-09-22 19:25:27,574 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=9.39 vs. limit=15.0 2024-09-22 19:25:34,610 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.200e+02 1.500e+02 1.639e+02 1.809e+02 2.394e+02, threshold=3.278e+02, percent-clipped=0.0 2024-09-22 19:25:38,803 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.88 vs. limit=10.0 2024-09-22 19:25:47,510 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=85465.33333333333, ans=0.0 2024-09-22 19:25:56,030 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=4.27 vs. limit=10.0 2024-09-22 19:26:05,583 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=4.34 vs. limit=10.0 2024-09-22 19:26:08,132 INFO [train.py:1198] (3/4) Epoch 5, batch 2750, loss[loss=0.3448, ctc_loss=0.2651, cr_loss=0.3985, over 11596.00 frames. ], tot_loss[loss=0.2965, ctc_loss=0.2157, cr_loss=0.4042, over 3358977.54 frames. 
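
Note: the cr_loss column itself comes from consistency-regularized CTC (CR-CTC): as I understand the recipe, each utterance is forwarded as two differently time-masked views, CTC is applied to both, and a divergence between their per-frame posteriors is penalized. A structural sketch, assuming a model that returns (T, N, V) log-probabilities and hedging on the exact divergence and stop-gradient placement train.py uses:

    import torch.nn.functional as F

    def cr_ctc_loss(model, feats_a, feats_b, out_lens, targets, target_lens,
                    cr_scale=0.2):
        # Two views of the same utterances, masked differently upstream.
        logp_a = model(feats_a)        # (T, N, V) log-probabilities
        logp_b = model(feats_b)
        ctc = 0.5 * (F.ctc_loss(logp_a, targets, out_lens, target_lens)
                     + F.ctc_loss(logp_b, targets, out_lens, target_lens))
        # Symmetric KL between the views' frame posteriors; each side's
        # target distribution is detached so it acts as a teacher.
        ta, tb = logp_a.detach(), logp_b.detach()
        kl_ab = (tb.exp() * (tb - logp_a)).sum(-1).mean()   # KL(b || a)
        kl_ba = (ta.exp() * (ta - logp_b)).sum(-1).mean()   # KL(a || b)
        return ctc + cr_scale * 0.5 * (kl_ab + kl_ba)
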
], batch size: 125, lr: 2.25e-02, grad_scale: 32.0 2024-09-22 19:26:13,116 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=85558.66666666667, ans=0.125 2024-09-22 19:26:36,764 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=85605.33333333333, ans=0.125 2024-09-22 19:26:38,352 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=85652.0, ans=0.125 2024-09-22 19:27:31,980 INFO [train.py:1198] (3/4) Epoch 5, batch 2800, loss[loss=0.2755, ctc_loss=0.1965, cr_loss=0.3954, over 17244.00 frames. ], tot_loss[loss=0.2975, ctc_loss=0.2163, cr_loss=0.406, over 3355553.64 frames. ], batch size: 44, lr: 2.24e-02, grad_scale: 32.0 2024-09-22 19:27:45,120 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=85792.0, ans=0.0 2024-09-22 19:27:49,912 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=85838.66666666667, ans=0.0 2024-09-22 19:27:49,965 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=85838.66666666667, ans=0.1 2024-09-22 19:28:01,285 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=85838.66666666667, ans=0.0 2024-09-22 19:28:04,394 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=85885.33333333333, ans=0.2 2024-09-22 19:28:18,241 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.220e+02 1.484e+02 1.665e+02 1.912e+02 3.153e+02, threshold=3.329e+02, percent-clipped=0.0 2024-09-22 19:28:31,210 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=85932.0, ans=0.035 2024-09-22 19:28:51,700 INFO [train.py:1198] (3/4) Epoch 5, batch 2850, loss[loss=0.3176, ctc_loss=0.2344, cr_loss=0.4162, over 14796.00 frames. ], tot_loss[loss=0.2972, ctc_loss=0.2161, cr_loss=0.4057, over 3353091.42 frames. ], batch size: 89, lr: 2.24e-02, grad_scale: 32.0 2024-09-22 19:28:58,449 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=86025.33333333333, ans=0.2 2024-09-22 19:29:16,902 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=86072.0, ans=0.025 2024-09-22 19:29:37,402 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=86118.66666666667, ans=0.125 2024-09-22 19:29:49,996 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=86165.33333333333, ans=0.1 2024-09-22 19:29:56,537 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=86212.0, ans=0.125 2024-09-22 19:30:16,021 INFO [train.py:1198] (3/4) Epoch 5, batch 2900, loss[loss=0.2771, ctc_loss=0.1989, cr_loss=0.3912, over 17275.00 frames. ], tot_loss[loss=0.2987, ctc_loss=0.2174, cr_loss=0.4067, over 3353430.55 frames. 
], batch size: 44, lr: 2.24e-02, grad_scale: 32.0 2024-09-22 19:30:41,754 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=86305.33333333333, ans=0.0 2024-09-22 19:30:44,839 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=86305.33333333333, ans=0.125 2024-09-22 19:30:58,539 INFO [scaling.py:1024] (3/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.68 vs. limit=5.0 2024-09-22 19:31:01,934 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.291e+02 1.588e+02 1.741e+02 2.202e+02 4.410e+02, threshold=3.483e+02, percent-clipped=1.0 2024-09-22 19:31:04,125 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.62 vs. limit=15.0 2024-09-22 19:31:15,263 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.24 vs. limit=15.0 2024-09-22 19:31:32,129 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=86445.33333333333, ans=0.0 2024-09-22 19:31:35,100 INFO [train.py:1198] (3/4) Epoch 5, batch 2950, loss[loss=0.2615, ctc_loss=0.1885, cr_loss=0.365, over 17179.00 frames. ], tot_loss[loss=0.2999, ctc_loss=0.2185, cr_loss=0.4071, over 3344029.65 frames. ], batch size: 41, lr: 2.24e-02, grad_scale: 32.0 2024-09-22 19:31:42,711 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=86492.0, ans=0.125 2024-09-22 19:31:57,176 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=86538.66666666667, ans=0.0 2024-09-22 19:32:37,684 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-22 19:32:39,161 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=86632.0, ans=0.2 2024-09-22 19:32:44,280 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.00 vs. limit=22.5 2024-09-22 19:32:51,560 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=86678.66666666667, ans=0.125 2024-09-22 19:32:59,037 INFO [train.py:1198] (3/4) Epoch 5, batch 3000, loss[loss=0.4063, ctc_loss=0.3096, cr_loss=0.4839, over 11887.00 frames. ], tot_loss[loss=0.3003, ctc_loss=0.2188, cr_loss=0.4072, over 3342983.85 frames. ], batch size: 123, lr: 2.23e-02, grad_scale: 32.0 2024-09-22 19:32:59,037 INFO [train.py:1221] (3/4) Computing validation loss 2024-09-22 19:33:14,593 INFO [train.py:1230] (3/4) Epoch 5, validation: loss=0.06642, ctc_loss=0.06642, cr_loss=7.381e-15, over 944034.00 frames. 2024-09-22 19:33:14,594 INFO [train.py:1231] (3/4) Maximum memory allocated so far is 21238MB 2024-09-22 19:33:15,902 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=15.11 vs. 
limit=22.5 2024-09-22 19:33:19,540 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=86725.33333333333, ans=0.2 2024-09-22 19:33:24,099 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=86725.33333333333, ans=10.0 2024-09-22 19:33:33,408 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=86772.0, ans=0.125 2024-09-22 19:33:38,097 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=86772.0, ans=0.1 2024-09-22 19:33:54,029 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=86818.66666666667, ans=0.125 2024-09-22 19:33:57,917 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=11.93 vs. limit=22.5 2024-09-22 19:34:00,035 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.256e+02 1.524e+02 1.804e+02 2.236e+02 6.139e+02, threshold=3.607e+02, percent-clipped=4.0 2024-09-22 19:34:00,310 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=86865.33333333333, ans=0.125 2024-09-22 19:34:17,537 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=86912.0, ans=0.125 2024-09-22 19:34:17,865 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=14.60 vs. limit=22.5 2024-09-22 19:34:27,027 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=86912.0, ans=0.0 2024-09-22 19:34:35,393 INFO [train.py:1198] (3/4) Epoch 5, batch 3050, loss[loss=0.3205, ctc_loss=0.2358, cr_loss=0.4232, over 17032.00 frames. ], tot_loss[loss=0.3007, ctc_loss=0.2192, cr_loss=0.4073, over 3334154.73 frames. ], batch size: 52, lr: 2.23e-02, grad_scale: 32.0 2024-09-22 19:34:38,852 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=86958.66666666667, ans=0.1 2024-09-22 19:34:41,831 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=86958.66666666667, ans=0.0 2024-09-22 19:34:58,943 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=87005.33333333333, ans=0.125 2024-09-22 19:35:53,029 INFO [train.py:1198] (3/4) Epoch 5, batch 3100, loss[loss=0.2661, ctc_loss=0.1908, cr_loss=0.376, over 17033.00 frames. ], tot_loss[loss=0.2992, ctc_loss=0.2179, cr_loss=0.4066, over 3345781.38 frames. 
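
Note: periodically (as at Epoch 5, batch 3000 above) training pauses to evaluate the fixed dev set, always the same 944034.00 frames. Validation cr_loss comes out at ~1e-15, numerical zero, which is what one would expect when no masking is applied and the two views coincide. A sketch of such a pass, with a hypothetical compute_loss(model, batch) helper standing in for the recipe's actual loss computation:

    import torch

    def validate(model, dev_loader, compute_loss):
        model.eval()
        loss_sum, frame_sum = 0.0, 0.0
        with torch.no_grad():
            for batch in dev_loader:
                # hypothetical helper: returns (loss summed over frames, frames)
                loss, num_frames = compute_loss(model, batch)
                loss_sum += loss
                frame_sum += num_frames
        model.train()
        return loss_sum / frame_sum, frame_sum   # frame-weighted average
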
], batch size: 39, lr: 2.23e-02, grad_scale: 64.0 2024-09-22 19:35:58,055 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=87192.0, ans=0.95 2024-09-22 19:36:01,283 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=87192.0, ans=0.2 2024-09-22 19:36:04,204 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=87192.0, ans=0.0 2024-09-22 19:36:20,136 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.41 vs. limit=22.5 2024-09-22 19:36:37,948 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.213e+02 1.555e+02 1.774e+02 2.101e+02 3.567e+02, threshold=3.549e+02, percent-clipped=0.0 2024-09-22 19:36:53,341 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=5.44 vs. limit=15.0 2024-09-22 19:37:13,093 INFO [train.py:1198] (3/4) Epoch 5, batch 3150, loss[loss=0.3466, ctc_loss=0.2593, cr_loss=0.4369, over 17301.00 frames. ], tot_loss[loss=0.2991, ctc_loss=0.2176, cr_loss=0.4077, over 3355910.06 frames. ], batch size: 51, lr: 2.23e-02, grad_scale: 64.0 2024-09-22 19:37:55,699 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=87518.66666666667, ans=0.125 2024-09-22 19:38:00,462 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=87565.33333333333, ans=0.1 2024-09-22 19:38:03,636 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-22 19:38:16,796 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=13.64 vs. limit=22.5 2024-09-22 19:38:23,828 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=87612.0, ans=0.125 2024-09-22 19:38:31,403 INFO [train.py:1198] (3/4) Epoch 5, batch 3200, loss[loss=0.3273, ctc_loss=0.2429, cr_loss=0.422, over 17036.00 frames. ], tot_loss[loss=0.2982, ctc_loss=0.2168, cr_loss=0.4069, over 3356788.89 frames. ], batch size: 52, lr: 2.22e-02, grad_scale: 32.0 2024-09-22 19:38:34,829 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=87658.66666666667, ans=0.125 2024-09-22 19:38:45,576 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=87705.33333333333, ans=0.125 2024-09-22 19:38:48,802 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=87705.33333333333, ans=0.125 2024-09-22 19:38:49,174 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.17 vs. 
limit=15.0 2024-09-22 19:38:56,729 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=87705.33333333333, ans=0.125 2024-09-22 19:39:02,926 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=87752.0, ans=0.125 2024-09-22 19:39:18,460 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.243e+02 1.520e+02 1.734e+02 1.974e+02 3.517e+02, threshold=3.467e+02, percent-clipped=0.0 2024-09-22 19:39:40,348 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=87845.33333333333, ans=0.2 2024-09-22 19:39:49,579 INFO [train.py:1198] (3/4) Epoch 5, batch 3250, loss[loss=0.3484, ctc_loss=0.2607, cr_loss=0.4387, over 15243.00 frames. ], tot_loss[loss=0.298, ctc_loss=0.2167, cr_loss=0.4068, over 3352782.54 frames. ], batch size: 89, lr: 2.22e-02, grad_scale: 32.0 2024-09-22 19:40:21,056 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=87985.33333333333, ans=0.0 2024-09-22 19:40:31,948 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=87985.33333333333, ans=0.0 2024-09-22 19:40:35,094 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=88032.0, ans=0.09899494936611666 2024-09-22 19:41:09,581 INFO [train.py:1198] (3/4) Epoch 5, batch 3300, loss[loss=0.319, ctc_loss=0.2375, cr_loss=0.4074, over 17007.00 frames. ], tot_loss[loss=0.2991, ctc_loss=0.2175, cr_loss=0.4079, over 3342710.30 frames. ], batch size: 51, lr: 2.22e-02, grad_scale: 32.0 2024-09-22 19:41:27,096 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=88172.0, ans=0.125 2024-09-22 19:41:43,069 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=88218.66666666667, ans=0.125 2024-09-22 19:41:58,205 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.266e+02 1.544e+02 1.772e+02 2.234e+02 4.094e+02, threshold=3.543e+02, percent-clipped=4.0 2024-09-22 19:42:07,935 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=88265.33333333333, ans=0.05 2024-09-22 19:42:10,994 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=88265.33333333333, ans=0.1 2024-09-22 19:42:16,956 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.min_positive, batch_count=88312.0, ans=0.05 2024-09-22 19:42:18,662 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=88312.0, ans=0.1 2024-09-22 19:42:29,377 INFO [train.py:1198] (3/4) Epoch 5, batch 3350, loss[loss=0.3124, ctc_loss=0.2299, cr_loss=0.4127, over 17018.00 frames. ], tot_loss[loss=0.2998, ctc_loss=0.2182, cr_loss=0.4077, over 3348607.49 frames. ], batch size: 51, lr: 2.22e-02, grad_scale: 32.0 2024-09-22 19:42:31,687 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=7.98 vs. 
limit=15.0 2024-09-22 19:43:05,624 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=88452.0, ans=0.0 2024-09-22 19:43:33,759 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=88545.33333333333, ans=0.125 2024-09-22 19:43:37,614 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=5.15 vs. limit=15.0 2024-09-22 19:43:38,372 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=88545.33333333333, ans=0.1 2024-09-22 19:43:47,477 INFO [train.py:1198] (3/4) Epoch 5, batch 3400, loss[loss=0.2563, ctc_loss=0.1835, cr_loss=0.3638, over 16315.00 frames. ], tot_loss[loss=0.2972, ctc_loss=0.216, cr_loss=0.4061, over 3360119.60 frames. ], batch size: 36, lr: 2.21e-02, grad_scale: 32.0 2024-09-22 19:44:05,378 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=88638.66666666667, ans=0.0 2024-09-22 19:44:10,044 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=88638.66666666667, ans=0.125 2024-09-22 19:44:17,980 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=88685.33333333333, ans=0.125 2024-09-22 19:44:22,398 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=88685.33333333333, ans=0.125 2024-09-22 19:44:22,413 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=88685.33333333333, ans=0.1 2024-09-22 19:44:28,713 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=88685.33333333333, ans=0.0 2024-09-22 19:44:30,145 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=88685.33333333333, ans=0.125 2024-09-22 19:44:34,720 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.286e+02 1.562e+02 1.776e+02 2.153e+02 3.268e+02, threshold=3.552e+02, percent-clipped=0.0 2024-09-22 19:45:05,687 INFO [train.py:1198] (3/4) Epoch 5, batch 3450, loss[loss=0.2927, ctc_loss=0.2074, cr_loss=0.4263, over 17038.00 frames. ], tot_loss[loss=0.2975, ctc_loss=0.2162, cr_loss=0.4066, over 3359255.93 frames. ], batch size: 56, lr: 2.21e-02, grad_scale: 32.0 2024-09-22 19:45:07,388 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=88825.33333333333, ans=0.125 2024-09-22 19:45:14,173 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=88825.33333333333, ans=0.0 2024-09-22 19:45:45,511 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=88918.66666666667, ans=0.0 2024-09-22 19:46:25,705 INFO [train.py:1198] (3/4) Epoch 5, batch 3500, loss[loss=0.2464, ctc_loss=0.1769, cr_loss=0.3472, over 17274.00 frames. ], tot_loss[loss=0.2963, ctc_loss=0.2152, cr_loss=0.4054, over 3368962.45 frames. 
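
Note: the grad_scale field in each summary is the dynamic loss scale from fp16 mixed-precision training. Its movement above, 32.0 for a long stretch, 64.0 around batches 3100-3150, then 32.0 again by batch 3200, is characteristic GradScaler behavior: the scale doubles after a growth interval without overflow and halves when an overflow forces a skipped step. A minimal sketch of the standard torch.cuda.amp pattern, with illustrative scaler settings:

    import torch

    scaler = torch.cuda.amp.GradScaler(init_scale=32.0, growth_interval=2000)

    def training_step(model, optimizer, batch, compute_loss):
        optimizer.zero_grad()
        with torch.cuda.amp.autocast():
            loss = compute_loss(model, batch)
        scaler.scale(loss).backward()
        scaler.step(optimizer)     # skipped if gradients overflowed
        scaler.update()            # grows or shrinks the scale
        return scaler.get_scale()  # the value logged as grad_scale
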
], batch size: 42, lr: 2.21e-02, grad_scale: 32.0 2024-09-22 19:46:41,243 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=89105.33333333333, ans=0.1 2024-09-22 19:46:49,063 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=89105.33333333333, ans=0.0 2024-09-22 19:46:58,423 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=89152.0, ans=0.1 2024-09-22 19:47:07,359 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=8.24 vs. limit=15.0 2024-09-22 19:47:11,597 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=89152.0, ans=0.125 2024-09-22 19:47:12,941 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=89198.66666666667, ans=0.125 2024-09-22 19:47:14,315 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.154e+02 1.466e+02 1.595e+02 1.829e+02 3.245e+02, threshold=3.189e+02, percent-clipped=0.0 2024-09-22 19:47:14,714 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.const_attention_rate, batch_count=89198.66666666667, ans=0.025 2024-09-22 19:47:22,216 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=89198.66666666667, ans=0.1 2024-09-22 19:47:40,704 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=89245.33333333333, ans=0.125 2024-09-22 19:47:45,032 INFO [train.py:1198] (3/4) Epoch 5, batch 3550, loss[loss=0.2695, ctc_loss=0.1943, cr_loss=0.3757, over 17075.00 frames. ], tot_loss[loss=0.2958, ctc_loss=0.2148, cr_loss=0.405, over 3370792.62 frames. ], batch size: 46, lr: 2.21e-02, grad_scale: 32.0 2024-09-22 19:47:50,062 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.const_attention_rate, batch_count=89292.0, ans=0.025 2024-09-22 19:47:54,884 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=89292.0, ans=0.2 2024-09-22 19:48:05,798 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=89338.66666666667, ans=0.0 2024-09-22 19:48:14,222 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=3.98 vs. limit=12.0 2024-09-22 19:48:19,466 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=89385.33333333333, ans=0.1 2024-09-22 19:48:25,939 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=89385.33333333333, ans=0.0 2024-09-22 19:48:30,446 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=89432.0, ans=0.125 2024-09-22 19:48:49,058 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=89478.66666666667, ans=0.125 2024-09-22 19:49:02,837 INFO [train.py:1198] (3/4) Epoch 5, batch 3600, loss[loss=0.3102, ctc_loss=0.2262, cr_loss=0.4201, over 17023.00 frames. 
], tot_loss[loss=0.2956, ctc_loss=0.2147, cr_loss=0.4046, over 3363679.20 frames. ], batch size: 56, lr: 2.20e-02, grad_scale: 32.0 2024-09-22 19:49:22,543 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=14.81 vs. limit=22.5 2024-09-22 19:49:25,777 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=6.89 vs. limit=15.0 2024-09-22 19:49:36,186 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=89618.66666666667, ans=0.125 2024-09-22 19:49:39,773 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.15 vs. limit=22.5 2024-09-22 19:49:48,874 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.97 vs. limit=12.0 2024-09-22 19:49:49,381 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.188e+02 1.445e+02 1.594e+02 1.731e+02 2.971e+02, threshold=3.187e+02, percent-clipped=0.0 2024-09-22 19:50:14,497 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.const_attention_rate, batch_count=89712.0, ans=0.025 2024-09-22 19:50:22,565 INFO [train.py:1198] (3/4) Epoch 5, batch 3650, loss[loss=0.3506, ctc_loss=0.2623, cr_loss=0.4416, over 14812.00 frames. ], tot_loss[loss=0.2964, ctc_loss=0.2153, cr_loss=0.4053, over 3352333.53 frames. ], batch size: 89, lr: 2.20e-02, grad_scale: 32.0 2024-09-22 19:50:25,270 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=5.55 vs. limit=15.0 2024-09-22 19:50:33,812 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=89758.66666666667, ans=0.1 2024-09-22 19:50:58,650 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=89852.0, ans=0.125 2024-09-22 19:51:28,480 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=89945.33333333333, ans=0.2 2024-09-22 19:51:35,709 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=89945.33333333333, ans=0.125 2024-09-22 19:51:38,761 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=89945.33333333333, ans=0.07 2024-09-22 19:51:43,157 INFO [train.py:1198] (3/4) Epoch 5, batch 3700, loss[loss=0.2665, ctc_loss=0.1882, cr_loss=0.3915, over 16952.00 frames. ], tot_loss[loss=0.296, ctc_loss=0.215, cr_loss=0.4053, over 3352039.35 frames. ], batch size: 42, lr: 2.20e-02, grad_scale: 32.0 2024-09-22 19:51:50,173 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=14.72 vs. limit=15.0 2024-09-22 19:52:02,555 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.01 vs. 
limit=22.5 2024-09-22 19:52:14,454 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=90085.33333333333, ans=0.2 2024-09-22 19:52:28,325 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.const_attention_rate, batch_count=90132.0, ans=0.025 2024-09-22 19:52:29,544 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.256e+02 1.562e+02 1.758e+02 2.028e+02 3.638e+02, threshold=3.517e+02, percent-clipped=1.0 2024-09-22 19:52:34,625 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=90132.0, ans=0.0 2024-09-22 19:52:43,963 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=90178.66666666667, ans=0.2 2024-09-22 19:52:59,933 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=90225.33333333333, ans=0.0 2024-09-22 19:53:01,155 INFO [train.py:1198] (3/4) Epoch 5, batch 3750, loss[loss=0.2952, ctc_loss=0.2097, cr_loss=0.4275, over 17016.00 frames. ], tot_loss[loss=0.2958, ctc_loss=0.2148, cr_loss=0.405, over 3348919.41 frames. ], batch size: 44, lr: 2.20e-02, grad_scale: 32.0 2024-09-22 19:53:23,888 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=90272.0, ans=0.2 2024-09-22 19:53:31,568 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=90318.66666666667, ans=0.0 2024-09-22 19:53:33,731 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=5.54 vs. limit=12.0 2024-09-22 19:53:38,172 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=90318.66666666667, ans=0.125 2024-09-22 19:53:44,536 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=90318.66666666667, ans=0.125 2024-09-22 19:54:09,519 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=90412.0, ans=0.125 2024-09-22 19:54:19,783 INFO [train.py:1198] (3/4) Epoch 5, batch 3800, loss[loss=0.2644, ctc_loss=0.192, cr_loss=0.3619, over 16345.00 frames. ], tot_loss[loss=0.2956, ctc_loss=0.2148, cr_loss=0.404, over 3338978.98 frames. 
2024-09-22 19:54:24,694 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=90458.66666666667, ans=0.125
2024-09-22 19:54:54,958 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=90552.0, ans=0.0
2024-09-22 19:54:56,482 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=90552.0, ans=0.1
2024-09-22 19:55:06,884 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.208e+02 1.578e+02 1.852e+02 2.152e+02 4.120e+02, threshold=3.704e+02, percent-clipped=2.0
2024-09-22 19:55:21,173 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=90645.33333333333, ans=0.125
2024-09-22 19:55:22,867 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=90645.33333333333, ans=0.0
2024-09-22 19:55:38,742 INFO [train.py:1198] (3/4) Epoch 5, batch 3850, loss[loss=0.2967, ctc_loss=0.2115, cr_loss=0.4264, over 17301.00 frames. ], tot_loss[loss=0.2985, ctc_loss=0.2175, cr_loss=0.4052, over 3299879.82 frames. ], batch size: 51, lr: 2.19e-02, grad_scale: 32.0
2024-09-22 19:55:42,074 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer_ff2.min_abs, batch_count=90692.0, ans=0.1
2024-09-22 19:55:42,231 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-09-22 19:55:54,123 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=90738.66666666667, ans=0.125
2024-09-22 19:56:11,441 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.74 vs. limit=6.0
2024-09-22 19:56:36,841 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=90832.0, ans=0.125
2024-09-22 19:57:39,903 INFO [train.py:1198] (3/4) Epoch 6, batch 0, loss[loss=0.3532, ctc_loss=0.2647, cr_loss=0.4421, over 17302.00 frames. ], tot_loss[loss=0.3532, ctc_loss=0.2647, cr_loss=0.4421, over 17302.00 frames. ], batch size: 51, lr: 2.04e-02, grad_scale: 32.0
2024-09-22 19:57:39,904 INFO [train.py:1221] (3/4) Computing validation loss
2024-09-22 19:57:55,107 INFO [train.py:1230] (3/4) Epoch 6, validation: loss=0.06886, ctc_loss=0.06886, cr_loss=9.986e-15, over 944034.00 frames.
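The validation pass at the top of epoch 6 reports cr_loss=9.986e-15, i.e. numerically zero, so loss equals ctc_loss there. This is what you would expect if the two "views" that the consistency-regularization term compares coincide when augmentation is disabled at validation time. A toy illustration of that degenerate case, using a symmetric-KL consistency term as an assumed stand-in for the recipe's exact CR loss:

```python
import torch
import torch.nn.functional as F

def cr_term(logp_a, logp_b):
    # Toy symmetric-KL consistency between two views' log-probs; a stand-in
    # for the exact CR-CTC term, used only to show the degenerate case.
    return 0.5 * (F.kl_div(logp_a, logp_b, reduction="batchmean", log_target=True)
                  + F.kl_div(logp_b, logp_a, reduction="batchmean", log_target=True))

logp = torch.randn(4, 100, 500).log_softmax(dim=-1)
print(cr_term(logp, logp))  # identical views -> ~0, like the ~1e-14 logged above
```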
2024-09-22 19:57:55,108 INFO [train.py:1231] (3/4) Maximum memory allocated so far is 21238MB
2024-09-22 19:58:13,203 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=90953.33333333333, ans=0.125
2024-09-22 19:58:18,055 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=90953.33333333333, ans=0.2
2024-09-22 19:58:30,768 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=91000.0, ans=0.1
2024-09-22 19:58:38,638 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=91000.0, ans=0.125
2024-09-22 19:58:48,251 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=91046.66666666667, ans=0.2
2024-09-22 19:58:50,022 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=91046.66666666667, ans=0.2
2024-09-22 19:58:51,239 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.268e+02 1.570e+02 1.853e+02 2.174e+02 4.194e+02, threshold=3.706e+02, percent-clipped=2.0
2024-09-22 19:58:53,019 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=91046.66666666667, ans=0.125
2024-09-22 19:59:06,718 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=91093.33333333333, ans=0.1
2024-09-22 19:59:19,096 INFO [train.py:1198] (3/4) Epoch 6, batch 50, loss[loss=0.2488, ctc_loss=0.1797, cr_loss=0.3454, over 17180.00 frames. ], tot_loss[loss=0.3012, ctc_loss=0.2201, cr_loss=0.4053, over 749255.14 frames. ], batch size: 41, lr: 2.04e-02, grad_scale: 32.0
2024-09-22 19:59:20,855 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=91140.0, ans=0.125
2024-09-22 19:59:32,885 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=91140.0, ans=0.125
2024-09-22 19:59:43,930 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=91186.66666666667, ans=0.125
2024-09-22 19:59:48,823 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=91186.66666666667, ans=0.1
2024-09-22 19:59:55,821 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.68 vs. limit=22.5
2024-09-22 20:00:03,050 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=91233.33333333333, ans=0.125
2024-09-22 20:00:15,864 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=91280.0, ans=0.125
2024-09-22 20:00:41,136 INFO [train.py:1198] (3/4) Epoch 6, batch 100, loss[loss=0.2471, ctc_loss=0.173, cr_loss=0.3703, over 17121.00 frames. ], tot_loss[loss=0.2932, ctc_loss=0.2128, cr_loss=0.4022, over 1338208.37 frames. ], batch size: 40, lr: 2.04e-02, grad_scale: 32.0
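The frequent ScheduledFloat lines from scaling.py:214 are diagnostics, not errors: each named quantity (dropout rates, skip rates, balancer probabilities, bypass scale floors) is a float hyperparameter that varies with training progress, and "ans" is its current value. The fractional batch_count values (e.g. 91046.66666666667) look like a duration-adjusted count rather than the raw batch index. A minimal sketch of such a piecewise-linear schedule, with an illustrative, made-up breakpoint list:

```python
def scheduled_float(batch_count, schedule):
    # Minimal sketch of a piecewise-linear parameter schedule, in the spirit
    # of scaling.py's ScheduledFloat ('ans' in the log is the current value).
    # `schedule`: sorted (batch_count, value) pairs -- assumed illustration.
    x0, y0 = schedule[0]
    if batch_count <= x0:
        return y0
    for x1, y1 in schedule[1:]:
        if batch_count <= x1:
            return y0 + (y1 - y0) * (batch_count - x0) / (x1 - x0)
        x0, y0 = x1, y1
    return y0  # constant after the last breakpoint

# e.g. a dropout_p that decays from 0.3 to 0.1 over the first 20k batches
print(scheduled_float(91000.0, [(0.0, 0.3), (20000.0, 0.1)]))  # -> 0.1
```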
2024-09-22 20:01:02,044 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=91420.0, ans=0.125
2024-09-22 20:01:27,808 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=91513.33333333333, ans=0.125
2024-09-22 20:01:35,327 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.210e+02 1.399e+02 1.624e+02 1.941e+02 3.446e+02, threshold=3.247e+02, percent-clipped=0.0
2024-09-22 20:01:42,134 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-22 20:01:53,474 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=4.25 vs. limit=10.0
2024-09-22 20:01:54,816 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=91560.0, ans=0.0
2024-09-22 20:02:00,846 INFO [train.py:1198] (3/4) Epoch 6, batch 150, loss[loss=0.2713, ctc_loss=0.1969, cr_loss=0.3723, over 16724.00 frames. ], tot_loss[loss=0.2907, ctc_loss=0.2104, cr_loss=0.4015, over 1792455.90 frames. ], batch size: 61, lr: 2.04e-02, grad_scale: 32.0
2024-09-22 20:02:05,935 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=91606.66666666667, ans=0.0
2024-09-22 20:02:23,688 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=91653.33333333333, ans=0.1
2024-09-22 20:03:02,572 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=91746.66666666667, ans=0.2
2024-09-22 20:03:04,000 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=91746.66666666667, ans=0.125
2024-09-22 20:03:11,717 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=91793.33333333333, ans=0.1
2024-09-22 20:03:25,937 INFO [train.py:1198] (3/4) Epoch 6, batch 200, loss[loss=0.3129, ctc_loss=0.228, cr_loss=0.4245, over 15097.00 frames. ], tot_loss[loss=0.2923, ctc_loss=0.2116, cr_loss=0.4033, over 2132257.57 frames. ], batch size: 89, lr: 2.03e-02, grad_scale: 32.0
2024-09-22 20:03:42,435 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=91886.66666666667, ans=0.0
2024-09-22 20:03:45,555 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=91886.66666666667, ans=0.125
2024-09-22 20:03:55,595 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=4.85 vs. limit=12.0
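The Whitening lines from scaling.py:1024 are diagnostics on how decorrelated a module's output channels are: a whiteness metric is printed against a per-module limit, and the module only intervenes when the metric exceeds the limit. As a rough illustration, one way to score non-whiteness is the spread of the covariance eigenvalues (1.0 for perfectly white features); this sketch is an assumed stand-in, not the exact metric computed in scaling.py:

```python
import torch

def whiteness_metric(x):
    # Assumed stand-in for scaling.py's whitening metric: how far the
    # feature covariance is from a multiple of the identity (1.0 = white).
    # x: (num_frames, num_channels) activations.
    x = x - x.mean(dim=0)
    cov = (x.T @ x) / x.shape[0]
    eig = torch.linalg.eigvalsh(cov)
    return (eig.pow(2).mean() / eig.mean().pow(2)).item()

x = torch.randn(1000, 256) @ torch.randn(256, 256)  # correlated channels
m = whiteness_metric(x)
limit = 22.5
print(m, "penalize" if m > limit else "ok")  # act only when metric > limit
```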
2024-09-22 20:04:09,013 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=91933.33333333333, ans=0.0
2024-09-22 20:04:25,624 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.298e+02 1.556e+02 1.826e+02 2.254e+02 3.362e+02, threshold=3.652e+02, percent-clipped=2.0
2024-09-22 20:04:26,029 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=91980.0, ans=0.2
2024-09-22 20:04:51,205 INFO [train.py:1198] (3/4) Epoch 6, batch 250, loss[loss=0.2954, ctc_loss=0.2128, cr_loss=0.413, over 17215.00 frames. ], tot_loss[loss=0.2932, ctc_loss=0.2124, cr_loss=0.4045, over 2396159.66 frames. ], batch size: 55, lr: 2.03e-02, grad_scale: 32.0
2024-09-22 20:04:53,053 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=92073.33333333333, ans=0.125
2024-09-22 20:04:54,710 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=92073.33333333333, ans=0.125
2024-09-22 20:05:10,341 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=92120.0, ans=0.125
2024-09-22 20:05:24,976 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.01 vs. limit=22.5
2024-09-22 20:05:30,828 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=92166.66666666667, ans=0.2
2024-09-22 20:05:35,617 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=92166.66666666667, ans=0.0
2024-09-22 20:05:38,822 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=92213.33333333333, ans=0.125
2024-09-22 20:05:38,945 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=92213.33333333333, ans=0.125
2024-09-22 20:06:09,142 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=92306.66666666667, ans=0.125
2024-09-22 20:06:10,526 INFO [train.py:1198] (3/4) Epoch 6, batch 300, loss[loss=0.2992, ctc_loss=0.2186, cr_loss=0.4031, over 17185.00 frames. ], tot_loss[loss=0.2942, ctc_loss=0.213, cr_loss=0.4057, over 2607052.32 frames. ], batch size: 55, lr: 2.03e-02, grad_scale: 32.0
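The lr field decays slowly both within an epoch (2.20e-02 -> 2.19e-02 during epoch 5) and across epochs (2.04e-02 at the start of epoch 6, 2.03e-02 by batch 250). Assuming this run uses icefall's Eden scheduler with the config's base_lr=0.045, lr_batches=7500, lr_epochs=3.5, the shape of that decay is a smooth power law in both counters:

```python
def eden_lr(base_lr, batch, epoch, lr_batches=7500.0, lr_epochs=3.5):
    # Eden schedule as in icefall's optim.py (assumed to be the scheduler
    # here): power-law decay in both the batch and the epoch dimension.
    batch_factor = ((batch**2 + lr_batches**2) / lr_batches**2) ** -0.25
    epoch_factor = ((epoch**2 + lr_epochs**2) / lr_epochs**2) ** -0.25
    return base_lr * batch_factor * epoch_factor

# Monotone decay along both axes; later batches and later epochs both
# shrink the lr.
print(eden_lr(0.045, 90000, 5.0))
print(eden_lr(0.045, 100000, 6.0))
```

The absolute values from this sketch will not match the logged lr exactly, since train.py feeds the scheduler its own adjusted batch/epoch counters; the point is the two-axis decay visible in the summaries.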
2024-09-22 20:06:17,266 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=92306.66666666667, ans=0.1
2024-09-22 20:06:34,396 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=92353.33333333333, ans=0.1
2024-09-22 20:06:34,467 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=92353.33333333333, ans=0.125
2024-09-22 20:06:52,067 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=92400.0, ans=0.05
2024-09-22 20:06:59,885 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=92446.66666666667, ans=0.2
2024-09-22 20:07:03,239 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=92446.66666666667, ans=0.2
2024-09-22 20:07:04,352 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.190e+02 1.501e+02 1.679e+02 1.995e+02 3.588e+02, threshold=3.358e+02, percent-clipped=0.0
2024-09-22 20:07:15,763 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=92493.33333333333, ans=0.0
2024-09-22 20:07:17,323 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=92493.33333333333, ans=0.0
2024-09-22 20:07:32,319 INFO [train.py:1198] (3/4) Epoch 6, batch 350, loss[loss=0.3071, ctc_loss=0.2219, cr_loss=0.426, over 16948.00 frames. ], tot_loss[loss=0.2943, ctc_loss=0.2133, cr_loss=0.405, over 2765846.18 frames. ], batch size: 58, lr: 2.03e-02, grad_scale: 32.0
2024-09-22 20:08:08,891 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2024-09-22 20:08:42,134 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=92726.66666666667, ans=0.0
2024-09-22 20:08:48,584 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=92726.66666666667, ans=0.125
2024-09-22 20:08:52,853 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=92726.66666666667, ans=0.0
2024-09-22 20:08:54,655 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=17.82 vs. limit=22.5
2024-09-22 20:08:57,316 INFO [train.py:1198] (3/4) Epoch 6, batch 400, loss[loss=0.2946, ctc_loss=0.2122, cr_loss=0.4125, over 17163.00 frames. ], tot_loss[loss=0.2928, ctc_loss=0.212, cr_loss=0.4041, over 2895235.72 frames. ], batch size: 45, lr: 2.02e-02, grad_scale: 32.0
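The constant grad_scale: 32.0 in the summaries is the AMP loss scale (this run has Use AMP=True with dtype=torch.float16). The scaler multiplies the loss before backward so fp16 gradients do not underflow, unscales them before the optimizer step, and shrinks or grows the scale when overflows occur or stop occurring. A generic PyTorch sketch of the pattern, not the recipe's actual loop in train.py (assumes a GPU is available):

```python
import torch
from torch.cuda.amp import GradScaler, autocast

model = torch.nn.Linear(80, 500).cuda()
optim = torch.optim.SGD(model.parameters(), lr=2e-2)
scaler = GradScaler(init_scale=32.0)  # mirrors the logged grad_scale: 32.0

feats = torch.randn(8, 80, device="cuda")
with autocast():                      # forward in float16
    loss = model(feats).pow(2).mean()
scaler.scale(loss).backward()         # backward on the scaled loss
scaler.step(optim)                    # unscale; skip the step on inf/nan grads
scaler.update()                       # adapt the scale over time
print(scaler.get_scale())
```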
2024-09-22 20:09:12,148 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=92820.0, ans=0.125
2024-09-22 20:09:15,128 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=92820.0, ans=0.1
2024-09-22 20:09:30,613 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=92866.66666666667, ans=0.125
2024-09-22 20:09:48,118 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=92913.33333333333, ans=0.0
2024-09-22 20:09:54,060 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.272e+02 1.481e+02 1.649e+02 1.890e+02 2.985e+02, threshold=3.299e+02, percent-clipped=0.0
2024-09-22 20:10:19,709 INFO [train.py:1198] (3/4) Epoch 6, batch 450, loss[loss=0.3239, ctc_loss=0.2315, cr_loss=0.462, over 17001.00 frames. ], tot_loss[loss=0.2917, ctc_loss=0.211, cr_loss=0.4034, over 2996952.70 frames. ], batch size: 56, lr: 2.02e-02, grad_scale: 32.0
2024-09-22 20:10:20,054 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=93006.66666666667, ans=0.125
2024-09-22 20:10:34,849 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.26 vs. limit=6.0
2024-09-22 20:10:44,423 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.00 vs. limit=15.0
2024-09-22 20:10:51,759 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=93100.0, ans=0.125
2024-09-22 20:11:26,746 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=93193.33333333333, ans=0.0
2024-09-22 20:11:37,831 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=93240.0, ans=0.125
2024-09-22 20:11:38,118 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.60 vs. limit=15.0
2024-09-22 20:11:39,002 INFO [train.py:1198] (3/4) Epoch 6, batch 500, loss[loss=0.3434, ctc_loss=0.2489, cr_loss=0.4726, over 14989.00 frames. ], tot_loss[loss=0.2914, ctc_loss=0.2106, cr_loss=0.4039, over 3079806.56 frames. ], batch size: 89, lr: 2.02e-02, grad_scale: 32.0
2024-09-22 20:11:43,956 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=93240.0, ans=0.2
2024-09-22 20:11:47,014 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=93240.0, ans=0.1
2024-09-22 20:11:47,538 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=3.58 vs.
limit=15.0 2024-09-22 20:12:01,063 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=93286.66666666667, ans=0.0 2024-09-22 20:12:24,139 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2.whitening_limit, batch_count=93333.33333333333, ans=15.0 2024-09-22 20:12:26,543 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=93333.33333333333, ans=0.1 2024-09-22 20:12:34,653 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=93380.0, ans=0.125 2024-09-22 20:12:37,312 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.206e+02 1.459e+02 1.650e+02 1.980e+02 2.967e+02, threshold=3.301e+02, percent-clipped=0.0 2024-09-22 20:12:54,397 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.const_attention_rate, batch_count=93426.66666666667, ans=0.025 2024-09-22 20:13:05,204 INFO [train.py:1198] (3/4) Epoch 6, batch 550, loss[loss=0.2589, ctc_loss=0.1875, cr_loss=0.3568, over 17113.00 frames. ], tot_loss[loss=0.2912, ctc_loss=0.2104, cr_loss=0.404, over 3141057.38 frames. ], batch size: 40, lr: 2.02e-02, grad_scale: 32.0 2024-09-22 20:13:40,605 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=93566.66666666667, ans=0.125 2024-09-22 20:13:52,752 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=93566.66666666667, ans=0.125 2024-09-22 20:14:23,088 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.51 vs. limit=15.0 2024-09-22 20:14:30,623 INFO [train.py:1198] (3/4) Epoch 6, batch 600, loss[loss=0.242, ctc_loss=0.1725, cr_loss=0.3475, over 17262.00 frames. ], tot_loss[loss=0.2912, ctc_loss=0.2106, cr_loss=0.4033, over 3175778.25 frames. ], batch size: 44, lr: 2.02e-02, grad_scale: 32.0 2024-09-22 20:14:40,958 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=15.04 vs. limit=15.0 2024-09-22 20:14:43,676 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer_ff2.min_abs, batch_count=93706.66666666667, ans=0.1 2024-09-22 20:15:09,455 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=13.32 vs. limit=15.0 2024-09-22 20:15:13,932 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=93800.0, ans=0.0 2024-09-22 20:15:24,958 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.069e+02 1.461e+02 1.595e+02 1.846e+02 3.407e+02, threshold=3.191e+02, percent-clipped=1.0 2024-09-22 20:15:50,711 INFO [train.py:1198] (3/4) Epoch 6, batch 650, loss[loss=0.277, ctc_loss=0.1956, cr_loss=0.4068, over 17288.00 frames. ], tot_loss[loss=0.2909, ctc_loss=0.2101, cr_loss=0.404, over 3226733.40 frames. 
], batch size: 46, lr: 2.01e-02, grad_scale: 32.0 2024-09-22 20:16:03,547 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=93940.0, ans=0.07 2024-09-22 20:16:14,473 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=93986.66666666667, ans=0.125 2024-09-22 20:16:17,583 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=93986.66666666667, ans=0.125 2024-09-22 20:16:17,626 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=93986.66666666667, ans=0.125 2024-09-22 20:16:46,501 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=94080.0, ans=0.125 2024-09-22 20:16:58,043 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.82 vs. limit=22.5 2024-09-22 20:17:09,773 INFO [train.py:1198] (3/4) Epoch 6, batch 700, loss[loss=0.2347, ctc_loss=0.1688, cr_loss=0.3298, over 17137.00 frames. ], tot_loss[loss=0.2924, ctc_loss=0.2114, cr_loss=0.4049, over 3244456.36 frames. ], batch size: 40, lr: 2.01e-02, grad_scale: 32.0 2024-09-22 20:17:16,248 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=94173.33333333333, ans=0.125 2024-09-22 20:17:47,633 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=8.30 vs. limit=15.0 2024-09-22 20:17:56,968 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=5.59 vs. limit=15.0 2024-09-22 20:18:09,123 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.224e+02 1.435e+02 1.629e+02 1.890e+02 2.825e+02, threshold=3.258e+02, percent-clipped=0.0 2024-09-22 20:18:11,767 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.89 vs. limit=15.0 2024-09-22 20:18:15,820 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-22 20:18:16,106 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.54 vs. limit=22.5 2024-09-22 20:18:34,805 INFO [train.py:1198] (3/4) Epoch 6, batch 750, loss[loss=0.3158, ctc_loss=0.2307, cr_loss=0.4255, over 17006.00 frames. ], tot_loss[loss=0.2911, ctc_loss=0.2103, cr_loss=0.4041, over 3273289.62 frames. ], batch size: 53, lr: 2.01e-02, grad_scale: 32.0 2024-09-22 20:18:53,396 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=94453.33333333333, ans=0.0 2024-09-22 20:18:58,221 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=94453.33333333333, ans=0.1 2024-09-22 20:19:22,899 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=9.80 vs. 
limit=15.0 2024-09-22 20:19:28,933 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=94546.66666666667, ans=0.125 2024-09-22 20:19:46,393 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=94593.33333333333, ans=0.125 2024-09-22 20:19:58,982 INFO [train.py:1198] (3/4) Epoch 6, batch 800, loss[loss=0.2585, ctc_loss=0.1839, cr_loss=0.3729, over 17259.00 frames. ], tot_loss[loss=0.2893, ctc_loss=0.2088, cr_loss=0.4025, over 3302647.31 frames. ], batch size: 42, lr: 2.01e-02, grad_scale: 32.0 2024-09-22 20:20:07,147 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=94640.0, ans=0.125 2024-09-22 20:20:24,566 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=94686.66666666667, ans=0.125 2024-09-22 20:20:53,106 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.186e+02 1.473e+02 1.596e+02 1.875e+02 3.402e+02, threshold=3.192e+02, percent-clipped=2.0 2024-09-22 20:21:01,355 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=94826.66666666667, ans=0.0 2024-09-22 20:21:07,547 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=94826.66666666667, ans=0.1 2024-09-22 20:21:14,215 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=94826.66666666667, ans=0.1 2024-09-22 20:21:18,627 INFO [train.py:1198] (3/4) Epoch 6, batch 850, loss[loss=0.3092, ctc_loss=0.2216, cr_loss=0.4377, over 17238.00 frames. ], tot_loss[loss=0.2898, ctc_loss=0.2092, cr_loss=0.4033, over 3313393.16 frames. ], batch size: 50, lr: 2.00e-02, grad_scale: 32.0 2024-09-22 20:21:22,185 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=94873.33333333333, ans=0.125 2024-09-22 20:21:30,390 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.min_positive, batch_count=94873.33333333333, ans=0.025 2024-09-22 20:21:41,429 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=94920.0, ans=0.0 2024-09-22 20:21:43,046 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=94920.0, ans=0.125 2024-09-22 20:21:45,030 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=5.70 vs. 
limit=15.0 2024-09-22 20:21:47,886 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=94920.0, ans=0.125 2024-09-22 20:21:52,630 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=94966.66666666667, ans=0.2 2024-09-22 20:21:52,759 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=94966.66666666667, ans=0.125 2024-09-22 20:21:54,249 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.const_attention_rate, batch_count=94966.66666666667, ans=0.025 2024-09-22 20:22:30,023 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=95060.0, ans=0.0 2024-09-22 20:22:42,614 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=95106.66666666667, ans=0.0 2024-09-22 20:22:43,773 INFO [train.py:1198] (3/4) Epoch 6, batch 900, loss[loss=0.2865, ctc_loss=0.2046, cr_loss=0.4094, over 17077.00 frames. ], tot_loss[loss=0.2907, ctc_loss=0.2098, cr_loss=0.4042, over 3327959.99 frames. ], batch size: 46, lr: 2.00e-02, grad_scale: 32.0 2024-09-22 20:22:47,306 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=95106.66666666667, ans=0.1 2024-09-22 20:23:23,384 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=95200.0, ans=0.1 2024-09-22 20:23:37,721 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.204e+02 1.462e+02 1.630e+02 1.916e+02 2.984e+02, threshold=3.259e+02, percent-clipped=0.0 2024-09-22 20:23:43,790 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=95246.66666666667, ans=0.1 2024-09-22 20:23:53,247 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=95293.33333333333, ans=0.125 2024-09-22 20:24:05,923 INFO [train.py:1198] (3/4) Epoch 6, batch 950, loss[loss=0.3263, ctc_loss=0.2381, cr_loss=0.4409, over 16982.00 frames. ], tot_loss[loss=0.2909, ctc_loss=0.21, cr_loss=0.4041, over 3323399.31 frames. ], batch size: 53, lr: 2.00e-02, grad_scale: 32.0 2024-09-22 20:24:11,138 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=7.50 vs. limit=15.0 2024-09-22 20:24:32,724 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.const_attention_rate, batch_count=95386.66666666667, ans=0.025 2024-09-22 20:24:35,796 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=95386.66666666667, ans=0.125 2024-09-22 20:25:28,488 INFO [train.py:1198] (3/4) Epoch 6, batch 1000, loss[loss=0.296, ctc_loss=0.2117, cr_loss=0.4212, over 16428.00 frames. ], tot_loss[loss=0.2904, ctc_loss=0.2096, cr_loss=0.4038, over 3329529.80 frames. ], batch size: 66, lr: 2.00e-02, grad_scale: 32.0 2024-09-22 20:25:37,492 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.06 vs. 
limit=15.0 2024-09-22 20:25:46,479 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=95620.0, ans=0.0 2024-09-22 20:26:22,800 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.223e+02 1.403e+02 1.529e+02 1.831e+02 2.517e+02, threshold=3.058e+02, percent-clipped=0.0 2024-09-22 20:26:24,678 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=95713.33333333333, ans=0.0 2024-09-22 20:26:36,385 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=13.74 vs. limit=22.5 2024-09-22 20:26:48,477 INFO [train.py:1198] (3/4) Epoch 6, batch 1050, loss[loss=0.2906, ctc_loss=0.2046, cr_loss=0.4299, over 17145.00 frames. ], tot_loss[loss=0.2896, ctc_loss=0.2089, cr_loss=0.4036, over 3346546.68 frames. ], batch size: 48, lr: 2.00e-02, grad_scale: 32.0 2024-09-22 20:26:52,678 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=5.55 vs. limit=15.0 2024-09-22 20:27:15,997 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=95853.33333333333, ans=0.07 2024-09-22 20:27:55,821 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.max_positive, batch_count=95993.33333333333, ans=0.95 2024-09-22 20:28:03,576 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=95993.33333333333, ans=0.1 2024-09-22 20:28:13,100 INFO [train.py:1198] (3/4) Epoch 6, batch 1100, loss[loss=0.2356, ctc_loss=0.1654, cr_loss=0.3506, over 16941.00 frames. ], tot_loss[loss=0.2891, ctc_loss=0.2084, cr_loss=0.4036, over 3351619.88 frames. ], batch size: 42, lr: 1.99e-02, grad_scale: 32.0 2024-09-22 20:28:22,979 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=96040.0, ans=0.125 2024-09-22 20:28:46,104 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=96133.33333333333, ans=0.05 2024-09-22 20:28:49,615 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=96133.33333333333, ans=0.0 2024-09-22 20:28:54,469 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=96133.33333333333, ans=0.0 2024-09-22 20:28:57,895 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=16.83 vs. limit=22.5 2024-09-22 20:29:12,627 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.277e+02 1.448e+02 1.600e+02 1.818e+02 3.191e+02, threshold=3.201e+02, percent-clipped=3.0 2024-09-22 20:29:17,864 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=96180.0, ans=0.125 2024-09-22 20:29:21,059 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.11 vs. 
limit=10.0 2024-09-22 20:29:35,023 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=96226.66666666667, ans=0.2 2024-09-22 20:29:38,030 INFO [train.py:1198] (3/4) Epoch 6, batch 1150, loss[loss=0.2766, ctc_loss=0.2029, cr_loss=0.3684, over 17344.00 frames. ], tot_loss[loss=0.2882, ctc_loss=0.2078, cr_loss=0.4023, over 3360439.18 frames. ], batch size: 48, lr: 1.99e-02, grad_scale: 32.0 2024-09-22 20:29:47,924 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=96273.33333333333, ans=0.125 2024-09-22 20:30:00,346 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=96320.0, ans=0.125 2024-09-22 20:30:01,041 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=6.75 vs. limit=15.0 2024-09-22 20:30:20,010 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=5.73 vs. limit=12.0 2024-09-22 20:30:34,224 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=12.12 vs. limit=15.0 2024-09-22 20:30:48,112 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=96460.0, ans=0.1 2024-09-22 20:30:54,285 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=96460.0, ans=0.125 2024-09-22 20:30:57,275 INFO [train.py:1198] (3/4) Epoch 6, batch 1200, loss[loss=0.2758, ctc_loss=0.2002, cr_loss=0.3776, over 17177.00 frames. ], tot_loss[loss=0.2873, ctc_loss=0.2069, cr_loss=0.4019, over 3364148.69 frames. ], batch size: 41, lr: 1.99e-02, grad_scale: 32.0 2024-09-22 20:31:10,765 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=96506.66666666667, ans=0.0 2024-09-22 20:31:25,274 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.73 vs. limit=10.0 2024-09-22 20:31:25,588 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=22.32 vs. limit=22.5 2024-09-22 20:31:39,269 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=96600.0, ans=0.0 2024-09-22 20:31:44,009 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=96646.66666666667, ans=0.125 2024-09-22 20:31:46,065 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.50 vs. limit=15.0 2024-09-22 20:31:48,803 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.const_attention_rate, batch_count=96646.66666666667, ans=0.025 2024-09-22 20:31:51,568 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.195e+02 1.470e+02 1.636e+02 2.013e+02 4.309e+02, threshold=3.271e+02, percent-clipped=1.0 2024-09-22 20:32:17,042 INFO [train.py:1198] (3/4) Epoch 6, batch 1250, loss[loss=0.2797, ctc_loss=0.2002, cr_loss=0.3976, over 17306.00 frames. 
], tot_loss[loss=0.2869, ctc_loss=0.2067, cr_loss=0.401, over 3357510.13 frames. ], batch size: 51, lr: 1.99e-02, grad_scale: 32.0 2024-09-22 20:32:20,466 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=96740.0, ans=0.1 2024-09-22 20:32:28,475 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=96740.0, ans=0.0 2024-09-22 20:32:41,522 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.const_attention_rate, batch_count=96786.66666666667, ans=0.025 2024-09-22 20:32:49,259 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=96786.66666666667, ans=0.125 2024-09-22 20:32:55,770 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=96833.33333333333, ans=0.2 2024-09-22 20:32:59,142 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=96833.33333333333, ans=0.125 2024-09-22 20:33:12,209 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.84 vs. limit=10.0 2024-09-22 20:33:43,914 INFO [train.py:1198] (3/4) Epoch 6, batch 1300, loss[loss=0.2714, ctc_loss=0.1949, cr_loss=0.3824, over 17271.00 frames. ], tot_loss[loss=0.2871, ctc_loss=0.2069, cr_loss=0.401, over 3352237.49 frames. ], batch size: 42, lr: 1.99e-02, grad_scale: 32.0 2024-09-22 20:33:51,992 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=96973.33333333333, ans=0.125 2024-09-22 20:34:16,080 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.27 vs. limit=15.0 2024-09-22 20:34:26,932 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=97066.66666666667, ans=0.125 2024-09-22 20:34:28,599 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=97066.66666666667, ans=0.0 2024-09-22 20:34:33,836 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.38 vs. limit=22.5 2024-09-22 20:34:42,529 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.191e+02 1.454e+02 1.645e+02 1.943e+02 2.545e+02, threshold=3.291e+02, percent-clipped=0.0 2024-09-22 20:35:02,300 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=97160.0, ans=0.125 2024-09-22 20:35:06,837 INFO [train.py:1198] (3/4) Epoch 6, batch 1350, loss[loss=0.2652, ctc_loss=0.1899, cr_loss=0.3764, over 17284.00 frames. ], tot_loss[loss=0.2862, ctc_loss=0.2062, cr_loss=0.3998, over 3349044.30 frames. ], batch size: 42, lr: 1.98e-02, grad_scale: 32.0 2024-09-22 20:35:12,608 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=15.23 vs. 
limit=22.5 2024-09-22 20:35:16,510 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=97206.66666666667, ans=0.125 2024-09-22 20:35:40,421 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.67 vs. limit=6.0 2024-09-22 20:35:46,266 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=97300.0, ans=0.125 2024-09-22 20:35:52,914 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=14.27 vs. limit=15.0 2024-09-22 20:36:10,123 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=97393.33333333333, ans=0.0 2024-09-22 20:36:23,008 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=97393.33333333333, ans=0.125 2024-09-22 20:36:25,758 INFO [train.py:1198] (3/4) Epoch 6, batch 1400, loss[loss=0.2482, ctc_loss=0.1773, cr_loss=0.3547, over 17012.00 frames. ], tot_loss[loss=0.2861, ctc_loss=0.2062, cr_loss=0.3995, over 3347785.61 frames. ], batch size: 44, lr: 1.98e-02, grad_scale: 32.0 2024-09-22 20:36:43,829 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=97486.66666666667, ans=0.07 2024-09-22 20:36:50,069 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=97486.66666666667, ans=0.1 2024-09-22 20:36:53,719 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.20 vs. limit=15.0 2024-09-22 20:37:10,144 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=6.45 vs. limit=15.0 2024-09-22 20:37:24,530 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.201e+02 1.419e+02 1.584e+02 1.972e+02 3.775e+02, threshold=3.168e+02, percent-clipped=1.0 2024-09-22 20:37:26,862 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.50 vs. limit=15.0 2024-09-22 20:37:27,876 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=97580.0, ans=0.1 2024-09-22 20:37:39,745 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=15.21 vs. limit=15.0 2024-09-22 20:37:48,280 INFO [train.py:1198] (3/4) Epoch 6, batch 1450, loss[loss=0.2752, ctc_loss=0.192, cr_loss=0.4161, over 17301.00 frames. ], tot_loss[loss=0.2865, ctc_loss=0.2064, cr_loss=0.4004, over 3353473.73 frames. ], batch size: 51, lr: 1.98e-02, grad_scale: 32.0 2024-09-22 20:37:50,587 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.14 vs. 
limit=22.5 2024-09-22 20:38:01,208 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=97673.33333333333, ans=0.125 2024-09-22 20:38:11,516 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=8.06 vs. limit=15.0 2024-09-22 20:38:26,747 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=97766.66666666667, ans=0.125 2024-09-22 20:39:08,373 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=97860.0, ans=0.0 2024-09-22 20:39:12,847 INFO [train.py:1198] (3/4) Epoch 6, batch 1500, loss[loss=0.3188, ctc_loss=0.2303, cr_loss=0.4428, over 16625.00 frames. ], tot_loss[loss=0.2868, ctc_loss=0.2066, cr_loss=0.401, over 3360285.79 frames. ], batch size: 66, lr: 1.98e-02, grad_scale: 32.0 2024-09-22 20:39:43,594 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=98000.0, ans=0.1 2024-09-22 20:40:09,174 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.237e+02 1.478e+02 1.585e+02 1.791e+02 2.453e+02, threshold=3.170e+02, percent-clipped=0.0 2024-09-22 20:40:12,576 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=98046.66666666667, ans=0.0 2024-09-22 20:40:33,021 INFO [train.py:1198] (3/4) Epoch 6, batch 1550, loss[loss=0.2735, ctc_loss=0.1942, cr_loss=0.3968, over 17122.00 frames. ], tot_loss[loss=0.286, ctc_loss=0.206, cr_loss=0.3999, over 3357142.89 frames. ], batch size: 43, lr: 1.98e-02, grad_scale: 32.0 2024-09-22 20:40:42,649 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=98140.0, ans=0.125 2024-09-22 20:40:49,261 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=98186.66666666667, ans=0.125 2024-09-22 20:41:00,814 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=4.42 vs. limit=15.0 2024-09-22 20:41:09,974 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=98233.33333333333, ans=0.125 2024-09-22 20:41:22,691 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=98280.0, ans=0.125 2024-09-22 20:41:27,271 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.const_attention_rate, batch_count=98280.0, ans=0.025 2024-09-22 20:41:51,965 INFO [train.py:1198] (3/4) Epoch 6, batch 1600, loss[loss=0.2768, ctc_loss=0.203, cr_loss=0.3689, over 17091.00 frames. ], tot_loss[loss=0.2875, ctc_loss=0.2072, cr_loss=0.4012, over 3350790.29 frames. ], batch size: 49, lr: 1.97e-02, grad_scale: 32.0 2024-09-22 20:42:25,174 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.19 vs. 
limit=15.0 2024-09-22 20:42:40,414 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=98466.66666666667, ans=0.2 2024-09-22 20:42:40,439 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=98466.66666666667, ans=0.0 2024-09-22 20:42:41,930 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=98466.66666666667, ans=0.0 2024-09-22 20:42:45,135 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=98513.33333333333, ans=0.2 2024-09-22 20:42:52,418 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.223e+02 1.461e+02 1.657e+02 2.022e+02 3.350e+02, threshold=3.314e+02, percent-clipped=2.0 2024-09-22 20:43:03,884 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=98560.0, ans=0.125 2024-09-22 20:43:04,031 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-09-22 20:43:16,608 INFO [train.py:1198] (3/4) Epoch 6, batch 1650, loss[loss=0.2798, ctc_loss=0.2016, cr_loss=0.3908, over 17243.00 frames. ], tot_loss[loss=0.2874, ctc_loss=0.2071, cr_loss=0.4016, over 3355028.91 frames. ], batch size: 55, lr: 1.97e-02, grad_scale: 32.0 2024-09-22 20:43:44,971 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=4.42 vs. limit=10.0 2024-09-22 20:43:50,908 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=98700.0, ans=0.125 2024-09-22 20:44:02,662 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=98700.0, ans=0.125 2024-09-22 20:44:10,511 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=98746.66666666667, ans=0.1 2024-09-22 20:44:18,390 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=98746.66666666667, ans=0.125 2024-09-22 20:44:37,244 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=98793.33333333333, ans=0.025 2024-09-22 20:44:40,255 INFO [train.py:1198] (3/4) Epoch 6, batch 1700, loss[loss=0.325, ctc_loss=0.2405, cr_loss=0.4223, over 17025.00 frames. ], tot_loss[loss=0.288, ctc_loss=0.2076, cr_loss=0.4018, over 3348957.89 frames. ], batch size: 51, lr: 1.97e-02, grad_scale: 32.0 2024-09-22 20:45:04,379 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=98886.66666666667, ans=0.0 2024-09-22 20:45:10,965 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=11.94 vs. 
limit=15.0 2024-09-22 20:45:36,068 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.207e+02 1.362e+02 1.512e+02 1.797e+02 3.142e+02, threshold=3.023e+02, percent-clipped=0.0 2024-09-22 20:45:39,664 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=98980.0, ans=0.2 2024-09-22 20:46:00,098 INFO [train.py:1198] (3/4) Epoch 6, batch 1750, loss[loss=0.3002, ctc_loss=0.2191, cr_loss=0.4054, over 17298.00 frames. ], tot_loss[loss=0.2857, ctc_loss=0.2057, cr_loss=0.3996, over 3359226.18 frames. ], batch size: 49, lr: 1.97e-02, grad_scale: 32.0 2024-09-22 20:46:06,869 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=99073.33333333333, ans=0.2 2024-09-22 20:46:13,325 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=99073.33333333333, ans=0.0 2024-09-22 20:46:14,803 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=99120.0, ans=0.2 2024-09-22 20:46:32,309 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=99166.66666666667, ans=0.125 2024-09-22 20:46:34,617 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=9.72 vs. limit=15.0 2024-09-22 20:46:38,622 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=99166.66666666667, ans=0.1 2024-09-22 20:46:51,156 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=99213.33333333333, ans=0.0 2024-09-22 20:47:24,700 INFO [train.py:1198] (3/4) Epoch 6, batch 1800, loss[loss=0.2868, ctc_loss=0.2029, cr_loss=0.4193, over 17107.00 frames. ], tot_loss[loss=0.2852, ctc_loss=0.2053, cr_loss=0.3994, over 3357125.36 frames. 
], batch size: 49, lr: 1.96e-02, grad_scale: 32.0 2024-09-22 20:47:25,070 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=99306.66666666667, ans=0.0 2024-09-22 20:48:01,415 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=99400.0, ans=0.04949747468305833 2024-09-22 20:48:12,528 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=99446.66666666667, ans=0.1 2024-09-22 20:48:14,159 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=99446.66666666667, ans=0.2 2024-09-22 20:48:22,609 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.233e+02 1.401e+02 1.510e+02 1.768e+02 2.829e+02, threshold=3.019e+02, percent-clipped=0.0 2024-09-22 20:48:22,962 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.const_attention_rate, batch_count=99446.66666666667, ans=0.025 2024-09-22 20:48:22,986 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=99446.66666666667, ans=0.125 2024-09-22 20:48:35,566 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=99493.33333333333, ans=0.0 2024-09-22 20:48:38,747 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=99493.33333333333, ans=0.125 2024-09-22 20:48:39,628 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=8.30 vs. limit=15.0 2024-09-22 20:48:46,315 INFO [train.py:1198] (3/4) Epoch 6, batch 1850, loss[loss=0.2917, ctc_loss=0.2096, cr_loss=0.4105, over 16946.00 frames. ], tot_loss[loss=0.2853, ctc_loss=0.2055, cr_loss=0.399, over 3350466.42 frames. ], batch size: 58, lr: 1.96e-02, grad_scale: 32.0 2024-09-22 20:49:00,490 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=99540.0, ans=0.125 2024-09-22 20:49:14,414 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=7.83 vs. 
limit=15.0 2024-09-22 20:49:15,134 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=99586.66666666667, ans=0.125 2024-09-22 20:49:15,148 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=99586.66666666667, ans=0.025 2024-09-22 20:49:24,752 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.min_positive, batch_count=99633.33333333333, ans=0.05 2024-09-22 20:49:37,716 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=99680.0, ans=0.0 2024-09-22 20:49:39,299 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-22 20:49:53,507 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=99726.66666666667, ans=0.125 2024-09-22 20:49:55,452 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.66 vs. limit=22.5 2024-09-22 20:50:02,112 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=8.41 vs. limit=15.0 2024-09-22 20:50:09,096 INFO [train.py:1198] (3/4) Epoch 6, batch 1900, loss[loss=0.2335, ctc_loss=0.1644, cr_loss=0.3458, over 16249.00 frames. ], tot_loss[loss=0.2851, ctc_loss=0.2053, cr_loss=0.3987, over 3338220.56 frames. ], batch size: 36, lr: 1.96e-02, grad_scale: 32.0 2024-09-22 20:51:05,355 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.166e+02 1.450e+02 1.646e+02 2.008e+02 3.199e+02, threshold=3.291e+02, percent-clipped=1.0 2024-09-22 20:51:10,378 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=99913.33333333333, ans=0.125 2024-09-22 20:51:12,071 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=99960.0, ans=0.1 2024-09-22 20:51:29,423 INFO [train.py:1198] (3/4) Epoch 6, batch 1950, loss[loss=0.3056, ctc_loss=0.2269, cr_loss=0.3937, over 16789.00 frames. ], tot_loss[loss=0.2855, ctc_loss=0.2056, cr_loss=0.3991, over 3334856.71 frames. ], batch size: 61, lr: 1.96e-02, grad_scale: 32.0 2024-09-22 20:51:29,766 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=100006.66666666667, ans=0.125 2024-09-22 20:51:32,887 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=100006.66666666667, ans=0.125 2024-09-22 20:51:45,961 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=4.47 vs. limit=12.0 2024-09-22 20:51:56,601 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=100053.33333333333, ans=0.0 2024-09-22 20:52:26,980 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=100146.66666666667, ans=0.125 2024-09-22 20:52:28,785 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=12.64 vs. 
limit=22.5 2024-09-22 20:52:49,052 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=100193.33333333333, ans=0.0 2024-09-22 20:52:53,475 INFO [train.py:1198] (3/4) Epoch 6, batch 2000, loss[loss=0.2496, ctc_loss=0.1741, cr_loss=0.377, over 17139.00 frames. ], tot_loss[loss=0.2858, ctc_loss=0.2058, cr_loss=0.4002, over 3350487.37 frames. ], batch size: 45, lr: 1.96e-02, grad_scale: 32.0 2024-09-22 20:53:17,376 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=100286.66666666667, ans=0.125 2024-09-22 20:53:45,498 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=100380.0, ans=0.125 2024-09-22 20:53:45,512 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=100380.0, ans=0.1 2024-09-22 20:53:54,725 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.289e+02 1.463e+02 1.652e+02 2.000e+02 4.397e+02, threshold=3.304e+02, percent-clipped=3.0 2024-09-22 20:54:01,329 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=100426.66666666667, ans=0.0 2024-09-22 20:54:18,665 INFO [train.py:1198] (3/4) Epoch 6, batch 2050, loss[loss=0.2769, ctc_loss=0.1956, cr_loss=0.4068, over 17316.00 frames. ], tot_loss[loss=0.286, ctc_loss=0.2061, cr_loss=0.3996, over 3340630.63 frames. ], batch size: 51, lr: 1.95e-02, grad_scale: 32.0 2024-09-22 20:54:24,467 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=6.66 vs. limit=15.0 2024-09-22 20:54:44,396 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=100520.0, ans=0.125 2024-09-22 20:55:01,954 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=100566.66666666667, ans=0.125 2024-09-22 20:55:13,098 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=100613.33333333333, ans=0.125 2024-09-22 20:55:13,218 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=100613.33333333333, ans=0.125 2024-09-22 20:55:37,973 INFO [train.py:1198] (3/4) Epoch 6, batch 2100, loss[loss=0.2976, ctc_loss=0.2118, cr_loss=0.4291, over 16732.00 frames. ], tot_loss[loss=0.2859, ctc_loss=0.2059, cr_loss=0.3998, over 3350204.46 frames. ], batch size: 61, lr: 1.95e-02, grad_scale: 32.0 2024-09-22 20:55:54,178 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=100753.33333333333, ans=0.1 2024-09-22 20:56:15,016 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=100800.0, ans=0.125 2024-09-22 20:56:33,955 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.234e+02 1.439e+02 1.570e+02 1.892e+02 4.315e+02, threshold=3.139e+02, percent-clipped=1.0 2024-09-22 20:56:36,687 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.23 vs. 
limit=6.0 2024-09-22 20:56:40,919 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.min_positive, batch_count=100893.33333333333, ans=0.025 2024-09-22 20:56:58,029 INFO [train.py:1198] (3/4) Epoch 6, batch 2150, loss[loss=0.2441, ctc_loss=0.1758, cr_loss=0.3415, over 16640.00 frames. ], tot_loss[loss=0.2864, ctc_loss=0.2064, cr_loss=0.4001, over 3352736.56 frames. ], batch size: 37, lr: 1.95e-02, grad_scale: 32.0 2024-09-22 20:57:41,850 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=101033.33333333333, ans=0.125 2024-09-22 20:58:02,475 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=101080.0, ans=0.1 2024-09-22 20:58:08,922 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=6.34 vs. limit=15.0 2024-09-22 20:58:16,056 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.const_attention_rate, batch_count=101126.66666666667, ans=0.025 2024-09-22 20:58:22,567 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=101126.66666666667, ans=0.0 2024-09-22 20:58:25,371 INFO [train.py:1198] (3/4) Epoch 6, batch 2200, loss[loss=0.242, ctc_loss=0.1717, cr_loss=0.3517, over 17075.00 frames. ], tot_loss[loss=0.2872, ctc_loss=0.207, cr_loss=0.401, over 3340808.75 frames. ], batch size: 46, lr: 1.95e-02, grad_scale: 32.0 2024-09-22 20:58:37,505 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=101173.33333333333, ans=0.125 2024-09-22 20:58:39,636 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=12.52 vs. limit=15.0 2024-09-22 20:58:43,751 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=101220.0, ans=0.125 2024-09-22 20:58:56,565 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=101220.0, ans=0.2 2024-09-22 20:59:07,391 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=101266.66666666667, ans=0.125 2024-09-22 20:59:16,914 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=101313.33333333333, ans=0.125 2024-09-22 20:59:22,284 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.97 vs. limit=6.0 2024-09-22 20:59:22,925 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.199e+02 1.501e+02 1.682e+02 2.031e+02 3.137e+02, threshold=3.364e+02, percent-clipped=0.0 2024-09-22 20:59:46,927 INFO [train.py:1198] (3/4) Epoch 6, batch 2250, loss[loss=0.2088, ctc_loss=0.143, cr_loss=0.3292, over 17193.00 frames. ], tot_loss[loss=0.2866, ctc_loss=0.2064, cr_loss=0.4009, over 3349328.59 frames. 
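Note on the loss fields: every tot_loss entry in this span decomposes as the CTC loss plus 0.2 times the consistency-regularization (CR) loss; e.g. the Epoch 6, batch 2250 summary above gives 0.2064 + 0.2 × 0.4009 ≈ 0.2866. The sketch below shows that inferred weighting; the 0.2 scale is deduced from the logged numbers themselves, not read from this run's configuration. The "over N frames" counts that close each bracket appear to be the frame weights behind the running tot_loss average.

```python
# Sketch: decomposition of the logged totals, assuming
#   loss = ctc_loss + cr_loss_scale * cr_loss.
# The 0.2 weight is inferred from the log lines, not from the config.
CR_LOSS_SCALE = 0.2

def combined_loss(ctc_loss: float, cr_loss: float) -> float:
    return ctc_loss + CR_LOSS_SCALE * cr_loss

# Epoch 6, batch 2250: tot_loss[loss=0.2866, ctc_loss=0.2064, cr_loss=0.4009]
assert abs(combined_loss(0.2064, 0.4009) - 0.2866) < 5e-4
# Epoch 6, batch 1850: tot_loss[loss=0.2853, ctc_loss=0.2055, cr_loss=0.399]
assert abs(combined_loss(0.2055, 0.3990) - 0.2853) < 5e-4
```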
], batch size: 41, lr: 1.95e-02, grad_scale: 32.0 2024-09-22 20:59:52,078 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=101406.66666666667, ans=0.125 2024-09-22 21:00:06,399 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=101453.33333333333, ans=0.125 2024-09-22 21:00:14,246 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=101453.33333333333, ans=10.0 2024-09-22 21:00:25,500 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.const_attention_rate, batch_count=101500.0, ans=0.025 2024-09-22 21:00:38,081 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=101546.66666666667, ans=0.0 2024-09-22 21:00:41,907 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.22 vs. limit=6.0 2024-09-22 21:00:43,122 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=11.56 vs. limit=15.0 2024-09-22 21:01:06,135 INFO [train.py:1198] (3/4) Epoch 6, batch 2300, loss[loss=0.275, ctc_loss=0.1983, cr_loss=0.3837, over 16965.00 frames. ], tot_loss[loss=0.2867, ctc_loss=0.2065, cr_loss=0.401, over 3352039.01 frames. ], batch size: 42, lr: 1.94e-02, grad_scale: 32.0 2024-09-22 21:01:42,086 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.92 vs. limit=22.5 2024-09-22 21:02:06,859 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.168e+02 1.444e+02 1.711e+02 1.913e+02 2.754e+02, threshold=3.422e+02, percent-clipped=0.0 2024-09-22 21:02:24,382 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=101826.66666666667, ans=0.2 2024-09-22 21:02:29,021 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=101873.33333333333, ans=0.2 2024-09-22 21:02:30,424 INFO [train.py:1198] (3/4) Epoch 6, batch 2350, loss[loss=0.2473, ctc_loss=0.1738, cr_loss=0.3678, over 17302.00 frames. ], tot_loss[loss=0.286, ctc_loss=0.2059, cr_loss=0.4004, over 3356242.28 frames. ], batch size: 46, lr: 1.94e-02, grad_scale: 32.0 2024-09-22 21:02:35,461 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=101873.33333333333, ans=0.1 2024-09-22 21:02:43,398 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=101873.33333333333, ans=0.2 2024-09-22 21:02:48,617 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.29 vs. limit=15.0 2024-09-22 21:03:03,649 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=101966.66666666667, ans=0.0 2024-09-22 21:03:28,330 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.74 vs. 
limit=22.5 2024-09-22 21:03:29,221 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=102013.33333333333, ans=0.125 2024-09-22 21:03:55,325 INFO [train.py:1198] (3/4) Epoch 6, batch 2400, loss[loss=0.2483, ctc_loss=0.1803, cr_loss=0.3402, over 17089.00 frames. ], tot_loss[loss=0.2854, ctc_loss=0.2055, cr_loss=0.3996, over 3362687.77 frames. ], batch size: 40, lr: 1.94e-02, grad_scale: 32.0 2024-09-22 21:04:01,945 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=102106.66666666667, ans=0.125 2024-09-22 21:04:09,582 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=102153.33333333333, ans=0.0 2024-09-22 21:04:47,540 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.const_attention_rate, batch_count=102246.66666666667, ans=0.025 2024-09-22 21:04:50,185 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.229e+02 1.414e+02 1.609e+02 1.881e+02 2.919e+02, threshold=3.217e+02, percent-clipped=0.0 2024-09-22 21:05:03,440 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=102293.33333333333, ans=0.95 2024-09-22 21:05:14,227 INFO [train.py:1198] (3/4) Epoch 6, batch 2450, loss[loss=0.2586, ctc_loss=0.1837, cr_loss=0.3744, over 17067.00 frames. ], tot_loss[loss=0.2839, ctc_loss=0.2041, cr_loss=0.3987, over 3365295.82 frames. ], batch size: 46, lr: 1.94e-02, grad_scale: 32.0 2024-09-22 21:05:54,379 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=102433.33333333333, ans=0.0 2024-09-22 21:06:05,508 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=102480.0, ans=0.1 2024-09-22 21:06:15,140 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=102480.0, ans=0.0 2024-09-22 21:06:33,762 INFO [train.py:1198] (3/4) Epoch 6, batch 2500, loss[loss=0.3076, ctc_loss=0.2224, cr_loss=0.426, over 17156.00 frames. ], tot_loss[loss=0.285, ctc_loss=0.2052, cr_loss=0.3992, over 3359148.14 frames. ], batch size: 48, lr: 1.94e-02, grad_scale: 32.0 2024-09-22 21:07:19,256 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-22 21:07:34,921 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.228e+02 1.454e+02 1.581e+02 1.860e+02 2.771e+02, threshold=3.162e+02, percent-clipped=0.0 2024-09-22 21:07:43,155 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=102760.0, ans=0.0 2024-09-22 21:07:53,023 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.const_attention_rate, batch_count=102760.0, ans=0.025 2024-09-22 21:08:01,447 INFO [train.py:1198] (3/4) Epoch 6, batch 2550, loss[loss=0.2825, ctc_loss=0.2065, cr_loss=0.3801, over 17312.00 frames. ], tot_loss[loss=0.2856, ctc_loss=0.2054, cr_loss=0.4006, over 3364483.69 frames. ], batch size: 51, lr: 1.93e-02, grad_scale: 32.0 2024-09-22 21:08:43,355 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=5.39 vs. 
limit=12.0 2024-09-22 21:08:43,356 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.whiten.whitening_limit, batch_count=102900.0, ans=12.0 2024-09-22 21:09:01,631 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=102946.66666666667, ans=0.0 2024-09-22 21:09:14,351 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=102993.33333333333, ans=0.125 2024-09-22 21:09:23,606 INFO [train.py:1198] (3/4) Epoch 6, batch 2600, loss[loss=0.2564, ctc_loss=0.1806, cr_loss=0.379, over 17317.00 frames. ], tot_loss[loss=0.2857, ctc_loss=0.2055, cr_loss=0.401, over 3359937.09 frames. ], batch size: 51, lr: 1.93e-02, grad_scale: 32.0 2024-09-22 21:09:26,061 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=15.49 vs. limit=15.0 2024-09-22 21:09:41,193 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=103086.66666666667, ans=0.0 2024-09-22 21:09:44,524 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=103086.66666666667, ans=0.1 2024-09-22 21:09:46,026 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-22 21:09:48,045 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=4.50 vs. limit=10.0 2024-09-22 21:10:01,616 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=103133.33333333333, ans=0.125 2024-09-22 21:10:15,909 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=103180.0, ans=0.125 2024-09-22 21:10:18,671 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.169e+02 1.459e+02 1.706e+02 2.072e+02 3.287e+02, threshold=3.412e+02, percent-clipped=1.0 2024-09-22 21:10:26,748 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=103226.66666666667, ans=0.2 2024-09-22 21:10:26,817 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=103226.66666666667, ans=0.1 2024-09-22 21:10:39,768 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=4.29 vs. limit=10.0 2024-09-22 21:10:41,099 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=103273.33333333333, ans=0.125 2024-09-22 21:10:42,340 INFO [train.py:1198] (3/4) Epoch 6, batch 2650, loss[loss=0.3087, ctc_loss=0.2237, cr_loss=0.4251, over 17299.00 frames. ], tot_loss[loss=0.2865, ctc_loss=0.2063, cr_loss=0.4013, over 3352292.72 frames. 
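Note on the scaling.py:214 entries: they track ScheduledFloat values, i.e. module hyper-parameters (skip rates, dropout probabilities, balancer limits, whitening limits) that change with batch_count instead of staying constant. Below is a minimal sketch of such a schedule, assuming piecewise-linear interpolation between (batch_count, value) breakpoints in the spirit of icefall's zipformer scaling.py; the breakpoints in the example are made up for illustration.

```python
# Minimal sketch of a piecewise-linear schedule like the ScheduledFloat
# values logged above. Modeled on icefall's zipformer/scaling.py, but not
# a drop-in replacement for it.
def scheduled_float(batch_count: float,
                    points: list[tuple[float, float]]) -> float:
    """points: (batch_count, value) pairs, sorted by batch_count."""
    if batch_count <= points[0][0]:
        return points[0][1]
    if batch_count >= points[-1][0]:
        return points[-1][1]
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if x0 <= batch_count <= x1:
            return y0 + (y1 - y0) * (batch_count - x0) / (x1 - x0)

# e.g. a skip-rate that decays from 0.2 to 0.0 over the first 20k batches
# (example breakpoints, hypothetical):
print(scheduled_float(99680.0, [(0.0, 0.2), (20000.0, 0.0)]))  # -> 0.0
```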
], batch size: 49, lr: 1.93e-02, grad_scale: 32.0 2024-09-22 21:10:45,902 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=103273.33333333333, ans=0.125 2024-09-22 21:11:05,908 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=11.39 vs. limit=15.0 2024-09-22 21:11:13,017 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=103366.66666666667, ans=0.1 2024-09-22 21:11:27,467 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=103366.66666666667, ans=0.125 2024-09-22 21:11:45,809 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=103413.33333333333, ans=0.2 2024-09-22 21:11:47,484 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-22 21:12:06,889 INFO [train.py:1198] (3/4) Epoch 6, batch 2700, loss[loss=0.2482, ctc_loss=0.1764, cr_loss=0.3593, over 17041.00 frames. ], tot_loss[loss=0.2862, ctc_loss=0.206, cr_loss=0.4009, over 3351818.29 frames. ], batch size: 39, lr: 1.93e-02, grad_scale: 32.0 2024-09-22 21:12:11,062 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.15 vs. limit=6.0 2024-09-22 21:12:54,663 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=103600.0, ans=0.0 2024-09-22 21:12:59,421 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=103646.66666666667, ans=0.125 2024-09-22 21:13:05,402 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.175e+02 1.442e+02 1.580e+02 1.786e+02 2.700e+02, threshold=3.159e+02, percent-clipped=0.0 2024-09-22 21:13:31,581 INFO [train.py:1198] (3/4) Epoch 6, batch 2750, loss[loss=0.2956, ctc_loss=0.2102, cr_loss=0.4272, over 17044.00 frames. ], tot_loss[loss=0.2855, ctc_loss=0.2055, cr_loss=0.4001, over 3358093.08 frames. ], batch size: 51, lr: 1.93e-02, grad_scale: 32.0 2024-09-22 21:13:46,315 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=103786.66666666667, ans=0.125 2024-09-22 21:14:12,613 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=5.58 vs. limit=15.0 2024-09-22 21:14:22,906 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=10.39 vs. limit=15.0 2024-09-22 21:14:51,046 INFO [train.py:1198] (3/4) Epoch 6, batch 2800, loss[loss=0.296, ctc_loss=0.2106, cr_loss=0.4269, over 17223.00 frames. ], tot_loss[loss=0.2852, ctc_loss=0.2051, cr_loss=0.4005, over 3357363.22 frames. 
], batch size: 50, lr: 1.92e-02, grad_scale: 32.0 2024-09-22 21:15:32,631 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=104066.66666666667, ans=0.2 2024-09-22 21:15:32,665 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=104066.66666666667, ans=0.1 2024-09-22 21:15:46,539 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.163e+02 1.391e+02 1.557e+02 1.769e+02 3.866e+02, threshold=3.114e+02, percent-clipped=1.0 2024-09-22 21:16:10,247 INFO [train.py:1198] (3/4) Epoch 6, batch 2850, loss[loss=0.2441, ctc_loss=0.1693, cr_loss=0.3737, over 17027.00 frames. ], tot_loss[loss=0.2868, ctc_loss=0.2065, cr_loss=0.4015, over 3357650.41 frames. ], batch size: 44, lr: 1.92e-02, grad_scale: 32.0 2024-09-22 21:16:21,398 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=104206.66666666667, ans=0.125 2024-09-22 21:16:46,358 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=104300.0, ans=0.125 2024-09-22 21:17:13,052 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=104346.66666666667, ans=0.125 2024-09-22 21:17:17,800 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=104393.33333333333, ans=0.125 2024-09-22 21:17:21,660 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=10.86 vs. limit=15.0 2024-09-22 21:17:35,031 INFO [train.py:1198] (3/4) Epoch 6, batch 2900, loss[loss=0.2941, ctc_loss=0.2101, cr_loss=0.4197, over 16807.00 frames. ], tot_loss[loss=0.2843, ctc_loss=0.2045, cr_loss=0.399, over 3365461.36 frames. ], batch size: 61, lr: 1.92e-02, grad_scale: 32.0 2024-09-22 21:17:37,328 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.13 vs. limit=15.0 2024-09-22 21:18:25,121 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=104533.33333333333, ans=0.1 2024-09-22 21:18:33,651 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.34 vs. limit=6.0 2024-09-22 21:18:35,989 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.209e+02 1.508e+02 1.666e+02 1.915e+02 2.988e+02, threshold=3.332e+02, percent-clipped=0.0 2024-09-22 21:18:54,038 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=104626.66666666667, ans=0.125 2024-09-22 21:18:59,989 INFO [train.py:1198] (3/4) Epoch 6, batch 2950, loss[loss=0.3161, ctc_loss=0.229, cr_loss=0.4352, over 15982.00 frames. ], tot_loss[loss=0.2841, ctc_loss=0.2043, cr_loss=0.3988, over 3361839.55 frames. 
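Note on the optim.py:487 warnings: the five numbers after "grad-norm quartiles" are the min/25%/median/75%/max of recently observed gradient norms, and the reported threshold equals Clipping_scale times the logged median, up to rounding (2.0 × 1.557e+02 = 3.114e+02 in the warning above); gradients whose norm exceeds the threshold are scaled down and counted in percent-clipped. A simplified sketch of that policy follows; icefall's ScaledAdam does the actual bookkeeping differently, so treat this as an illustration only.

```python
import torch

# Sketch of median-based gradient clipping consistent with the warnings:
#   threshold = clipping_scale * median(recent grad norms).
def clip_by_median(grads: list[torch.Tensor], history: list[float],
                   clipping_scale: float = 2.0) -> float:
    norm = torch.norm(torch.stack([g.norm() for g in grads])).item()
    history.append(norm)
    recent = torch.tensor(history[-10000:])  # window size is an assumption
    threshold = clipping_scale * float(torch.median(recent))
    if norm > threshold:
        for g in grads:
            g.mul_(threshold / norm)  # rescale in place, like clipping
    return threshold
```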
], batch size: 74, lr: 1.92e-02, grad_scale: 32.0 2024-09-22 21:19:19,581 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=104720.0, ans=0.07 2024-09-22 21:19:30,651 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=104766.66666666667, ans=0.0 2024-09-22 21:19:54,540 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=104813.33333333333, ans=0.2 2024-09-22 21:19:59,176 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=104813.33333333333, ans=0.125 2024-09-22 21:20:14,686 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=104860.0, ans=0.0 2024-09-22 21:20:19,066 INFO [train.py:1198] (3/4) Epoch 6, batch 3000, loss[loss=0.2503, ctc_loss=0.1781, cr_loss=0.3605, over 17036.00 frames. ], tot_loss[loss=0.2841, ctc_loss=0.2042, cr_loss=0.3993, over 3365242.52 frames. ], batch size: 44, lr: 1.92e-02, grad_scale: 32.0 2024-09-22 21:20:19,066 INFO [train.py:1221] (3/4) Computing validation loss 2024-09-22 21:20:34,498 INFO [train.py:1230] (3/4) Epoch 6, validation: loss=0.06097, ctc_loss=0.06097, cr_loss=6.736e-15, over 944034.00 frames. 2024-09-22 21:20:34,499 INFO [train.py:1231] (3/4) Maximum memory allocated so far is 21238MB 2024-09-22 21:20:39,648 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.const_attention_rate, batch_count=104906.66666666667, ans=0.025 2024-09-22 21:20:47,183 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=104906.66666666667, ans=0.0 2024-09-22 21:20:48,729 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=104953.33333333333, ans=0.125 2024-09-22 21:20:50,471 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=104953.33333333333, ans=0.0 2024-09-22 21:21:02,891 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=104953.33333333333, ans=0.0 2024-09-22 21:21:18,356 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=105000.0, ans=0.1 2024-09-22 21:21:29,321 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.086e+02 1.397e+02 1.557e+02 1.721e+02 3.093e+02, threshold=3.113e+02, percent-clipped=0.0 2024-09-22 21:21:45,388 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=105093.33333333333, ans=10.0 2024-09-22 21:21:51,556 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=105140.0, ans=0.1 2024-09-22 21:21:52,875 INFO [train.py:1198] (3/4) Epoch 6, batch 3050, loss[loss=0.2242, ctc_loss=0.1597, cr_loss=0.3226, over 17096.00 frames. ], tot_loss[loss=0.283, ctc_loss=0.2033, cr_loss=0.3982, over 3363714.35 frames. 
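Note on the validation entries (train.py:1221/1230): each dev pass covers the same 944034 frames, and the CR term collapses to numerical noise (cr_loss=6.736e-15 above), so the validation loss equals the CTC loss alone. This is consistent with the consistency term comparing two differently-masked forward passes, which leaves nothing to compare when masking is off in eval mode. A schematic validation loop under those assumptions; model, dev_loader, and the call signature are placeholders, not the recipe's actual objects.

```python
import torch

# Sketch of a frame-weighted validation pass like the one logged above.
@torch.no_grad()
def validate(model, dev_loader, device) -> float:
    model.eval()
    tot_loss, tot_frames = 0.0, 0.0
    for batch in dev_loader:
        # hypothetical signature: returns (mean loss, frames in batch)
        loss, num_frames = model(batch.to(device))
        tot_loss += loss.item() * num_frames
        tot_frames += num_frames
    model.train()
    return tot_loss / tot_frames  # average over the fixed dev frames
```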
], batch size: 40, lr: 1.92e-02, grad_scale: 32.0 2024-09-22 21:21:57,919 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=105140.0, ans=0.2 2024-09-22 21:22:32,543 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=4.81 vs. limit=15.0 2024-09-22 21:23:12,836 INFO [train.py:1198] (3/4) Epoch 6, batch 3100, loss[loss=0.3142, ctc_loss=0.2255, cr_loss=0.4434, over 16932.00 frames. ], tot_loss[loss=0.2822, ctc_loss=0.2027, cr_loss=0.3979, over 3372806.48 frames. ], batch size: 58, lr: 1.91e-02, grad_scale: 32.0 2024-09-22 21:23:13,186 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.const_attention_rate, batch_count=105373.33333333333, ans=0.025 2024-09-22 21:23:37,920 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=105420.0, ans=0.125 2024-09-22 21:23:59,954 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=9.26 vs. limit=22.5 2024-09-22 21:24:09,903 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.162e+02 1.436e+02 1.623e+02 1.870e+02 2.887e+02, threshold=3.246e+02, percent-clipped=0.0 2024-09-22 21:24:10,295 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=105513.33333333333, ans=0.05 2024-09-22 21:24:28,958 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=105560.0, ans=0.125 2024-09-22 21:24:33,413 INFO [train.py:1198] (3/4) Epoch 6, batch 3150, loss[loss=0.258, ctc_loss=0.1787, cr_loss=0.3969, over 17041.00 frames. ], tot_loss[loss=0.2832, ctc_loss=0.2034, cr_loss=0.399, over 3357497.35 frames. ], batch size: 39, lr: 1.91e-02, grad_scale: 32.0 2024-09-22 21:24:38,706 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=13.06 vs. limit=15.0 2024-09-22 21:24:49,495 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=105653.33333333333, ans=0.125 2024-09-22 21:25:20,274 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=105746.66666666667, ans=0.0 2024-09-22 21:25:24,753 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=105746.66666666667, ans=0.125 2024-09-22 21:25:34,281 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=105793.33333333333, ans=0.1 2024-09-22 21:25:47,582 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=105793.33333333333, ans=0.0 2024-09-22 21:25:53,520 INFO [train.py:1198] (3/4) Epoch 6, batch 3200, loss[loss=0.2543, ctc_loss=0.1815, cr_loss=0.3639, over 17088.00 frames. ], tot_loss[loss=0.2839, ctc_loss=0.2038, cr_loss=0.4003, over 3359662.19 frames. ], batch size: 43, lr: 1.91e-02, grad_scale: 32.0 2024-09-22 21:26:07,216 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=5.76 vs. 
limit=15.0 2024-09-22 21:26:17,219 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=105886.66666666667, ans=0.125 2024-09-22 21:26:18,606 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=105886.66666666667, ans=0.125 2024-09-22 21:26:50,296 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.272e+02 1.472e+02 1.715e+02 1.976e+02 3.064e+02, threshold=3.429e+02, percent-clipped=0.0 2024-09-22 21:26:53,825 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=105980.0, ans=0.0 2024-09-22 21:26:58,456 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=106026.66666666667, ans=0.0 2024-09-22 21:27:04,550 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=106026.66666666667, ans=0.1 2024-09-22 21:27:13,662 INFO [train.py:1198] (3/4) Epoch 6, batch 3250, loss[loss=0.2852, ctc_loss=0.2024, cr_loss=0.4139, over 17064.00 frames. ], tot_loss[loss=0.2848, ctc_loss=0.2045, cr_loss=0.4016, over 3366018.90 frames. ], batch size: 46, lr: 1.91e-02, grad_scale: 32.0 2024-09-22 21:27:14,017 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=106073.33333333333, ans=0.0 2024-09-22 21:27:41,786 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=106120.0, ans=0.04949747468305833 2024-09-22 21:28:24,124 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=106260.0, ans=0.0 2024-09-22 21:28:31,722 INFO [train.py:1198] (3/4) Epoch 6, batch 3300, loss[loss=0.253, ctc_loss=0.179, cr_loss=0.3703, over 17009.00 frames. ], tot_loss[loss=0.2836, ctc_loss=0.2036, cr_loss=0.4003, over 3354964.91 frames. ], batch size: 39, lr: 1.91e-02, grad_scale: 64.0 2024-09-22 21:28:50,831 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=106353.33333333333, ans=0.0 2024-09-22 21:28:58,252 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.min_positive, batch_count=106353.33333333333, ans=0.05 2024-09-22 21:29:26,378 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.242e+02 1.523e+02 1.757e+02 2.023e+02 3.259e+02, threshold=3.514e+02, percent-clipped=0.0 2024-09-22 21:29:43,846 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=106493.33333333333, ans=0.125 2024-09-22 21:29:49,749 INFO [train.py:1198] (3/4) Epoch 6, batch 3350, loss[loss=0.3454, ctc_loss=0.2547, cr_loss=0.4537, over 15115.00 frames. ], tot_loss[loss=0.2826, ctc_loss=0.2028, cr_loss=0.3992, over 3350044.39 frames. 
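Note on the scaling.py:1024 "Whitening" entries: each compares a whiteness statistic of a module's activations against a scheduled limit, and only the cases worth flagging are logged; where the metric exceeds the limit (e.g. the feed_forward1.out_whiten entry further up with metric=15.49 vs. limit=15.0), the Whiten module nudges features back toward an isotropic covariance. A plausible reconstruction of such a metric is the mean squared eigenvalue of the per-group feature covariance divided by the squared mean eigenvalue, which is 1.0 for perfectly white features; this is an interpretation of the log, not icefall's exact implementation.

```python
import torch

# Sketch of a whitening metric in the spirit of the log lines above:
# 1.0 when the per-group covariance is isotropic, larger otherwise.
def whitening_metric(x: torch.Tensor, num_groups: int) -> float:
    num_frames, num_channels = x.shape
    x = x.reshape(num_frames, num_groups, num_channels // num_groups)
    x = x - x.mean(dim=0, keepdim=True)
    metrics = []
    for g in range(num_groups):
        cov = x[:, g, :].t() @ x[:, g, :] / num_frames
        eigs = torch.linalg.eigvalsh(cov)
        metrics.append((eigs ** 2).mean() / eigs.mean() ** 2)
    return float(torch.stack(metrics).mean())

print(whitening_metric(torch.randn(4000, 64), num_groups=1))  # close to 1.0
```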
], batch size: 89, lr: 1.90e-02, grad_scale: 32.0 2024-09-22 21:29:51,792 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-22 21:30:04,257 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=106586.66666666667, ans=0.1 2024-09-22 21:30:08,902 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=106586.66666666667, ans=0.2 2024-09-22 21:30:19,997 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=106633.33333333333, ans=0.2 2024-09-22 21:30:23,262 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=106633.33333333333, ans=0.2 2024-09-22 21:30:53,024 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=106726.66666666667, ans=0.125 2024-09-22 21:31:08,300 INFO [train.py:1198] (3/4) Epoch 6, batch 3400, loss[loss=0.2884, ctc_loss=0.2084, cr_loss=0.4, over 16795.00 frames. ], tot_loss[loss=0.2823, ctc_loss=0.2026, cr_loss=0.3984, over 3347320.46 frames. ], batch size: 61, lr: 1.90e-02, grad_scale: 32.0 2024-09-22 21:31:10,073 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=106773.33333333333, ans=0.125 2024-09-22 21:31:33,320 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=106820.0, ans=0.125 2024-09-22 21:32:04,338 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.185e+02 1.439e+02 1.572e+02 1.824e+02 3.611e+02, threshold=3.144e+02, percent-clipped=1.0 2024-09-22 21:32:26,358 INFO [train.py:1198] (3/4) Epoch 6, batch 3450, loss[loss=0.3503, ctc_loss=0.2589, cr_loss=0.4574, over 15179.00 frames. ], tot_loss[loss=0.2839, ctc_loss=0.2039, cr_loss=0.4, over 3340671.64 frames. ], batch size: 89, lr: 1.90e-02, grad_scale: 32.0 2024-09-22 21:32:33,194 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=5.74 vs. limit=15.0 2024-09-22 21:33:13,587 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=107146.66666666667, ans=0.0 2024-09-22 21:33:15,164 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.min_positive, batch_count=107146.66666666667, ans=0.05 2024-09-22 21:33:20,067 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-22 21:33:21,455 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.min_positive, batch_count=107146.66666666667, ans=0.025 2024-09-22 21:33:35,457 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=107193.33333333333, ans=10.0 2024-09-22 21:33:46,064 INFO [train.py:1198] (3/4) Epoch 6, batch 3500, loss[loss=0.275, ctc_loss=0.1972, cr_loss=0.389, over 17108.00 frames. ], tot_loss[loss=0.2824, ctc_loss=0.2025, cr_loss=0.3991, over 3357545.81 frames. 
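Note on grad_scale: this is the mixed-precision loss scale. It sits at 32.0 through most of this span, doubles to 64.0 at Epoch 6, batch 3300, and is back to 32.0 by batch 3350, matching the usual dynamic-scaling rule: grow after a long run of overflow-free steps, halve when a step hits inf/NaN gradients. A generic PyTorch AMP sketch follows; the init_scale and growth settings are assumptions, and loss_fn is a placeholder helper.

```python
import torch

# Sketch of the dynamic loss scaling behind the grad_scale values, using
# the stock PyTorch AMP scaler (settings are assumptions, not this run's).
scaler = torch.cuda.amp.GradScaler(init_scale=32.0, growth_factor=2.0,
                                   backoff_factor=0.5, growth_interval=2000)

def amp_step(model, optimizer, loss_fn, batch):
    optimizer.zero_grad()
    with torch.cuda.amp.autocast():
        loss = loss_fn(model, batch)  # hypothetical helper
    scaler.scale(loss).backward()
    scaler.step(optimizer)            # skipped if grads contain inf/NaN
    scaler.update()                   # grows or halves the scale
```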
], batch size: 49, lr: 1.90e-02, grad_scale: 32.0 2024-09-22 21:34:14,703 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=107286.66666666667, ans=0.125 2024-09-22 21:34:44,457 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.243e+02 1.542e+02 1.676e+02 1.907e+02 3.181e+02, threshold=3.352e+02, percent-clipped=1.0 2024-09-22 21:34:49,405 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=107426.66666666667, ans=0.125 2024-09-22 21:35:04,859 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=107473.33333333333, ans=0.125 2024-09-22 21:35:06,143 INFO [train.py:1198] (3/4) Epoch 6, batch 3550, loss[loss=0.2478, ctc_loss=0.1762, cr_loss=0.3581, over 16270.00 frames. ], tot_loss[loss=0.2817, ctc_loss=0.202, cr_loss=0.3985, over 3359332.17 frames. ], batch size: 36, lr: 1.90e-02, grad_scale: 32.0 2024-09-22 21:35:12,692 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=107473.33333333333, ans=0.04949747468305833 2024-09-22 21:35:17,336 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=107473.33333333333, ans=0.125 2024-09-22 21:35:35,115 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=107520.0, ans=0.125 2024-09-22 21:36:06,202 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=107613.33333333333, ans=0.2 2024-09-22 21:36:13,873 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=107660.0, ans=0.1 2024-09-22 21:36:28,363 INFO [train.py:1198] (3/4) Epoch 6, batch 3600, loss[loss=0.335, ctc_loss=0.2546, cr_loss=0.402, over 11639.00 frames. ], tot_loss[loss=0.2817, ctc_loss=0.2021, cr_loss=0.398, over 3355396.08 frames. ], batch size: 124, lr: 1.89e-02, grad_scale: 32.0 2024-09-22 21:36:47,647 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.14 vs. limit=10.0 2024-09-22 21:37:13,942 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=107846.66666666667, ans=0.0 2024-09-22 21:37:24,563 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.276e+02 1.473e+02 1.668e+02 1.932e+02 3.410e+02, threshold=3.336e+02, percent-clipped=1.0 2024-09-22 21:37:46,255 INFO [train.py:1198] (3/4) Epoch 6, batch 3650, loss[loss=0.2849, ctc_loss=0.2036, cr_loss=0.4064, over 17028.00 frames. ], tot_loss[loss=0.282, ctc_loss=0.2022, cr_loss=0.3989, over 3353508.34 frames. ], batch size: 44, lr: 1.89e-02, grad_scale: 32.0 2024-09-22 21:37:51,194 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=107940.0, ans=0.09899494936611666 2024-09-22 21:38:02,820 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=10.23 vs. 
limit=15.0 2024-09-22 21:38:08,370 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=107986.66666666667, ans=0.125 2024-09-22 21:38:28,980 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.53 vs. limit=22.5 2024-09-22 21:39:04,805 INFO [train.py:1198] (3/4) Epoch 6, batch 3700, loss[loss=0.302, ctc_loss=0.2172, cr_loss=0.4238, over 17300.00 frames. ], tot_loss[loss=0.2823, ctc_loss=0.2026, cr_loss=0.3987, over 3355047.24 frames. ], batch size: 51, lr: 1.89e-02, grad_scale: 32.0 2024-09-22 21:39:27,080 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.const_attention_rate, batch_count=108220.0, ans=0.025 2024-09-22 21:39:43,066 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.const_attention_rate, batch_count=108266.66666666667, ans=0.025 2024-09-22 21:40:01,663 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.177e+02 1.455e+02 1.690e+02 2.004e+02 3.040e+02, threshold=3.380e+02, percent-clipped=0.0 2024-09-22 21:40:01,883 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=108313.33333333333, ans=0.0 2024-09-22 21:40:17,247 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=108360.0, ans=0.125 2024-09-22 21:40:23,106 INFO [train.py:1198] (3/4) Epoch 6, batch 3750, loss[loss=0.2683, ctc_loss=0.1926, cr_loss=0.3784, over 16942.00 frames. ], tot_loss[loss=0.2835, ctc_loss=0.2035, cr_loss=0.3997, over 3357059.77 frames. ], batch size: 42, lr: 1.89e-02, grad_scale: 32.0 2024-09-22 21:40:43,134 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=11.44 vs. limit=15.0 2024-09-22 21:40:46,901 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=108453.33333333333, ans=0.125 2024-09-22 21:40:53,342 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=108500.0, ans=0.0 2024-09-22 21:41:01,285 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=108500.0, ans=0.0 2024-09-22 21:41:01,286 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=108500.0, ans=0.125 2024-09-22 21:41:09,514 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=108546.66666666667, ans=0.025 2024-09-22 21:41:20,391 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=108546.66666666667, ans=0.125 2024-09-22 21:41:29,928 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=108593.33333333333, ans=0.1 2024-09-22 21:41:42,125 INFO [train.py:1198] (3/4) Epoch 6, batch 3800, loss[loss=0.3685, ctc_loss=0.2812, cr_loss=0.4367, over 11740.00 frames. ], tot_loss[loss=0.2846, ctc_loss=0.2045, cr_loss=0.4006, over 3332912.83 frames. 
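Note on the varying batch sizes: across this span they range from 36 to 124 utterances and shrink as utterances get longer (batch 3600 above packs 124 short cuts into 11639 frames, while 39-cut batches carry ~17k frames), consistent with duration-bucketed sampling in which each batch is filled against a total-duration budget rather than a fixed cut count. A sketch with lhotse's DynamicBucketingSampler; the manifest path and the duration budget below are assumptions, not values taken from this run.

```python
from lhotse import CutSet
from lhotse.dataset import DynamicBucketingSampler

# Sketch of duration-capped, bucketed batching consistent with the
# fluctuating "batch size" fields above. Path and budget are hypothetical.
cuts = CutSet.from_file("data/fbank/train_cuts.jsonl.gz")
sampler = DynamicBucketingSampler(
    cuts, max_duration=700.0, num_buckets=30, shuffle=True, drop_last=True
)
for batch_cuts in sampler:
    ...  # each batch: many short cuts, or few long ones
```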
], batch size: 123, lr: 1.89e-02, grad_scale: 32.0 2024-09-22 21:41:50,222 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=108640.0, ans=0.1 2024-09-22 21:41:58,924 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=12.31 vs. limit=15.0 2024-09-22 21:42:10,688 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=108686.66666666667, ans=0.125 2024-09-22 21:42:19,910 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=108733.33333333333, ans=0.125 2024-09-22 21:42:39,194 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.201e+02 1.543e+02 1.830e+02 2.234e+02 3.927e+02, threshold=3.660e+02, percent-clipped=2.0 2024-09-22 21:43:00,880 INFO [train.py:1198] (3/4) Epoch 6, batch 3850, loss[loss=0.3289, ctc_loss=0.2461, cr_loss=0.414, over 11814.00 frames. ], tot_loss[loss=0.2861, ctc_loss=0.2062, cr_loss=0.3994, over 3271668.96 frames. ], batch size: 123, lr: 1.89e-02, grad_scale: 32.0 2024-09-22 21:43:04,243 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=108873.33333333333, ans=0.125 2024-09-22 21:43:18,816 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-22 21:43:37,687 INFO [scaling.py:1024] (3/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.14 vs. limit=5.0 2024-09-22 21:43:45,735 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=109013.33333333333, ans=0.0 2024-09-22 21:43:54,677 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=109013.33333333333, ans=0.125 2024-09-22 21:43:59,268 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=109013.33333333333, ans=0.125 2024-09-22 21:45:03,019 INFO [train.py:1198] (3/4) Epoch 7, batch 0, loss[loss=0.33, ctc_loss=0.2473, cr_loss=0.4135, over 15843.00 frames. ], tot_loss[loss=0.33, ctc_loss=0.2473, cr_loss=0.4135, over 15843.00 frames. ], batch size: 74, lr: 1.77e-02, grad_scale: 32.0 2024-09-22 21:45:03,019 INFO [train.py:1221] (3/4) Computing validation loss 2024-09-22 21:45:18,425 INFO [train.py:1230] (3/4) Epoch 7, validation: loss=0.06283, ctc_loss=0.06283, cr_loss=7.028e-15, over 944034.00 frames. 2024-09-22 21:45:18,426 INFO [train.py:1231] (3/4) Maximum memory allocated so far is 21238MB 2024-09-22 21:45:34,515 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=109134.66666666667, ans=0.0 2024-09-22 21:45:49,482 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=9.42 vs. limit=15.0 2024-09-22 21:46:24,181 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.230e+02 1.560e+02 1.911e+02 2.609e+02 3.890e+02, threshold=3.822e+02, percent-clipped=2.0 2024-09-22 21:46:39,894 INFO [train.py:1198] (3/4) Epoch 7, batch 50, loss[loss=0.2332, ctc_loss=0.166, cr_loss=0.3361, over 17031.00 frames. 
], tot_loss[loss=0.28, ctc_loss=0.2007, cr_loss=0.3963, over 757905.60 frames. ], batch size: 44, lr: 1.76e-02, grad_scale: 32.0 2024-09-22 21:46:47,292 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.09 vs. limit=15.0 2024-09-22 21:47:11,314 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=109368.0, ans=0.125 2024-09-22 21:47:46,503 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.const_attention_rate, batch_count=109508.0, ans=0.025 2024-09-22 21:47:55,854 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=109508.0, ans=0.0 2024-09-22 21:48:00,244 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=109508.0, ans=0.0 2024-09-22 21:48:04,580 INFO [train.py:1198] (3/4) Epoch 7, batch 100, loss[loss=0.2681, ctc_loss=0.1894, cr_loss=0.3931, over 17079.00 frames. ], tot_loss[loss=0.2804, ctc_loss=0.2009, cr_loss=0.3976, over 1334648.50 frames. ], batch size: 46, lr: 1.76e-02, grad_scale: 32.0 2024-09-22 21:48:15,848 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=109554.66666666667, ans=0.0 2024-09-22 21:48:38,115 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=109648.0, ans=0.125 2024-09-22 21:48:52,341 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=109694.66666666667, ans=0.07 2024-09-22 21:49:00,318 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=109694.66666666667, ans=0.125 2024-09-22 21:49:02,649 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=11.73 vs. limit=15.0 2024-09-22 21:49:07,572 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=15.24 vs. limit=22.5 2024-09-22 21:49:07,941 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.148e+02 1.339e+02 1.498e+02 1.759e+02 2.443e+02, threshold=2.996e+02, percent-clipped=0.0 2024-09-22 21:49:26,897 INFO [train.py:1198] (3/4) Epoch 7, batch 150, loss[loss=0.2685, ctc_loss=0.1934, cr_loss=0.376, over 17073.00 frames. ], tot_loss[loss=0.2793, ctc_loss=0.2, cr_loss=0.3964, over 1785478.75 frames. ], batch size: 46, lr: 1.76e-02, grad_scale: 32.0 2024-09-22 21:49:28,880 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer_na.min_abs, batch_count=109788.0, ans=0.02 2024-09-22 21:49:36,751 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=109788.0, ans=0.125 2024-09-22 21:49:41,823 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.40 vs. limit=6.0 2024-09-22 21:50:44,089 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.28 vs. 
limit=10.0 2024-09-22 21:50:44,290 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=7.17 vs. limit=15.0 2024-09-22 21:50:48,205 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=110021.33333333333, ans=0.125 2024-09-22 21:50:49,451 INFO [train.py:1198] (3/4) Epoch 7, batch 200, loss[loss=0.2934, ctc_loss=0.2144, cr_loss=0.3952, over 16094.00 frames. ], tot_loss[loss=0.2788, ctc_loss=0.1995, cr_loss=0.3961, over 2132507.62 frames. ], batch size: 74, lr: 1.76e-02, grad_scale: 32.0 2024-09-22 21:50:51,337 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=110021.33333333333, ans=0.125 2024-09-22 21:50:54,476 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=110021.33333333333, ans=0.0 2024-09-22 21:51:20,428 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys.whitening_limit, batch_count=110114.66666666667, ans=6.0 2024-09-22 21:51:52,358 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=110208.0, ans=0.2 2024-09-22 21:51:53,493 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.194e+02 1.411e+02 1.617e+02 1.825e+02 4.000e+02, threshold=3.234e+02, percent-clipped=2.0 2024-09-22 21:52:06,051 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=110208.0, ans=0.1 2024-09-22 21:52:06,165 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=110208.0, ans=0.125 2024-09-22 21:52:12,072 INFO [train.py:1198] (3/4) Epoch 7, batch 250, loss[loss=0.3049, ctc_loss=0.2184, cr_loss=0.4324, over 15860.00 frames. ], tot_loss[loss=0.2797, ctc_loss=0.2002, cr_loss=0.3978, over 2403258.57 frames. ], batch size: 74, lr: 1.76e-02, grad_scale: 32.0 2024-09-22 21:52:14,137 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-22 21:52:18,710 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=110254.66666666667, ans=0.0 2024-09-22 21:53:14,893 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.56 vs. limit=15.0 2024-09-22 21:53:34,854 INFO [train.py:1198] (3/4) Epoch 7, batch 300, loss[loss=0.3169, ctc_loss=0.2323, cr_loss=0.4233, over 16993.00 frames. ], tot_loss[loss=0.2805, ctc_loss=0.2007, cr_loss=0.3992, over 2617135.84 frames. ], batch size: 53, lr: 1.76e-02, grad_scale: 32.0 2024-09-22 21:53:35,874 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=10.88 vs. limit=15.0 2024-09-22 21:54:08,766 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=18.46 vs. 
limit=22.5 2024-09-22 21:54:41,018 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.225e+02 1.446e+02 1.620e+02 1.814e+02 2.683e+02, threshold=3.241e+02, percent-clipped=0.0 2024-09-22 21:54:41,397 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=110674.66666666667, ans=0.1 2024-09-22 21:54:56,993 INFO [train.py:1198] (3/4) Epoch 7, batch 350, loss[loss=0.269, ctc_loss=0.1918, cr_loss=0.386, over 17188.00 frames. ], tot_loss[loss=0.2815, ctc_loss=0.2015, cr_loss=0.4, over 2771984.19 frames. ], batch size: 55, lr: 1.75e-02, grad_scale: 32.0 2024-09-22 21:55:08,365 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=110721.33333333333, ans=0.125 2024-09-22 21:55:13,092 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=110768.0, ans=0.2 2024-09-22 21:55:38,313 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=110814.66666666667, ans=0.125 2024-09-22 21:55:38,496 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-22 21:55:51,808 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=8.73 vs. limit=15.0 2024-09-22 21:56:19,160 INFO [train.py:1198] (3/4) Epoch 7, batch 400, loss[loss=0.2586, ctc_loss=0.1827, cr_loss=0.3796, over 17062.00 frames. ], tot_loss[loss=0.2803, ctc_loss=0.2004, cr_loss=0.3994, over 2910031.90 frames. ], batch size: 46, lr: 1.75e-02, grad_scale: 32.0 2024-09-22 21:56:24,289 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=110954.66666666667, ans=0.1 2024-09-22 21:56:24,374 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=110954.66666666667, ans=0.125 2024-09-22 21:57:14,646 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=111094.66666666667, ans=0.125 2024-09-22 21:57:21,266 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=111094.66666666667, ans=10.0 2024-09-22 21:57:25,281 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten.whitening_limit, batch_count=111141.33333333333, ans=15.0 2024-09-22 21:57:25,761 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.119e+02 1.408e+02 1.555e+02 1.806e+02 2.695e+02, threshold=3.109e+02, percent-clipped=0.0 2024-09-22 21:57:41,729 INFO [train.py:1198] (3/4) Epoch 7, batch 450, loss[loss=0.2456, ctc_loss=0.1738, cr_loss=0.3592, over 17263.00 frames. ], tot_loss[loss=0.2799, ctc_loss=0.2, cr_loss=0.3994, over 3007030.31 frames. 
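Note on the learning rate: it decays smoothly with batch count within an epoch and steps down at epoch boundaries, from 1.96e-02 through most of Epoch 6 to 1.77e-02 at Epoch 7, batch 0, easing to 1.75e-02 by batch 450 here. Both behaviors match icefall's Eden schedule, a product of a batch-dependent and an epoch-dependent decay factor. In the sketch, lr_batches and lr_epochs are common recipe defaults taken as assumptions, and the batch counts in the check are approximate.

```python
# Sketch of the Eden learning-rate schedule used by icefall recipes;
# lr_batches/lr_epochs are assumed defaults, not read from this run.
def eden_lr(base_lr: float, batch: int, epoch: int,
            lr_batches: float = 7500.0, lr_epochs: float = 3.5) -> float:
    batch_factor = ((batch ** 2 + lr_batches ** 2) / lr_batches ** 2) ** -0.25
    epoch_factor = ((epoch ** 2 + lr_epochs ** 2) / lr_epochs ** 2) ** -0.25
    return base_lr * batch_factor * epoch_factor

# Ratio check against the log (batch counts approximate):
print(eden_lr(1.0, 109000, 7) / eden_lr(1.0, 100000, 6))  # ~0.90 = 1.77/1.96
```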
], batch size: 42, lr: 1.75e-02, grad_scale: 32.0 2024-09-22 21:57:47,902 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=111188.0, ans=0.0 2024-09-22 21:58:16,543 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=111281.33333333333, ans=0.0 2024-09-22 21:58:22,978 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=111281.33333333333, ans=0.125 2024-09-22 21:58:24,454 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=111281.33333333333, ans=0.125 2024-09-22 21:58:35,394 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=111328.0, ans=0.2 2024-09-22 21:58:43,481 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=111328.0, ans=0.125 2024-09-22 21:58:49,893 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=111374.66666666667, ans=0.1 2024-09-22 21:59:03,789 INFO [train.py:1198] (3/4) Epoch 7, batch 500, loss[loss=0.3067, ctc_loss=0.2181, cr_loss=0.443, over 16996.00 frames. ], tot_loss[loss=0.2798, ctc_loss=0.1999, cr_loss=0.3999, over 3090756.22 frames. ], batch size: 53, lr: 1.75e-02, grad_scale: 32.0 2024-09-22 21:59:14,465 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=111421.33333333333, ans=0.05 2024-09-22 21:59:16,080 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=111421.33333333333, ans=0.1 2024-09-22 21:59:16,795 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.80 vs. limit=10.0 2024-09-22 21:59:17,175 INFO [scaling.py:1024] (3/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.58 vs. limit=5.0 2024-09-22 21:59:22,778 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=111468.0, ans=0.125 2024-09-22 21:59:25,925 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=111468.0, ans=0.1 2024-09-22 21:59:39,191 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=14.37 vs. limit=22.5 2024-09-22 22:00:00,583 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=111561.33333333333, ans=0.1 2024-09-22 22:00:09,622 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.144e+02 1.405e+02 1.626e+02 1.844e+02 3.754e+02, threshold=3.253e+02, percent-clipped=1.0 2024-09-22 22:00:11,571 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=111608.0, ans=0.125 2024-09-22 22:00:25,409 INFO [train.py:1198] (3/4) Epoch 7, batch 550, loss[loss=0.2559, ctc_loss=0.1811, cr_loss=0.374, over 17250.00 frames. 
], tot_loss[loss=0.2797, ctc_loss=0.1998, cr_loss=0.3998, over 3142439.40 frames. ], batch size: 44, lr: 1.75e-02, grad_scale: 32.0 2024-09-22 22:00:47,646 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=111701.33333333333, ans=0.0 2024-09-22 22:00:53,001 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=10.75 vs. limit=15.0 2024-09-22 22:01:10,122 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=111748.0, ans=0.125 2024-09-22 22:01:23,000 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=111794.66666666667, ans=0.5 2024-09-22 22:01:47,903 INFO [train.py:1198] (3/4) Epoch 7, batch 600, loss[loss=0.2808, ctc_loss=0.1971, cr_loss=0.4187, over 17075.00 frames. ], tot_loss[loss=0.2801, ctc_loss=0.2001, cr_loss=0.3998, over 3187061.41 frames. ], batch size: 46, lr: 1.75e-02, grad_scale: 32.0 2024-09-22 22:01:58,584 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=111888.0, ans=0.0 2024-09-22 22:02:06,498 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=111934.66666666667, ans=0.2 2024-09-22 22:02:09,783 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=111934.66666666667, ans=0.0 2024-09-22 22:02:56,061 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=112028.0, ans=10.0 2024-09-22 22:02:58,792 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.180e+02 1.416e+02 1.598e+02 2.018e+02 3.504e+02, threshold=3.196e+02, percent-clipped=2.0 2024-09-22 22:03:14,673 INFO [train.py:1198] (3/4) Epoch 7, batch 650, loss[loss=0.3695, ctc_loss=0.2767, cr_loss=0.464, over 15084.00 frames. ], tot_loss[loss=0.2801, ctc_loss=0.2002, cr_loss=0.3997, over 3221609.22 frames. ], batch size: 89, lr: 1.74e-02, grad_scale: 32.0 2024-09-22 22:03:15,847 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=11.51 vs. limit=22.5 2024-09-22 22:03:51,560 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=112214.66666666667, ans=0.1 2024-09-22 22:04:14,749 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-22 22:04:16,240 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=112261.33333333333, ans=0.1 2024-09-22 22:04:18,212 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=6.18 vs. limit=15.0 2024-09-22 22:04:21,094 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=112308.0, ans=0.0 2024-09-22 22:04:36,943 INFO [train.py:1198] (3/4) Epoch 7, batch 700, loss[loss=0.2422, ctc_loss=0.1674, cr_loss=0.3741, over 16698.00 frames. ], tot_loss[loss=0.2796, ctc_loss=0.2, cr_loss=0.3981, over 3253321.00 frames. 
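Across these batch summaries the reported loss is consistent with a fixed linear combination of its two components, loss = ctc_loss + 0.2 * cr_loss (e.g. the batch 650 record above: 0.2767 + 0.2 * 0.464 = 0.3695). A hedged sketch of that combination; the function name and cr_scale keyword are illustrative stand-ins, not the training script's actual API:

import torch

def combine_losses(ctc_loss: torch.Tensor, cr_loss: torch.Tensor,
                   cr_scale: float = 0.2) -> torch.Tensor:
    # Weighted sum matching the logged numbers: loss = ctc + 0.2 * cr.
    return ctc_loss + cr_scale * cr_loss

# Reproduces the batch 650 record: 0.2767 + 0.2 * 0.4640 = 0.3695.
loss = combine_losses(torch.tensor(0.2767), torch.tensor(0.4640))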
], batch size: 37, lr: 1.74e-02, grad_scale: 32.0 2024-09-22 22:04:54,735 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=112401.33333333333, ans=0.0 2024-09-22 22:04:57,903 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=112401.33333333333, ans=0.125 2024-09-22 22:05:15,080 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=112448.0, ans=0.125 2024-09-22 22:05:27,572 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=14.29 vs. limit=22.5 2024-09-22 22:05:28,366 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=112494.66666666667, ans=0.125 2024-09-22 22:05:42,318 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.167e+02 1.430e+02 1.586e+02 1.856e+02 3.477e+02, threshold=3.173e+02, percent-clipped=1.0 2024-09-22 22:05:45,855 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=112541.33333333333, ans=0.05 2024-09-22 22:05:58,157 INFO [train.py:1198] (3/4) Epoch 7, batch 750, loss[loss=0.2496, ctc_loss=0.1777, cr_loss=0.3593, over 17106.00 frames. ], tot_loss[loss=0.2793, ctc_loss=0.2, cr_loss=0.3967, over 3275947.45 frames. ], batch size: 40, lr: 1.74e-02, grad_scale: 32.0 2024-09-22 22:06:01,681 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=112588.0, ans=0.125 2024-09-22 22:06:06,375 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=112588.0, ans=0.1 2024-09-22 22:06:12,575 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.const_attention_rate, batch_count=112634.66666666667, ans=0.025 2024-09-22 22:06:36,698 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.84 vs. limit=22.5 2024-09-22 22:06:38,067 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=112681.33333333333, ans=0.0 2024-09-22 22:07:19,628 INFO [train.py:1198] (3/4) Epoch 7, batch 800, loss[loss=0.2371, ctc_loss=0.1682, cr_loss=0.3444, over 17063.00 frames. ], tot_loss[loss=0.2793, ctc_loss=0.1999, cr_loss=0.3969, over 3291165.01 frames. ], batch size: 43, lr: 1.74e-02, grad_scale: 32.0 2024-09-22 22:07:33,557 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=112821.33333333333, ans=0.125 2024-09-22 22:07:38,485 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=112868.0, ans=0.1 2024-09-22 22:08:03,171 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=4.66 vs. 
limit=15.0 2024-09-22 22:08:04,086 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=112914.66666666667, ans=0.0 2024-09-22 22:08:26,195 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.115e+02 1.385e+02 1.523e+02 1.724e+02 2.705e+02, threshold=3.046e+02, percent-clipped=0.0 2024-09-22 22:08:28,157 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=113008.0, ans=0.125 2024-09-22 22:08:41,936 INFO [train.py:1198] (3/4) Epoch 7, batch 850, loss[loss=0.215, ctc_loss=0.1506, cr_loss=0.3222, over 17129.00 frames. ], tot_loss[loss=0.2806, ctc_loss=0.2008, cr_loss=0.3987, over 3298288.47 frames. ], batch size: 40, lr: 1.74e-02, grad_scale: 32.0 2024-09-22 22:08:50,038 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=113054.66666666667, ans=0.2 2024-09-22 22:09:17,792 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=113148.0, ans=0.1 2024-09-22 22:09:45,165 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=113194.66666666667, ans=0.125 2024-09-22 22:09:52,968 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=113241.33333333333, ans=0.125 2024-09-22 22:09:56,059 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=113241.33333333333, ans=0.0 2024-09-22 22:10:03,667 INFO [train.py:1198] (3/4) Epoch 7, batch 900, loss[loss=0.3147, ctc_loss=0.231, cr_loss=0.4184, over 15299.00 frames. ], tot_loss[loss=0.2799, ctc_loss=0.2002, cr_loss=0.3985, over 3306357.84 frames. ], batch size: 89, lr: 1.74e-02, grad_scale: 32.0 2024-09-22 22:10:17,625 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=113288.0, ans=0.1 2024-09-22 22:10:30,913 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.79 vs. limit=22.5 2024-09-22 22:10:33,479 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=113334.66666666667, ans=0.125 2024-09-22 22:10:37,155 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=4.46 vs. 
limit=10.0 2024-09-22 22:10:39,843 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=113381.33333333333, ans=0.1 2024-09-22 22:10:44,584 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=113381.33333333333, ans=0.2 2024-09-22 22:10:54,026 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=113428.0, ans=0.0 2024-09-22 22:10:55,606 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-22 22:11:06,319 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=113428.0, ans=0.125 2024-09-22 22:11:09,266 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.239e+02 1.440e+02 1.570e+02 1.832e+02 2.236e+02, threshold=3.140e+02, percent-clipped=0.0 2024-09-22 22:11:13,353 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.23 vs. limit=22.5 2024-09-22 22:11:20,795 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=113474.66666666667, ans=0.0 2024-09-22 22:11:22,424 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=113474.66666666667, ans=0.125 2024-09-22 22:11:23,839 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=113521.33333333333, ans=0.125 2024-09-22 22:11:25,182 INFO [train.py:1198] (3/4) Epoch 7, batch 950, loss[loss=0.2916, ctc_loss=0.2089, cr_loss=0.4135, over 17300.00 frames. ], tot_loss[loss=0.2806, ctc_loss=0.2009, cr_loss=0.3986, over 3309601.08 frames. ], batch size: 49, lr: 1.73e-02, grad_scale: 32.0 2024-09-22 22:11:25,565 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-22 22:11:25,601 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=113521.33333333333, ans=0.125 2024-09-22 22:11:53,765 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=113568.0, ans=0.125 2024-09-22 22:12:04,915 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.const_attention_rate, batch_count=113614.66666666667, ans=0.025 2024-09-22 22:12:34,506 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=113708.0, ans=0.125 2024-09-22 22:12:50,312 INFO [train.py:1198] (3/4) Epoch 7, batch 1000, loss[loss=0.2764, ctc_loss=0.1987, cr_loss=0.3882, over 17317.00 frames. ], tot_loss[loss=0.2788, ctc_loss=0.1993, cr_loss=0.3974, over 3324949.96 frames. 
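Each scaling.py:214 line reports a ScheduledFloat: a float hyperparameter (a dropout probability, skip rate, balancer bound, etc.) whose current value `ans` is a function of `batch_count`. One common way to realize such a schedule is piecewise-linear interpolation over (batch_count, value) breakpoints; the sketch below illustrates only that idea, with hypothetical breakpoints chosen to reproduce a logged endpoint value, and is not a transcription of the scaling.py class:

import bisect

class ScheduledFloatSketch:
    """Piecewise-linear schedule over batch_count (illustrative only)."""

    def __init__(self, *points):
        # points: (batch_count, value) pairs, sorted by batch_count.
        self.xs = [p[0] for p in points]
        self.ys = [p[1] for p in points]

    def value(self, batch_count: float) -> float:
        if batch_count <= self.xs[0]:
            return self.ys[0]
        if batch_count >= self.xs[-1]:
            return self.ys[-1]
        i = bisect.bisect_right(self.xs, batch_count)
        x0, x1 = self.xs[i - 1], self.xs[i]
        y0, y1 = self.ys[i - 1], self.ys[i]
        return y0 + (batch_count - x0) / (x1 - x0) * (y1 - y0)

# Hypothetical dropout schedule, annealed from 0.3 to 0.1 by 20k batches;
# at batch_count=110954.67 it yields 0.1, matching the logged ans=0.1 for
# the feed_forward1.out_proj.dropout_p entries at that batch_count.
dropout_p = ScheduledFloatSketch((0.0, 0.3), (20000.0, 0.1)).value(110954.67)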
], batch size: 51, lr: 1.73e-02, grad_scale: 32.0 2024-09-22 22:12:56,782 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=113754.66666666667, ans=0.0 2024-09-22 22:13:00,049 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=113754.66666666667, ans=0.125 2024-09-22 22:13:09,545 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=113801.33333333333, ans=0.1 2024-09-22 22:13:17,408 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=113801.33333333333, ans=0.125 2024-09-22 22:13:22,110 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=113848.0, ans=0.125 2024-09-22 22:13:36,952 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.01 vs. limit=15.0 2024-09-22 22:13:47,778 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=113894.66666666667, ans=0.0 2024-09-22 22:13:53,767 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.086e+02 1.387e+02 1.549e+02 1.823e+02 4.640e+02, threshold=3.099e+02, percent-clipped=1.0 2024-09-22 22:13:57,469 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=12.26 vs. limit=22.5 2024-09-22 22:14:09,383 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.const_attention_rate, batch_count=113941.33333333333, ans=0.025 2024-09-22 22:14:12,273 INFO [train.py:1198] (3/4) Epoch 7, batch 1050, loss[loss=0.2411, ctc_loss=0.1697, cr_loss=0.3569, over 17307.00 frames. ], tot_loss[loss=0.2776, ctc_loss=0.1982, cr_loss=0.3969, over 3338305.74 frames. ], batch size: 51, lr: 1.73e-02, grad_scale: 32.0 2024-09-22 22:14:27,002 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=114034.66666666667, ans=0.5 2024-09-22 22:14:41,136 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=114034.66666666667, ans=0.125 2024-09-22 22:14:44,388 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=114081.33333333333, ans=0.0 2024-09-22 22:15:34,456 INFO [train.py:1198] (3/4) Epoch 7, batch 1100, loss[loss=0.2785, ctc_loss=0.1971, cr_loss=0.4073, over 17056.00 frames. ], tot_loss[loss=0.2773, ctc_loss=0.1981, cr_loss=0.396, over 3330709.98 frames. ], batch size: 52, lr: 1.73e-02, grad_scale: 32.0 2024-09-22 22:15:45,961 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.const_attention_rate, batch_count=114221.33333333333, ans=0.025 2024-09-22 22:15:47,540 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=114221.33333333333, ans=0.125 2024-09-22 22:16:02,151 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.28 vs. 
limit=6.0 2024-09-22 22:16:05,042 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=114314.66666666667, ans=0.125 2024-09-22 22:16:19,142 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=114314.66666666667, ans=0.0 2024-09-22 22:16:23,947 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=114361.33333333333, ans=0.125 2024-09-22 22:16:37,934 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.126e+02 1.457e+02 1.783e+02 2.141e+02 3.294e+02, threshold=3.566e+02, percent-clipped=3.0 2024-09-22 22:16:43,996 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=114408.0, ans=0.1 2024-09-22 22:16:56,277 INFO [train.py:1198] (3/4) Epoch 7, batch 1150, loss[loss=0.2394, ctc_loss=0.1623, cr_loss=0.3853, over 17009.00 frames. ], tot_loss[loss=0.2777, ctc_loss=0.1983, cr_loss=0.3967, over 3335557.54 frames. ], batch size: 44, lr: 1.73e-02, grad_scale: 32.0 2024-09-22 22:17:07,561 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=114454.66666666667, ans=0.125 2024-09-22 22:17:10,777 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=114501.33333333333, ans=0.125 2024-09-22 22:17:43,794 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.06 vs. limit=15.0 2024-09-22 22:17:47,938 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=114594.66666666667, ans=0.125 2024-09-22 22:17:54,452 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=114594.66666666667, ans=0.95 2024-09-22 22:17:56,036 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=114594.66666666667, ans=0.125 2024-09-22 22:18:01,346 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=10.11 vs. limit=15.0 2024-09-22 22:18:02,391 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer_na.min_abs, batch_count=114641.33333333333, ans=0.02 2024-09-22 22:18:10,685 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.71 vs. limit=15.0 2024-09-22 22:18:16,697 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=114688.0, ans=0.125 2024-09-22 22:18:16,751 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=114688.0, ans=0.125 2024-09-22 22:18:17,893 INFO [train.py:1198] (3/4) Epoch 7, batch 1200, loss[loss=0.2823, ctc_loss=0.2039, cr_loss=0.3923, over 14958.00 frames. ], tot_loss[loss=0.2756, ctc_loss=0.1966, cr_loss=0.3952, over 3341261.51 frames. 
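The scaling.py:1024 lines log a whitening check per module: a measured metric is compared against a scheduled limit ("metric=X vs. limit=Y"), and the natural reading is that a corrective penalty applies only when the metric exceeds its limit. The metric's definition is not visible in the log, so only the gating is sketched here; `whitening_penalty` and `grad_scale` are stand-in names:

import torch

def whitening_penalty(metric: torch.Tensor, limit: float,
                      grad_scale: float = 0.01) -> torch.Tensor:
    # Penalize only the excess of the measured metric over its limit;
    # at or below the limit the penalty (and its gradient) is zero.
    return grad_scale * torch.relu(metric - limit)

# From the whiten_keys entry above, "metric=2.28 vs. limit=6.0":
p = whitening_penalty(torch.tensor(2.28), limit=6.0)  # tensor(0.)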
], batch size: 89, lr: 1.73e-02, grad_scale: 32.0 2024-09-22 22:18:18,247 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-22 22:18:22,955 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=114688.0, ans=0.125 2024-09-22 22:18:27,695 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=114688.0, ans=10.0 2024-09-22 22:19:05,663 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-09-22 22:19:10,327 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=114828.0, ans=0.125 2024-09-22 22:19:24,538 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.166e+02 1.404e+02 1.583e+02 1.854e+02 5.575e+02, threshold=3.166e+02, percent-clipped=2.0 2024-09-22 22:19:40,344 INFO [train.py:1198] (3/4) Epoch 7, batch 1250, loss[loss=0.2608, ctc_loss=0.1825, cr_loss=0.3912, over 17270.00 frames. ], tot_loss[loss=0.275, ctc_loss=0.1962, cr_loss=0.3938, over 3338010.10 frames. ], batch size: 44, lr: 1.72e-02, grad_scale: 32.0 2024-09-22 22:20:07,029 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-22 22:20:15,394 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=5.25 vs. limit=12.0 2024-09-22 22:20:19,961 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=4.99 vs. limit=12.0 2024-09-22 22:20:41,464 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=115061.33333333333, ans=0.0 2024-09-22 22:21:01,584 INFO [train.py:1198] (3/4) Epoch 7, batch 1300, loss[loss=0.2751, ctc_loss=0.1973, cr_loss=0.3888, over 17051.00 frames. ], tot_loss[loss=0.2735, ctc_loss=0.1951, cr_loss=0.3921, over 3346501.30 frames. ], batch size: 46, lr: 1.72e-02, grad_scale: 16.0 2024-09-22 22:21:28,458 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=115201.33333333333, ans=0.0 2024-09-22 22:21:31,764 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=115248.0, ans=0.1 2024-09-22 22:21:50,373 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=115294.66666666667, ans=0.125 2024-09-22 22:21:55,666 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=11.75 vs. limit=15.0 2024-09-22 22:22:03,371 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=115294.66666666667, ans=0.1 2024-09-22 22:22:09,485 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.087e+02 1.485e+02 1.678e+02 2.013e+02 3.139e+02, threshold=3.356e+02, percent-clipped=0.0 2024-09-22 22:22:26,410 INFO [train.py:1198] (3/4) Epoch 7, batch 1350, loss[loss=0.2918, ctc_loss=0.2117, cr_loss=0.4006, over 17138.00 frames. 
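The scaling.py:1120 lines attach a WithLoss record to the self_attn_weights tensors and report a running loss-sum, which is 0.000e+00 in every entry here, i.e. this auxiliary term is currently contributing nothing. A generic sketch of wrapping an activation with an accumulated auxiliary penalty for logging; the class name and wiring are assumptions, not the scaling.py implementation:

import torch

class WithAuxLoss(torch.nn.Module):
    """Pass activations through unchanged while accumulating an auxiliary
    loss for logging (an illustrative guess at the WithLoss bookkeeping)."""

    def __init__(self, name: str):
        super().__init__()
        self.name = name
        self.loss_sum = 0.0  # the value a log line would report

    def forward(self, x: torch.Tensor, aux_loss: torch.Tensor) -> torch.Tensor:
        self.loss_sum += float(aux_loss.detach().sum())
        return x  # activations are unaffected; only bookkeeping changes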
], tot_loss[loss=0.2738, ctc_loss=0.1952, cr_loss=0.393, over 3351639.51 frames. ], batch size: 48, lr: 1.72e-02, grad_scale: 16.0 2024-09-22 22:22:28,362 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=115388.0, ans=0.0 2024-09-22 22:23:24,004 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=115528.0, ans=0.0 2024-09-22 22:23:28,701 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=115574.66666666667, ans=0.1 2024-09-22 22:23:46,225 INFO [train.py:1198] (3/4) Epoch 7, batch 1400, loss[loss=0.2631, ctc_loss=0.19, cr_loss=0.3655, over 16989.00 frames. ], tot_loss[loss=0.2741, ctc_loss=0.1954, cr_loss=0.3938, over 3354869.55 frames. ], batch size: 42, lr: 1.72e-02, grad_scale: 16.0 2024-09-22 22:24:11,186 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=115668.0, ans=0.0 2024-09-22 22:24:18,880 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=115714.66666666667, ans=0.015 2024-09-22 22:24:33,563 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=115714.66666666667, ans=0.125 2024-09-22 22:24:52,923 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=115808.0, ans=0.1 2024-09-22 22:24:54,140 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.177e+02 1.408e+02 1.629e+02 2.097e+02 4.051e+02, threshold=3.259e+02, percent-clipped=2.0 2024-09-22 22:24:54,575 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=115808.0, ans=0.125 2024-09-22 22:24:58,933 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=9.95 vs. limit=12.0 2024-09-22 22:25:06,693 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=8.09 vs. limit=15.0 2024-09-22 22:25:10,924 INFO [train.py:1198] (3/4) Epoch 7, batch 1450, loss[loss=0.2857, ctc_loss=0.2041, cr_loss=0.4083, over 17154.00 frames. ], tot_loss[loss=0.2771, ctc_loss=0.1979, cr_loss=0.3958, over 3337408.30 frames. ], batch size: 48, lr: 1.72e-02, grad_scale: 16.0 2024-09-22 22:25:12,718 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=115854.66666666667, ans=0.0 2024-09-22 22:25:24,062 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=5.99 vs. 
limit=15.0 2024-09-22 22:25:26,852 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=115901.33333333333, ans=0.5 2024-09-22 22:25:36,387 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=115901.33333333333, ans=0.125 2024-09-22 22:25:42,454 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=115948.0, ans=0.125 2024-09-22 22:25:52,174 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.min_abs, batch_count=115948.0, ans=0.5 2024-09-22 22:26:11,144 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=115994.66666666667, ans=0.0 2024-09-22 22:26:21,783 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=116041.33333333333, ans=0.015 2024-09-22 22:26:28,857 INFO [scaling.py:1024] (3/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.62 vs. limit=5.0 2024-09-22 22:26:32,310 INFO [train.py:1198] (3/4) Epoch 7, batch 1500, loss[loss=0.2703, ctc_loss=0.1882, cr_loss=0.4102, over 17005.00 frames. ], tot_loss[loss=0.2779, ctc_loss=0.1985, cr_loss=0.3974, over 3342579.81 frames. ], batch size: 53, lr: 1.72e-02, grad_scale: 16.0 2024-09-22 22:26:42,001 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=116088.0, ans=0.1 2024-09-22 22:26:43,549 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=116088.0, ans=0.125 2024-09-22 22:26:57,102 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=6.62 vs. limit=15.0 2024-09-22 22:27:11,774 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=116181.33333333333, ans=0.125 2024-09-22 22:27:13,443 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=116181.33333333333, ans=0.125 2024-09-22 22:27:39,213 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=116274.66666666667, ans=0.1 2024-09-22 22:27:40,531 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.248e+02 1.482e+02 1.664e+02 1.936e+02 3.285e+02, threshold=3.328e+02, percent-clipped=1.0 2024-09-22 22:27:41,240 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=8.28 vs. limit=15.0 2024-09-22 22:27:44,237 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.64 vs. 
limit=10.0 2024-09-22 22:27:50,281 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=116274.66666666667, ans=0.1 2024-09-22 22:27:51,838 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=116274.66666666667, ans=0.125 2024-09-22 22:27:53,467 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=116321.33333333333, ans=0.05 2024-09-22 22:27:53,939 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=14.65 vs. limit=22.5 2024-09-22 22:27:54,708 INFO [train.py:1198] (3/4) Epoch 7, batch 1550, loss[loss=0.3308, ctc_loss=0.2502, cr_loss=0.4027, over 11986.00 frames. ], tot_loss[loss=0.2779, ctc_loss=0.1983, cr_loss=0.3978, over 3343397.19 frames. ], batch size: 123, lr: 1.71e-02, grad_scale: 16.0 2024-09-22 22:27:55,021 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=116321.33333333333, ans=0.125 2024-09-22 22:28:18,935 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=116368.0, ans=0.0 2024-09-22 22:28:54,800 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-22 22:29:03,979 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=116508.0, ans=0.125 2024-09-22 22:29:16,547 INFO [train.py:1198] (3/4) Epoch 7, batch 1600, loss[loss=0.2892, ctc_loss=0.2183, cr_loss=0.3549, over 16643.00 frames. ], tot_loss[loss=0.2765, ctc_loss=0.1972, cr_loss=0.3967, over 3353799.56 frames. ], batch size: 66, lr: 1.71e-02, grad_scale: 32.0 2024-09-22 22:29:31,063 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=116601.33333333333, ans=0.125 2024-09-22 22:29:35,703 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=116601.33333333333, ans=0.125 2024-09-22 22:29:38,925 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=116601.33333333333, ans=0.125 2024-09-22 22:30:03,995 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-22 22:30:15,524 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=23.08 vs. limit=22.5 2024-09-22 22:30:24,105 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.169e+02 1.410e+02 1.579e+02 1.948e+02 3.056e+02, threshold=3.158e+02, percent-clipped=0.0 2024-09-22 22:30:26,076 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=116741.33333333333, ans=0.1 2024-09-22 22:30:37,643 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.19 vs. limit=15.0 2024-09-22 22:30:38,423 INFO [train.py:1198] (3/4) Epoch 7, batch 1650, loss[loss=0.2325, ctc_loss=0.1655, cr_loss=0.3349, over 17018.00 frames. 
], tot_loss[loss=0.2764, ctc_loss=0.1971, cr_loss=0.3964, over 3353999.75 frames. ], batch size: 44, lr: 1.71e-02, grad_scale: 32.0 2024-09-22 22:30:43,407 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=116788.0, ans=0.2 2024-09-22 22:31:03,929 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=116834.66666666667, ans=0.0 2024-09-22 22:31:55,289 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=116974.66666666667, ans=0.125 2024-09-22 22:31:59,728 INFO [train.py:1198] (3/4) Epoch 7, batch 1700, loss[loss=0.3026, ctc_loss=0.2132, cr_loss=0.4474, over 17365.00 frames. ], tot_loss[loss=0.2753, ctc_loss=0.1962, cr_loss=0.3957, over 3344546.67 frames. ], batch size: 48, lr: 1.71e-02, grad_scale: 32.0 2024-09-22 22:32:26,915 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=5.44 vs. limit=15.0 2024-09-22 22:32:33,137 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.46 vs. limit=6.0 2024-09-22 22:32:37,392 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=117114.66666666667, ans=0.0 2024-09-22 22:32:40,434 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=117114.66666666667, ans=0.125 2024-09-22 22:32:54,770 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=117161.33333333333, ans=0.125 2024-09-22 22:33:07,038 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.106e+02 1.332e+02 1.436e+02 1.651e+02 2.321e+02, threshold=2.871e+02, percent-clipped=0.0 2024-09-22 22:33:21,095 INFO [train.py:1198] (3/4) Epoch 7, batch 1750, loss[loss=0.3015, ctc_loss=0.2167, cr_loss=0.4244, over 16793.00 frames. ], tot_loss[loss=0.2748, ctc_loss=0.1957, cr_loss=0.3957, over 3348969.95 frames. ], batch size: 61, lr: 1.71e-02, grad_scale: 32.0 2024-09-22 22:34:03,538 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=117348.0, ans=0.0 2024-09-22 22:34:12,258 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=10.75 vs. limit=15.0 2024-09-22 22:34:13,300 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=117394.66666666667, ans=0.2 2024-09-22 22:34:34,884 INFO [scaling.py:1024] (3/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=7.40 vs. limit=8.0 2024-09-22 22:34:45,740 INFO [train.py:1198] (3/4) Epoch 7, batch 1800, loss[loss=0.2793, ctc_loss=0.1955, cr_loss=0.419, over 17112.00 frames. ], tot_loss[loss=0.2745, ctc_loss=0.1954, cr_loss=0.3954, over 3343206.86 frames. 
], batch size: 49, lr: 1.71e-02, grad_scale: 32.0 2024-09-22 22:34:47,574 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=117488.0, ans=0.125 2024-09-22 22:34:52,792 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=10.04 vs. limit=15.0 2024-09-22 22:34:55,471 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=117488.0, ans=0.125 2024-09-22 22:35:03,335 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=117534.66666666667, ans=0.1 2024-09-22 22:35:33,638 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=117628.0, ans=0.2 2024-09-22 22:35:40,054 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=117628.0, ans=0.0 2024-09-22 22:35:49,715 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=117674.66666666667, ans=0.1 2024-09-22 22:35:50,922 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.136e+02 1.391e+02 1.562e+02 1.960e+02 3.440e+02, threshold=3.125e+02, percent-clipped=2.0 2024-09-22 22:36:03,985 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=117721.33333333333, ans=0.0 2024-09-22 22:36:04,051 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=117721.33333333333, ans=0.0 2024-09-22 22:36:05,279 INFO [train.py:1198] (3/4) Epoch 7, batch 1850, loss[loss=0.2535, ctc_loss=0.1789, cr_loss=0.3729, over 17094.00 frames. ], tot_loss[loss=0.2767, ctc_loss=0.1973, cr_loss=0.3973, over 3330825.36 frames. ], batch size: 49, lr: 1.71e-02, grad_scale: 32.0 2024-09-22 22:36:07,067 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=117721.33333333333, ans=0.0 2024-09-22 22:36:08,887 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=117721.33333333333, ans=0.125 2024-09-22 22:36:25,535 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=117768.0, ans=0.125 2024-09-22 22:36:25,880 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=14.81 vs. limit=15.0 2024-09-22 22:36:30,471 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=117768.0, ans=0.1 2024-09-22 22:36:56,800 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=117861.33333333333, ans=0.2 2024-09-22 22:37:14,351 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=117908.0, ans=0.0 2024-09-22 22:37:29,663 INFO [train.py:1198] (3/4) Epoch 7, batch 1900, loss[loss=0.2802, ctc_loss=0.2018, cr_loss=0.3919, over 16924.00 frames. ], tot_loss[loss=0.2752, ctc_loss=0.196, cr_loss=0.3957, over 3342945.94 frames. 
], batch size: 58, lr: 1.70e-02, grad_scale: 32.0 2024-09-22 22:37:37,694 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=117954.66666666667, ans=0.125 2024-09-22 22:37:37,738 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=117954.66666666667, ans=0.125 2024-09-22 22:37:48,718 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=118001.33333333333, ans=0.125 2024-09-22 22:38:28,514 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=118094.66666666667, ans=0.125 2024-09-22 22:38:37,127 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.224e+02 1.462e+02 1.627e+02 1.880e+02 2.641e+02, threshold=3.255e+02, percent-clipped=0.0 2024-09-22 22:38:43,692 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=118141.33333333333, ans=0.125 2024-09-22 22:38:51,414 INFO [train.py:1198] (3/4) Epoch 7, batch 1950, loss[loss=0.229, ctc_loss=0.1565, cr_loss=0.3626, over 17249.00 frames. ], tot_loss[loss=0.2747, ctc_loss=0.1956, cr_loss=0.3957, over 3344830.72 frames. ], batch size: 42, lr: 1.70e-02, grad_scale: 32.0 2024-09-22 22:38:54,995 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=118188.0, ans=0.125 2024-09-22 22:38:58,809 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=12.40 vs. limit=15.0 2024-09-22 22:39:48,339 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=118328.0, ans=0.2 2024-09-22 22:39:51,331 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=118328.0, ans=0.1 2024-09-22 22:39:53,388 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=3.47 vs. limit=15.0 2024-09-22 22:39:59,353 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=118374.66666666667, ans=0.125 2024-09-22 22:40:02,586 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=118374.66666666667, ans=0.1 2024-09-22 22:40:11,985 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=118421.33333333333, ans=0.125 2024-09-22 22:40:13,184 INFO [train.py:1198] (3/4) Epoch 7, batch 2000, loss[loss=0.281, ctc_loss=0.2062, cr_loss=0.3737, over 17302.00 frames. ], tot_loss[loss=0.2755, ctc_loss=0.1962, cr_loss=0.3962, over 3349551.66 frames. 
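In every train.py:1198 summary, tot_loss is reported "over N frames" with N growing through the epoch (about 3,342,946 frames by the batch 1900 record above) while the per-batch loss[...] covers only that batch. The fractional frame totals (e.g. 2771984.19 at batch 350) and their eventual plateau suggest a frame-weighted running average with exponential forgetting; a minimal accumulator under that assumption, with the decay value purely illustrative:

class RunningLoss:
    """Frame-weighted running average with exponential forgetting (a guess
    consistent with the fractional, plateauing frame totals in the log)."""

    def __init__(self, decay: float = 0.99):
        self.decay = decay
        self.loss_sum = 0.0
        self.frames = 0.0

    def update(self, batch_loss: float, batch_frames: float) -> float:
        # Older batches are down-weighted; new batches enter at full weight.
        self.loss_sum = self.decay * self.loss_sum + batch_loss * batch_frames
        self.frames = self.decay * self.frames + batch_frames
        return self.loss_sum / self.frames  # the value printed as tot_loss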
], batch size: 49, lr: 1.70e-02, grad_scale: 32.0 2024-09-22 22:40:16,518 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=118421.33333333333, ans=0.0 2024-09-22 22:40:21,319 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=118421.33333333333, ans=0.125 2024-09-22 22:40:34,188 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=118468.0, ans=0.125 2024-09-22 22:40:38,989 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.const_attention_rate, batch_count=118468.0, ans=0.025 2024-09-22 22:40:54,022 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=9.06 vs. limit=15.0 2024-09-22 22:41:21,330 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.217e+02 1.357e+02 1.496e+02 1.687e+02 2.654e+02, threshold=2.993e+02, percent-clipped=0.0 2024-09-22 22:41:21,606 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=118608.0, ans=0.0 2024-09-22 22:41:35,548 INFO [train.py:1198] (3/4) Epoch 7, batch 2050, loss[loss=0.2685, ctc_loss=0.1903, cr_loss=0.3908, over 17179.00 frames. ], tot_loss[loss=0.2753, ctc_loss=0.1962, cr_loss=0.3955, over 3355563.37 frames. ], batch size: 45, lr: 1.70e-02, grad_scale: 32.0 2024-09-22 22:41:53,522 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.13 vs. limit=15.0 2024-09-22 22:42:12,142 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=118748.0, ans=0.0 2024-09-22 22:42:29,657 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=118794.66666666667, ans=0.125 2024-09-22 22:42:33,039 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=118794.66666666667, ans=0.0 2024-09-22 22:42:33,339 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten.whitening_limit, batch_count=118794.66666666667, ans=15.0 2024-09-22 22:42:58,050 INFO [train.py:1198] (3/4) Epoch 7, batch 2100, loss[loss=0.2724, ctc_loss=0.1958, cr_loss=0.383, over 17226.00 frames. ], tot_loss[loss=0.275, ctc_loss=0.196, cr_loss=0.395, over 3349045.05 frames. ], batch size: 55, lr: 1.70e-02, grad_scale: 32.0 2024-09-22 22:42:58,744 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=15.12 vs. limit=22.5 2024-09-22 22:43:11,560 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=18.55 vs. 
limit=22.5 2024-09-22 22:43:15,858 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=118934.66666666667, ans=0.1 2024-09-22 22:43:25,312 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=118934.66666666667, ans=0.125 2024-09-22 22:43:29,905 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=118981.33333333333, ans=0.0 2024-09-22 22:43:30,059 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=118981.33333333333, ans=0.5 2024-09-22 22:44:06,953 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.173e+02 1.431e+02 1.548e+02 1.785e+02 2.946e+02, threshold=3.097e+02, percent-clipped=0.0 2024-09-22 22:44:19,897 INFO [train.py:1198] (3/4) Epoch 7, batch 2150, loss[loss=0.2845, ctc_loss=0.2075, cr_loss=0.3848, over 17307.00 frames. ], tot_loss[loss=0.2753, ctc_loss=0.1963, cr_loss=0.3952, over 3348243.43 frames. ], batch size: 51, lr: 1.70e-02, grad_scale: 16.0 2024-09-22 22:44:58,880 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=119214.66666666667, ans=0.125 2024-09-22 22:45:14,981 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=119261.33333333333, ans=0.125 2024-09-22 22:45:16,429 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.const_attention_rate, batch_count=119261.33333333333, ans=0.025 2024-09-22 22:45:35,828 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=4.06 vs. limit=12.0 2024-09-22 22:45:41,317 INFO [train.py:1198] (3/4) Epoch 7, batch 2200, loss[loss=0.2756, ctc_loss=0.1933, cr_loss=0.4114, over 17047.00 frames. ], tot_loss[loss=0.2741, ctc_loss=0.1952, cr_loss=0.3945, over 3359273.43 frames. ], batch size: 52, lr: 1.69e-02, grad_scale: 16.0 2024-09-22 22:46:08,144 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=119401.33333333333, ans=0.0 2024-09-22 22:46:38,741 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=119494.66666666667, ans=0.0 2024-09-22 22:46:40,398 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=119494.66666666667, ans=0.09899494936611666 2024-09-22 22:46:52,365 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=119541.33333333333, ans=0.125 2024-09-22 22:46:53,571 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.144e+02 1.357e+02 1.455e+02 1.682e+02 2.486e+02, threshold=2.909e+02, percent-clipped=0.0 2024-09-22 22:46:53,826 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=119541.33333333333, ans=0.1 2024-09-22 22:47:06,286 INFO [train.py:1198] (3/4) Epoch 7, batch 2250, loss[loss=0.2464, ctc_loss=0.1763, cr_loss=0.3506, over 17012.00 frames. ], tot_loss[loss=0.2744, ctc_loss=0.1954, cr_loss=0.3948, over 3363102.28 frames. 
], batch size: 44, lr: 1.69e-02, grad_scale: 16.0 2024-09-22 22:47:09,776 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=119588.0, ans=0.0 2024-09-22 22:47:30,402 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=119634.66666666667, ans=0.125 2024-09-22 22:47:47,796 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=119681.33333333333, ans=0.125 2024-09-22 22:47:59,582 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.22 vs. limit=10.0 2024-09-22 22:48:25,673 INFO [train.py:1198] (3/4) Epoch 7, batch 2300, loss[loss=0.2501, ctc_loss=0.174, cr_loss=0.3806, over 17151.00 frames. ], tot_loss[loss=0.2754, ctc_loss=0.1962, cr_loss=0.3956, over 3359384.12 frames. ], batch size: 45, lr: 1.69e-02, grad_scale: 16.0 2024-09-22 22:49:06,560 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=119914.66666666667, ans=0.2 2024-09-22 22:49:31,460 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=119961.33333333333, ans=0.125 2024-09-22 22:49:36,175 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=120008.0, ans=0.1 2024-09-22 22:49:37,566 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.210e+02 1.381e+02 1.499e+02 1.768e+02 3.628e+02, threshold=2.997e+02, percent-clipped=2.0 2024-09-22 22:49:39,579 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=120008.0, ans=0.5 2024-09-22 22:49:50,252 INFO [train.py:1198] (3/4) Epoch 7, batch 2350, loss[loss=0.2703, ctc_loss=0.1915, cr_loss=0.3936, over 17102.00 frames. ], tot_loss[loss=0.2749, ctc_loss=0.1956, cr_loss=0.3966, over 3364701.20 frames. ], batch size: 49, lr: 1.69e-02, grad_scale: 16.0 2024-09-22 22:50:03,203 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=120054.66666666667, ans=0.125 2024-09-22 22:50:03,685 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.30 vs. limit=15.0 2024-09-22 22:50:06,572 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=120101.33333333333, ans=0.2 2024-09-22 22:50:19,217 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=120101.33333333333, ans=0.125 2024-09-22 22:50:51,194 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=120194.66666666667, ans=0.125 2024-09-22 22:51:12,230 INFO [train.py:1198] (3/4) Epoch 7, batch 2400, loss[loss=0.2954, ctc_loss=0.2087, cr_loss=0.4336, over 17011.00 frames. ], tot_loss[loss=0.2747, ctc_loss=0.1954, cr_loss=0.3961, over 3364221.87 frames. 
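The grad_scale field in these summaries moves between power-of-two values (16.0 in the batch 1300 record above, back to 32.0 by batch 1600), which is the signature of PyTorch's AMP loss scaler: the scale is halved when a step produces inf/NaN gradients and regrown after a run of clean steps. A standard torch.cuda.amp training step for reference; the model, optimizer, and compute_loss names are placeholders rather than the script's actual identifiers:

import torch

scaler = torch.cuda.amp.GradScaler()  # manages the logged grad_scale

def amp_step(model, optimizer, batch, compute_loss):
    optimizer.zero_grad()
    with torch.cuda.amp.autocast():
        loss = compute_loss(model, batch)
    scaler.scale(loss).backward()
    scaler.step(optimizer)   # skipped (and scale halved) on inf/NaN grads
    scaler.update()          # scale regrows after enough clean steps
    return loss.detach(), scaler.get_scale()  # e.g. 32.0, as logged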
], batch size: 53, lr: 1.69e-02, grad_scale: 32.0 2024-09-22 22:51:22,002 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=120288.0, ans=0.0 2024-09-22 22:51:33,332 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=120334.66666666667, ans=0.0 2024-09-22 22:51:36,928 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.01 vs. limit=15.0 2024-09-22 22:52:02,805 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=120428.0, ans=0.1 2024-09-22 22:52:18,752 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=120474.66666666667, ans=0.125 2024-09-22 22:52:21,490 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.189e+02 1.376e+02 1.485e+02 1.691e+02 2.822e+02, threshold=2.971e+02, percent-clipped=0.0 2024-09-22 22:52:28,822 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.46 vs. limit=15.0 2024-09-22 22:52:34,409 INFO [train.py:1198] (3/4) Epoch 7, batch 2450, loss[loss=0.2406, ctc_loss=0.174, cr_loss=0.3333, over 17146.00 frames. ], tot_loss[loss=0.2738, ctc_loss=0.1948, cr_loss=0.3951, over 3366165.22 frames. ], batch size: 48, lr: 1.69e-02, grad_scale: 32.0 2024-09-22 22:53:33,054 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=120661.33333333333, ans=0.1 2024-09-22 22:53:39,474 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=120708.0, ans=0.0 2024-09-22 22:53:56,720 INFO [train.py:1198] (3/4) Epoch 7, batch 2500, loss[loss=0.2868, ctc_loss=0.2025, cr_loss=0.4214, over 17292.00 frames. ], tot_loss[loss=0.2734, ctc_loss=0.1944, cr_loss=0.3948, over 3371630.50 frames. ], batch size: 46, lr: 1.69e-02, grad_scale: 32.0 2024-09-22 22:53:58,588 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=120754.66666666667, ans=10.0 2024-09-22 22:54:41,499 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=13.45 vs. limit=22.5 2024-09-22 22:54:44,546 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.const_attention_rate, batch_count=120848.0, ans=0.025 2024-09-22 22:54:44,900 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=10.49 vs. limit=15.0 2024-09-22 22:54:48,042 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=4.88 vs. 
limit=12.0 2024-09-22 22:55:06,268 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.238e+02 1.457e+02 1.729e+02 2.020e+02 3.233e+02, threshold=3.458e+02, percent-clipped=3.0 2024-09-22 22:55:17,699 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.const_attention_rate, batch_count=120988.0, ans=0.025 2024-09-22 22:55:18,952 INFO [train.py:1198] (3/4) Epoch 7, batch 2550, loss[loss=0.2733, ctc_loss=0.1952, cr_loss=0.3903, over 17251.00 frames. ], tot_loss[loss=0.2735, ctc_loss=0.1945, cr_loss=0.395, over 3372097.78 frames. ], batch size: 44, lr: 1.68e-02, grad_scale: 32.0 2024-09-22 22:55:37,608 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=5.83 vs. limit=15.0 2024-09-22 22:55:51,544 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=12.76 vs. limit=22.5 2024-09-22 22:56:19,409 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=7.67 vs. limit=15.0 2024-09-22 22:56:22,038 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=121128.0, ans=0.2 2024-09-22 22:56:28,160 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=121174.66666666667, ans=0.1 2024-09-22 22:56:40,342 INFO [train.py:1198] (3/4) Epoch 7, batch 2600, loss[loss=0.2259, ctc_loss=0.1597, cr_loss=0.3311, over 17285.00 frames. ], tot_loss[loss=0.2732, ctc_loss=0.1942, cr_loss=0.3947, over 3379058.34 frames. ], batch size: 42, lr: 1.68e-02, grad_scale: 32.0 2024-09-22 22:56:49,549 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=121221.33333333333, ans=0.0 2024-09-22 22:57:22,687 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=121314.66666666667, ans=0.125 2024-09-22 22:57:42,003 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.min_positive, batch_count=121361.33333333333, ans=0.025 2024-09-22 22:57:49,628 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.141e+02 1.441e+02 1.588e+02 1.825e+02 2.905e+02, threshold=3.176e+02, percent-clipped=0.0 2024-09-22 22:57:59,513 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=121408.0, ans=0.2 2024-09-22 22:58:02,495 INFO [train.py:1198] (3/4) Epoch 7, batch 2650, loss[loss=0.3057, ctc_loss=0.2172, cr_loss=0.4421, over 17018.00 frames. ], tot_loss[loss=0.2754, ctc_loss=0.1961, cr_loss=0.3964, over 3366332.98 frames. ], batch size: 56, lr: 1.68e-02, grad_scale: 32.0 2024-09-22 22:58:22,306 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=23.00 vs. limit=22.5 2024-09-22 22:58:32,656 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.46 vs. 
limit=6.0 2024-09-22 22:58:44,911 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=121548.0, ans=0.0 2024-09-22 22:59:07,464 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=121641.33333333333, ans=0.2 2024-09-22 22:59:13,828 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=121641.33333333333, ans=0.2 2024-09-22 22:59:19,582 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=121641.33333333333, ans=0.0 2024-09-22 22:59:27,153 INFO [train.py:1198] (3/4) Epoch 7, batch 2700, loss[loss=0.2898, ctc_loss=0.2058, cr_loss=0.4199, over 17107.00 frames. ], tot_loss[loss=0.2756, ctc_loss=0.1963, cr_loss=0.3967, over 3369910.57 frames. ], batch size: 49, lr: 1.68e-02, grad_scale: 32.0 2024-09-22 22:59:36,248 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=3.48 vs. limit=15.0 2024-09-22 23:00:33,555 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.157e+02 1.410e+02 1.530e+02 1.699e+02 3.124e+02, threshold=3.060e+02, percent-clipped=0.0 2024-09-22 23:00:48,620 INFO [train.py:1198] (3/4) Epoch 7, batch 2750, loss[loss=0.272, ctc_loss=0.1913, cr_loss=0.4034, over 17310.00 frames. ], tot_loss[loss=0.2749, ctc_loss=0.1957, cr_loss=0.3963, over 3371235.70 frames. ], batch size: 49, lr: 1.68e-02, grad_scale: 32.0 2024-09-22 23:00:59,989 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=121921.33333333333, ans=0.2 2024-09-22 23:01:39,199 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=122061.33333333333, ans=0.0 2024-09-22 23:01:47,176 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=122061.33333333333, ans=0.1 2024-09-22 23:02:10,734 INFO [train.py:1198] (3/4) Epoch 7, batch 2800, loss[loss=0.2795, ctc_loss=0.194, cr_loss=0.4273, over 17028.00 frames. ], tot_loss[loss=0.2742, ctc_loss=0.1951, cr_loss=0.3956, over 3367661.25 frames. ], batch size: 44, lr: 1.68e-02, grad_scale: 32.0 2024-09-22 23:02:15,754 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=122154.66666666667, ans=0.125 2024-09-22 23:02:27,124 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=122201.33333333333, ans=0.07 2024-09-22 23:03:04,770 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.91 vs. limit=10.0 2024-09-22 23:03:04,815 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.00 vs. limit=15.0 2024-09-22 23:03:18,222 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.138e+02 1.349e+02 1.457e+02 1.581e+02 2.828e+02, threshold=2.913e+02, percent-clipped=0.0 2024-09-22 23:03:33,473 INFO [train.py:1198] (3/4) Epoch 7, batch 2850, loss[loss=0.2807, ctc_loss=0.2003, cr_loss=0.4019, over 17317.00 frames. 
], tot_loss[loss=0.2734, ctc_loss=0.1944, cr_loss=0.3947, over 3362844.73 frames. ], batch size: 51, lr: 1.67e-02, grad_scale: 32.0 2024-09-22 23:03:40,084 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=122388.0, ans=0.1 2024-09-22 23:04:12,238 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=13.15 vs. limit=22.5 2024-09-22 23:04:27,208 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=122528.0, ans=0.125 2024-09-22 23:04:39,595 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=122574.66666666667, ans=0.125 2024-09-22 23:04:55,304 INFO [train.py:1198] (3/4) Epoch 7, batch 2900, loss[loss=0.2645, ctc_loss=0.1931, cr_loss=0.3573, over 17031.00 frames. ], tot_loss[loss=0.2723, ctc_loss=0.1936, cr_loss=0.3932, over 3368402.88 frames. ], batch size: 39, lr: 1.67e-02, grad_scale: 16.0 2024-09-22 23:05:00,392 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=122621.33333333333, ans=0.125 2024-09-22 23:05:05,155 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=122621.33333333333, ans=0.0 2024-09-22 23:05:29,050 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=122714.66666666667, ans=0.125 2024-09-22 23:06:05,781 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.190e+02 1.397e+02 1.534e+02 1.877e+02 3.701e+02, threshold=3.067e+02, percent-clipped=2.0 2024-09-22 23:06:16,926 INFO [train.py:1198] (3/4) Epoch 7, batch 2950, loss[loss=0.2518, ctc_loss=0.1795, cr_loss=0.3614, over 17054.00 frames. ], tot_loss[loss=0.2723, ctc_loss=0.1935, cr_loss=0.394, over 3366416.12 frames. ], batch size: 39, lr: 1.67e-02, grad_scale: 16.0 2024-09-22 23:06:44,267 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=3.03 vs. limit=15.0 2024-09-22 23:07:02,893 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.const_attention_rate, batch_count=122948.0, ans=0.025 2024-09-22 23:07:15,397 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.max_abs, batch_count=122994.66666666667, ans=10.0 2024-09-22 23:07:17,651 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=21.31 vs. limit=22.5 2024-09-22 23:07:20,221 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.const_attention_rate, batch_count=122994.66666666667, ans=0.025 2024-09-22 23:07:29,668 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=123041.33333333333, ans=0.0 2024-09-22 23:07:29,759 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=123041.33333333333, ans=0.125 2024-09-22 23:07:38,727 INFO [train.py:1198] (3/4) Epoch 7, batch 3000, loss[loss=0.296, ctc_loss=0.2088, cr_loss=0.4359, over 17209.00 frames. ], tot_loss[loss=0.2741, ctc_loss=0.1949, cr_loss=0.3958, over 3359638.88 frames. 
], batch size: 47, lr: 1.67e-02, grad_scale: 16.0 2024-09-22 23:07:38,727 INFO [train.py:1221] (3/4) Computing validation loss 2024-09-22 23:07:54,139 INFO [train.py:1230] (3/4) Epoch 7, validation: loss=0.05688, ctc_loss=0.05688, cr_loss=7.669e-15, over 944034.00 frames. 2024-09-22 23:07:54,139 INFO [train.py:1231] (3/4) Maximum memory allocated so far is 21238MB 2024-09-22 23:07:59,240 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=123088.0, ans=0.125 2024-09-22 23:08:01,676 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=7.31 vs. limit=15.0 2024-09-22 23:08:10,360 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.17 vs. limit=15.0 2024-09-22 23:08:17,807 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=123134.66666666667, ans=0.125 2024-09-22 23:08:34,698 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.15 vs. limit=10.0 2024-09-22 23:08:51,394 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=123228.0, ans=0.0 2024-09-22 23:08:56,128 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=123274.66666666667, ans=0.0 2024-09-22 23:09:00,731 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=123274.66666666667, ans=0.125 2024-09-22 23:09:01,891 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.166e+02 1.381e+02 1.504e+02 1.827e+02 5.428e+02, threshold=3.008e+02, percent-clipped=9.0 2024-09-22 23:09:06,961 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=123274.66666666667, ans=0.0 2024-09-22 23:09:12,400 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=7.92 vs. limit=15.0 2024-09-22 23:09:12,842 INFO [train.py:1198] (3/4) Epoch 7, batch 3050, loss[loss=0.3162, ctc_loss=0.2288, cr_loss=0.4371, over 16902.00 frames. ], tot_loss[loss=0.2742, ctc_loss=0.195, cr_loss=0.396, over 3358129.60 frames. ], batch size: 58, lr: 1.67e-02, grad_scale: 16.0 2024-09-22 23:09:14,697 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=123321.33333333333, ans=0.0 2024-09-22 23:09:27,234 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=123368.0, ans=0.04949747468305833 2024-09-22 23:09:39,332 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=14.36 vs. limit=22.5 2024-09-22 23:09:47,004 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=123414.66666666667, ans=0.1 2024-09-22 23:10:12,921 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=9.71 vs. 
limit=22.5 2024-09-22 23:10:16,825 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=123508.0, ans=0.125 2024-09-22 23:10:33,469 INFO [train.py:1198] (3/4) Epoch 7, batch 3100, loss[loss=0.2839, ctc_loss=0.2007, cr_loss=0.416, over 17156.00 frames. ], tot_loss[loss=0.2748, ctc_loss=0.1955, cr_loss=0.3969, over 3351377.53 frames. ], batch size: 45, lr: 1.67e-02, grad_scale: 16.0 2024-09-22 23:11:01,555 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=123601.33333333333, ans=0.125 2024-09-22 23:11:09,510 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=123648.0, ans=0.2 2024-09-22 23:11:17,572 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.84 vs. limit=10.0 2024-09-22 23:11:32,191 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=123694.66666666667, ans=0.0 2024-09-22 23:11:36,880 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=123741.33333333333, ans=0.125 2024-09-22 23:11:42,861 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.191e+02 1.455e+02 1.593e+02 1.857e+02 3.038e+02, threshold=3.186e+02, percent-clipped=1.0 2024-09-22 23:11:53,644 INFO [train.py:1198] (3/4) Epoch 7, batch 3150, loss[loss=0.3087, ctc_loss=0.2195, cr_loss=0.4461, over 17099.00 frames. ], tot_loss[loss=0.2755, ctc_loss=0.196, cr_loss=0.3976, over 3353391.64 frames. ], batch size: 49, lr: 1.67e-02, grad_scale: 16.0 2024-09-22 23:12:03,219 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=123788.0, ans=0.125 2024-09-22 23:12:05,470 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.66 vs. limit=12.0 2024-09-22 23:12:35,981 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=123881.33333333333, ans=0.125 2024-09-22 23:12:54,401 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.56 vs. limit=6.0 2024-09-22 23:13:12,053 INFO [train.py:1198] (3/4) Epoch 7, batch 3200, loss[loss=0.2549, ctc_loss=0.1838, cr_loss=0.3558, over 17309.00 frames. ], tot_loss[loss=0.2754, ctc_loss=0.196, cr_loss=0.3969, over 3347796.09 frames. ], batch size: 51, lr: 1.66e-02, grad_scale: 32.0 2024-09-22 23:13:18,479 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=124021.33333333333, ans=0.1 2024-09-22 23:13:38,849 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.const_attention_rate, batch_count=124068.0, ans=0.025 2024-09-22 23:14:19,024 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.163e+02 1.425e+02 1.561e+02 1.765e+02 3.696e+02, threshold=3.122e+02, percent-clipped=1.0 2024-09-22 23:14:29,939 INFO [train.py:1198] (3/4) Epoch 7, batch 3250, loss[loss=0.2698, ctc_loss=0.1907, cr_loss=0.3956, over 17028.00 frames. ], tot_loss[loss=0.2755, ctc_loss=0.1961, cr_loss=0.397, over 3337857.68 frames. 
], batch size: 56, lr: 1.66e-02, grad_scale: 32.0 2024-09-22 23:14:54,049 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=124301.33333333333, ans=0.125 2024-09-22 23:15:45,424 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.const_attention_rate, batch_count=124441.33333333333, ans=0.025 2024-09-22 23:15:49,579 INFO [train.py:1198] (3/4) Epoch 7, batch 3300, loss[loss=0.2893, ctc_loss=0.2053, cr_loss=0.4199, over 17021.00 frames. ], tot_loss[loss=0.2761, ctc_loss=0.1966, cr_loss=0.3975, over 3329119.50 frames. ], batch size: 51, lr: 1.66e-02, grad_scale: 32.0 2024-09-22 23:16:28,421 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.89 vs. limit=15.0 2024-09-22 23:16:35,310 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=124581.33333333333, ans=0.125 2024-09-22 23:16:37,541 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=4.43 vs. limit=12.0 2024-09-22 23:16:49,272 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=124628.0, ans=0.0 2024-09-22 23:16:52,327 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.min_positive, batch_count=124674.66666666667, ans=0.025 2024-09-22 23:16:58,144 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.199e+02 1.418e+02 1.603e+02 1.871e+02 3.678e+02, threshold=3.206e+02, percent-clipped=3.0 2024-09-22 23:16:58,359 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=124674.66666666667, ans=0.125 2024-09-22 23:17:03,242 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=124674.66666666667, ans=0.1 2024-09-22 23:17:07,707 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=124721.33333333333, ans=0.125 2024-09-22 23:17:09,116 INFO [train.py:1198] (3/4) Epoch 7, batch 3350, loss[loss=0.2929, ctc_loss=0.2127, cr_loss=0.4013, over 16996.00 frames. ], tot_loss[loss=0.2752, ctc_loss=0.1957, cr_loss=0.3972, over 3341548.67 frames. ], batch size: 53, lr: 1.66e-02, grad_scale: 32.0 2024-09-22 23:17:34,284 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=124768.0, ans=0.0 2024-09-22 23:17:37,192 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=124768.0, ans=0.0 2024-09-22 23:17:46,577 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=124814.66666666667, ans=0.0 2024-09-22 23:18:11,537 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.const_attention_rate, batch_count=124908.0, ans=0.025 2024-09-22 23:18:26,928 INFO [train.py:1198] (3/4) Epoch 7, batch 3400, loss[loss=0.2644, ctc_loss=0.1877, cr_loss=0.384, over 16982.00 frames. ], tot_loss[loss=0.2752, ctc_loss=0.1957, cr_loss=0.3974, over 3342337.97 frames. 
], batch size: 42, lr: 1.66e-02, grad_scale: 32.0 2024-09-22 23:18:34,223 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=7.69 vs. limit=15.0 2024-09-22 23:18:43,257 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=125001.33333333333, ans=0.035 2024-09-22 23:18:46,254 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-22 23:18:52,688 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=125001.33333333333, ans=0.125 2024-09-22 23:18:55,617 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=125001.33333333333, ans=0.125 2024-09-22 23:19:10,022 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=125048.0, ans=0.0 2024-09-22 23:19:10,120 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=125048.0, ans=0.0 2024-09-22 23:19:13,591 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=2.58 vs. limit=15.0 2024-09-22 23:19:27,066 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=125094.66666666667, ans=0.0 2024-09-22 23:19:34,687 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.128e+02 1.374e+02 1.483e+02 1.750e+02 2.564e+02, threshold=2.966e+02, percent-clipped=0.0 2024-09-22 23:19:34,960 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=125141.33333333333, ans=0.0 2024-09-22 23:19:42,764 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=125141.33333333333, ans=0.0 2024-09-22 23:19:45,701 INFO [train.py:1198] (3/4) Epoch 7, batch 3450, loss[loss=0.2502, ctc_loss=0.1752, cr_loss=0.3751, over 17150.00 frames. ], tot_loss[loss=0.275, ctc_loss=0.1955, cr_loss=0.3972, over 3345388.79 frames. ], batch size: 45, lr: 1.66e-02, grad_scale: 32.0 2024-09-22 23:20:21,187 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=4.63 vs. limit=12.0 2024-09-22 23:20:39,633 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=125328.0, ans=0.125 2024-09-22 23:21:05,782 INFO [train.py:1198] (3/4) Epoch 7, batch 3500, loss[loss=0.2025, ctc_loss=0.1378, cr_loss=0.3238, over 17196.00 frames. ], tot_loss[loss=0.2749, ctc_loss=0.1954, cr_loss=0.3971, over 3348217.67 frames. ], batch size: 41, lr: 1.66e-02, grad_scale: 32.0 2024-09-22 23:21:18,500 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=125421.33333333333, ans=0.1 2024-09-22 23:21:41,171 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.84 vs. 
limit=15.0 2024-09-22 23:21:49,838 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=125514.66666666667, ans=0.0 2024-09-22 23:22:13,049 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=125608.0, ans=0.125 2024-09-22 23:22:14,394 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.203e+02 1.377e+02 1.536e+02 1.902e+02 3.624e+02, threshold=3.071e+02, percent-clipped=2.0 2024-09-22 23:22:25,094 INFO [train.py:1198] (3/4) Epoch 7, batch 3550, loss[loss=0.2638, ctc_loss=0.1845, cr_loss=0.3965, over 17306.00 frames. ], tot_loss[loss=0.2741, ctc_loss=0.1948, cr_loss=0.3965, over 3341324.66 frames. ], batch size: 46, lr: 1.65e-02, grad_scale: 32.0 2024-09-22 23:22:40,852 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=125701.33333333333, ans=0.0 2024-09-22 23:22:43,957 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=125701.33333333333, ans=0.2 2024-09-22 23:22:46,648 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=125701.33333333333, ans=0.015 2024-09-22 23:22:52,935 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=125701.33333333333, ans=0.0 2024-09-22 23:23:02,167 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=125748.0, ans=0.1 2024-09-22 23:23:02,211 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=125748.0, ans=0.0 2024-09-22 23:23:27,229 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=125841.33333333333, ans=0.05 2024-09-22 23:23:42,345 INFO [train.py:1198] (3/4) Epoch 7, batch 3600, loss[loss=0.2705, ctc_loss=0.1916, cr_loss=0.3944, over 17096.00 frames. ], tot_loss[loss=0.2742, ctc_loss=0.1948, cr_loss=0.3968, over 3355220.82 frames. ], batch size: 49, lr: 1.65e-02, grad_scale: 32.0 2024-09-22 23:23:42,690 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=125888.0, ans=0.0 2024-09-22 23:24:00,237 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=125934.66666666667, ans=0.125 2024-09-22 23:24:27,865 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=125981.33333333333, ans=0.0 2024-09-22 23:24:37,008 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=126028.0, ans=0.125 2024-09-22 23:24:45,348 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.05 vs. 
limit=6.0 2024-09-22 23:24:47,816 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=126074.66666666667, ans=0.2 2024-09-22 23:24:50,666 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.222e+02 1.445e+02 1.618e+02 1.998e+02 3.129e+02, threshold=3.236e+02, percent-clipped=1.0 2024-09-22 23:25:00,288 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=126121.33333333333, ans=0.1 2024-09-22 23:25:01,500 INFO [train.py:1198] (3/4) Epoch 7, batch 3650, loss[loss=0.3027, ctc_loss=0.2125, cr_loss=0.4511, over 17024.00 frames. ], tot_loss[loss=0.2748, ctc_loss=0.1952, cr_loss=0.3982, over 3358715.90 frames. ], batch size: 52, lr: 1.65e-02, grad_scale: 32.0 2024-09-22 23:25:21,907 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=5.59 vs. limit=15.0 2024-09-22 23:25:30,417 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=126168.0, ans=0.0 2024-09-22 23:25:38,205 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=126214.66666666667, ans=0.125 2024-09-22 23:25:41,703 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.15 vs. limit=15.0 2024-09-22 23:25:42,141 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=5.78 vs. limit=15.0 2024-09-22 23:25:43,893 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.48 vs. limit=6.0 2024-09-22 23:25:51,511 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=126261.33333333333, ans=0.125 2024-09-22 23:26:21,882 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.90 vs. limit=6.0 2024-09-22 23:26:22,288 INFO [train.py:1198] (3/4) Epoch 7, batch 3700, loss[loss=0.3214, ctc_loss=0.2322, cr_loss=0.4456, over 16905.00 frames. ], tot_loss[loss=0.2745, ctc_loss=0.1951, cr_loss=0.3972, over 3357566.07 frames. ], batch size: 58, lr: 1.65e-02, grad_scale: 32.0 2024-09-22 23:26:49,517 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=126401.33333333333, ans=0.2 2024-09-22 23:27:03,473 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=126448.0, ans=0.125 2024-09-22 23:27:22,618 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=126494.66666666667, ans=10.0 2024-09-22 23:27:30,063 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.091e+02 1.388e+02 1.512e+02 1.895e+02 2.751e+02, threshold=3.024e+02, percent-clipped=0.0 2024-09-22 23:27:41,023 INFO [train.py:1198] (3/4) Epoch 7, batch 3750, loss[loss=0.2904, ctc_loss=0.2052, cr_loss=0.4262, over 16861.00 frames. ], tot_loss[loss=0.2748, ctc_loss=0.1953, cr_loss=0.3977, over 3359317.66 frames. 
], batch size: 58, lr: 1.65e-02, grad_scale: 32.0 2024-09-22 23:28:09,173 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=126634.66666666667, ans=0.0 2024-09-22 23:28:45,241 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=126774.66666666667, ans=0.0 2024-09-22 23:28:59,019 INFO [train.py:1198] (3/4) Epoch 7, batch 3800, loss[loss=0.3148, ctc_loss=0.2302, cr_loss=0.4232, over 15259.00 frames. ], tot_loss[loss=0.2755, ctc_loss=0.1959, cr_loss=0.3978, over 3354408.48 frames. ], batch size: 89, lr: 1.65e-02, grad_scale: 32.0 2024-09-22 23:29:36,598 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=8.49 vs. limit=15.0 2024-09-22 23:30:02,529 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=127008.0, ans=0.2 2024-09-22 23:30:07,021 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.225e+02 1.460e+02 1.621e+02 1.904e+02 2.959e+02, threshold=3.241e+02, percent-clipped=0.0 2024-09-22 23:30:11,160 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=6.08 vs. limit=12.0 2024-09-22 23:30:17,776 INFO [train.py:1198] (3/4) Epoch 7, batch 3850, loss[loss=0.3078, ctc_loss=0.2208, cr_loss=0.4354, over 14931.00 frames. ], tot_loss[loss=0.2778, ctc_loss=0.1982, cr_loss=0.3981, over 3303038.61 frames. ], batch size: 89, lr: 1.65e-02, grad_scale: 32.0 2024-09-22 23:30:19,679 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.const_attention_rate, batch_count=127054.66666666667, ans=0.025 2024-09-22 23:30:22,530 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=127054.66666666667, ans=0.125 2024-09-22 23:30:27,281 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=127054.66666666667, ans=0.125 2024-09-22 23:30:40,321 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=127101.33333333333, ans=0.125 2024-09-22 23:30:47,960 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=127148.0, ans=0.125 2024-09-22 23:31:04,602 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=127194.66666666667, ans=0.1 2024-09-22 23:31:24,202 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=127241.33333333333, ans=0.125 2024-09-22 23:32:20,223 INFO [train.py:1198] (3/4) Epoch 8, batch 0, loss[loss=0.2317, ctc_loss=0.1612, cr_loss=0.3524, over 17152.00 frames. ], tot_loss[loss=0.2317, ctc_loss=0.1612, cr_loss=0.3524, over 17152.00 frames. ], batch size: 45, lr: 1.55e-02, grad_scale: 32.0 2024-09-22 23:32:20,223 INFO [train.py:1221] (3/4) Computing validation loss 2024-09-22 23:32:35,550 INFO [train.py:1230] (3/4) Epoch 8, validation: loss=0.05692, ctc_loss=0.05692, cr_loss=7.316e-15, over 944034.00 frames. 
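The recurring optim.py WARNING records in this log can be read as follows: the five numbers after "grad-norm quartiles" are the min / 25% / median / 75% / max of recently observed gradient norms, and throughout this log the reported threshold equals Clipping_scale (2.0) times the median, up to rounding (e.g. 2.0 * 1.485e+02 = 2.971e+02 in the first WARNING of this excerpt), while percent-clipped reports how often recent batches exceeded the threshold in force at the time. A minimal sketch of that bookkeeping, reconstructed from the printed format alone (the function name and the single fixed window are illustrative, not the actual optim.py API, which keeps its own running statistics and also rescales the update when clipping):

    import numpy as np

    def summarize_grad_norms(grad_norms, clipping_scale=2.0):
        """Reproduce the fields of an optim.py-style clipping WARNING.

        grad_norms: 1-D array of per-batch gradient norms from a recent
        window. Returns (quartiles, threshold, percent_clipped).
        """
        quartiles = np.percentile(grad_norms, [0, 25, 50, 75, 100])
        threshold = clipping_scale * quartiles[2]  # 2.0 * median
        # Fraction of batches whose norm exceeded the threshold. In the
        # real optimizer this is tracked against the threshold in force
        # at each step, so the logged value can be nonzero even when the
        # current window's max sits below the current threshold.
        percent_clipped = 100.0 * float(np.mean(grad_norms > threshold))
        return quartiles, threshold, percent_clipped

The other recurring record types are simpler: each scaling.py ScheduledFloat record just reports the current value (ans=...) of a regularization parameter (dropout_p, skip rates, balancer probs, scale_min) scheduled against batch_count, and each Whitening record prints a measured metric against the limit it is compared to.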
2024-09-22 23:32:35,551 INFO [train.py:1231] (3/4) Maximum memory allocated so far is 21238MB 2024-09-22 23:32:58,191 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=127316.0, ans=0.0 2024-09-22 23:33:30,070 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=127409.33333333333, ans=0.0 2024-09-22 23:33:41,518 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=127456.0, ans=0.2 2024-09-22 23:33:52,393 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.139e+02 1.394e+02 1.752e+02 2.098e+02 6.301e+02, threshold=3.504e+02, percent-clipped=3.0 2024-09-22 23:33:55,629 INFO [train.py:1198] (3/4) Epoch 8, batch 50, loss[loss=0.2745, ctc_loss=0.1911, cr_loss=0.4168, over 17097.00 frames. ], tot_loss[loss=0.2717, ctc_loss=0.1926, cr_loss=0.3955, over 759971.58 frames. ], batch size: 49, lr: 1.55e-02, grad_scale: 32.0 2024-09-22 23:34:01,094 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=10.45 vs. limit=15.0 2024-09-22 23:34:29,454 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=127596.0, ans=0.125 2024-09-22 23:34:34,175 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.const_attention_rate, batch_count=127596.0, ans=0.025 2024-09-22 23:34:51,875 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.64 vs. limit=15.0 2024-09-22 23:35:02,610 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=127689.33333333333, ans=0.2 2024-09-22 23:35:19,393 INFO [train.py:1198] (3/4) Epoch 8, batch 100, loss[loss=0.2322, ctc_loss=0.1615, cr_loss=0.3533, over 17057.00 frames. ], tot_loss[loss=0.2698, ctc_loss=0.1911, cr_loss=0.3935, over 1338117.88 frames. ], batch size: 39, lr: 1.55e-02, grad_scale: 32.0 2024-09-22 23:35:29,875 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.14 vs. limit=15.0 2024-09-22 23:35:40,285 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=127782.66666666667, ans=0.07 2024-09-22 23:35:46,637 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=127782.66666666667, ans=0.1 2024-09-22 23:35:49,878 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=127829.33333333333, ans=0.125 2024-09-22 23:35:50,431 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=4.68 vs. limit=12.0 2024-09-22 23:35:50,486 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.37 vs. limit=22.5 2024-09-22 23:36:02,636 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=4.19 vs. 
limit=10.0 2024-09-22 23:36:20,452 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=127876.0, ans=0.0 2024-09-22 23:36:36,324 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-22 23:36:37,480 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.179e+02 1.356e+02 1.486e+02 1.737e+02 3.200e+02, threshold=2.973e+02, percent-clipped=0.0 2024-09-22 23:36:40,699 INFO [train.py:1198] (3/4) Epoch 8, batch 150, loss[loss=0.2769, ctc_loss=0.1929, cr_loss=0.4201, over 17297.00 frames. ], tot_loss[loss=0.2709, ctc_loss=0.1918, cr_loss=0.3959, over 1780316.92 frames. ], batch size: 46, lr: 1.55e-02, grad_scale: 32.0 2024-09-22 23:37:09,716 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-22 23:37:33,558 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=128109.33333333333, ans=0.0 2024-09-22 23:37:49,792 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=128156.0, ans=0.0 2024-09-22 23:37:49,819 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=128156.0, ans=0.0 2024-09-22 23:37:58,260 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=5.79 vs. limit=12.0 2024-09-22 23:38:02,291 INFO [train.py:1198] (3/4) Epoch 8, batch 200, loss[loss=0.3158, ctc_loss=0.2223, cr_loss=0.4677, over 17024.00 frames. ], tot_loss[loss=0.2736, ctc_loss=0.1939, cr_loss=0.3985, over 2125225.39 frames. ], batch size: 52, lr: 1.54e-02, grad_scale: 32.0 2024-09-22 23:38:28,104 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=128249.33333333333, ans=0.2 2024-09-22 23:38:40,683 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=128296.0, ans=0.125 2024-09-22 23:38:55,502 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=4.71 vs. limit=10.0 2024-09-22 23:39:20,520 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.202e+02 1.362e+02 1.519e+02 1.695e+02 2.654e+02, threshold=3.037e+02, percent-clipped=0.0 2024-09-22 23:39:23,726 INFO [train.py:1198] (3/4) Epoch 8, batch 250, loss[loss=0.3619, ctc_loss=0.2707, cr_loss=0.4558, over 11674.00 frames. ], tot_loss[loss=0.2739, ctc_loss=0.1944, cr_loss=0.3975, over 2386422.24 frames. 
], batch size: 123, lr: 1.54e-02, grad_scale: 32.0 2024-09-22 23:39:25,783 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer_na.min_abs, batch_count=128436.0, ans=0.02 2024-09-22 23:39:46,508 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=128482.66666666667, ans=0.125 2024-09-22 23:39:48,194 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-09-22 23:39:52,815 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=128482.66666666667, ans=0.125 2024-09-22 23:40:02,505 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=128529.33333333333, ans=0.05 2024-09-22 23:40:10,228 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=128529.33333333333, ans=0.0 2024-09-22 23:40:46,251 INFO [train.py:1198] (3/4) Epoch 8, batch 300, loss[loss=0.2647, ctc_loss=0.1857, cr_loss=0.395, over 17233.00 frames. ], tot_loss[loss=0.2739, ctc_loss=0.1945, cr_loss=0.3974, over 2604034.19 frames. ], batch size: 50, lr: 1.54e-02, grad_scale: 32.0 2024-09-22 23:40:59,345 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=128669.33333333333, ans=0.5 2024-09-22 23:41:08,313 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=128716.0, ans=0.0 2024-09-22 23:41:10,006 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=128716.0, ans=0.0 2024-09-22 23:41:39,039 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=128809.33333333333, ans=0.125 2024-09-22 23:41:59,983 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=128856.0, ans=0.0 2024-09-22 23:42:07,483 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.170e+02 1.437e+02 1.612e+02 1.809e+02 3.626e+02, threshold=3.224e+02, percent-clipped=3.0 2024-09-22 23:42:10,767 INFO [train.py:1198] (3/4) Epoch 8, batch 350, loss[loss=0.2374, ctc_loss=0.1665, cr_loss=0.3548, over 17186.00 frames. ], tot_loss[loss=0.275, ctc_loss=0.1954, cr_loss=0.3982, over 2765320.32 frames. ], batch size: 41, lr: 1.54e-02, grad_scale: 32.0 2024-09-22 23:42:19,042 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=128902.66666666667, ans=0.125 2024-09-22 23:42:27,058 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=128949.33333333333, ans=0.0 2024-09-22 23:42:40,305 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.39 vs. 
limit=22.5 2024-09-22 23:42:43,016 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-22 23:43:03,684 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=129042.66666666667, ans=0.0 2024-09-22 23:43:14,180 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=8.23 vs. limit=15.0 2024-09-22 23:43:30,396 INFO [train.py:1198] (3/4) Epoch 8, batch 400, loss[loss=0.2447, ctc_loss=0.1724, cr_loss=0.3615, over 17194.00 frames. ], tot_loss[loss=0.2737, ctc_loss=0.1943, cr_loss=0.3969, over 2901878.52 frames. ], batch size: 41, lr: 1.54e-02, grad_scale: 32.0 2024-09-22 23:43:32,740 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.35 vs. limit=15.0 2024-09-22 23:43:36,991 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=129136.0, ans=0.125 2024-09-22 23:43:43,370 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=129136.0, ans=0.125 2024-09-22 23:43:46,747 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=129182.66666666667, ans=0.1 2024-09-22 23:43:53,002 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=129182.66666666667, ans=0.1 2024-09-22 23:44:03,992 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.const_attention_rate, batch_count=129229.33333333333, ans=0.025 2024-09-22 23:44:21,108 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=129276.0, ans=0.125 2024-09-22 23:44:22,790 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=129276.0, ans=0.0 2024-09-22 23:44:36,138 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=6.06 vs. limit=15.0 2024-09-22 23:44:49,603 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.185e+02 1.364e+02 1.519e+02 1.603e+02 2.870e+02, threshold=3.038e+02, percent-clipped=0.0 2024-09-22 23:44:52,859 INFO [train.py:1198] (3/4) Epoch 8, batch 450, loss[loss=0.2555, ctc_loss=0.1772, cr_loss=0.3918, over 17310.00 frames. ], tot_loss[loss=0.2729, ctc_loss=0.1937, cr_loss=0.3963, over 3004340.79 frames. ], batch size: 46, lr: 1.54e-02, grad_scale: 32.0 2024-09-22 23:44:59,892 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=12.63 vs. limit=22.5 2024-09-22 23:46:06,029 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.03 vs. limit=10.0 2024-09-22 23:46:18,034 INFO [train.py:1198] (3/4) Epoch 8, batch 500, loss[loss=0.2826, ctc_loss=0.1991, cr_loss=0.4177, over 16733.00 frames. ], tot_loss[loss=0.2708, ctc_loss=0.192, cr_loss=0.3939, over 3088114.06 frames. 
], batch size: 61, lr: 1.54e-02, grad_scale: 32.0 2024-09-22 23:47:36,030 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.188e+02 1.377e+02 1.524e+02 1.726e+02 2.894e+02, threshold=3.047e+02, percent-clipped=0.0 2024-09-22 23:47:39,314 INFO [train.py:1198] (3/4) Epoch 8, batch 550, loss[loss=0.2697, ctc_loss=0.194, cr_loss=0.3788, over 17068.00 frames. ], tot_loss[loss=0.27, ctc_loss=0.1914, cr_loss=0.393, over 3148335.78 frames. ], batch size: 46, lr: 1.54e-02, grad_scale: 32.0 2024-09-22 23:47:42,813 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=129836.0, ans=0.1 2024-09-22 23:47:50,765 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=129836.0, ans=0.0 2024-09-22 23:48:00,426 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=129882.66666666667, ans=0.125 2024-09-22 23:48:08,369 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=129882.66666666667, ans=0.1 2024-09-22 23:48:36,779 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=129976.0, ans=0.125 2024-09-22 23:48:47,856 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=130022.66666666667, ans=0.125 2024-09-22 23:48:58,820 INFO [train.py:1198] (3/4) Epoch 8, batch 600, loss[loss=0.2972, ctc_loss=0.2105, cr_loss=0.4332, over 17061.00 frames. ], tot_loss[loss=0.2697, ctc_loss=0.1913, cr_loss=0.3924, over 3191444.30 frames. ], batch size: 46, lr: 1.53e-02, grad_scale: 32.0 2024-09-22 23:50:04,682 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=130209.33333333333, ans=0.04949747468305833 2024-09-22 23:50:20,279 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.107e+02 1.396e+02 1.531e+02 1.850e+02 5.586e+02, threshold=3.061e+02, percent-clipped=2.0 2024-09-22 23:50:23,506 INFO [train.py:1198] (3/4) Epoch 8, batch 650, loss[loss=0.2467, ctc_loss=0.1671, cr_loss=0.398, over 17301.00 frames. ], tot_loss[loss=0.2687, ctc_loss=0.1902, cr_loss=0.3923, over 3230850.17 frames. ], batch size: 46, lr: 1.53e-02, grad_scale: 32.0 2024-09-22 23:50:27,068 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.const_attention_rate, batch_count=130302.66666666667, ans=0.025 2024-09-22 23:50:43,802 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.79 vs. limit=10.0 2024-09-22 23:51:09,769 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=130396.0, ans=0.125 2024-09-22 23:51:19,573 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=10.80 vs. limit=15.0 2024-09-22 23:51:30,888 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=13.00 vs. 
limit=15.0 2024-09-22 23:51:38,165 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=130489.33333333333, ans=0.2 2024-09-22 23:51:41,319 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=130489.33333333333, ans=0.125 2024-09-22 23:51:44,622 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=6.48 vs. limit=15.0 2024-09-22 23:51:48,929 INFO [train.py:1198] (3/4) Epoch 8, batch 700, loss[loss=0.2337, ctc_loss=0.1616, cr_loss=0.3605, over 17169.00 frames. ], tot_loss[loss=0.268, ctc_loss=0.1897, cr_loss=0.3914, over 3262101.59 frames. ], batch size: 41, lr: 1.53e-02, grad_scale: 32.0 2024-09-22 23:52:37,445 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=130676.0, ans=0.1 2024-09-22 23:52:37,462 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=130676.0, ans=0.125 2024-09-22 23:52:51,766 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=130676.0, ans=0.125 2024-09-22 23:52:58,051 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.min_positive, batch_count=130722.66666666667, ans=0.025 2024-09-22 23:53:07,234 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.151e+02 1.368e+02 1.561e+02 1.822e+02 3.141e+02, threshold=3.123e+02, percent-clipped=1.0 2024-09-22 23:53:10,437 INFO [train.py:1198] (3/4) Epoch 8, batch 750, loss[loss=0.3516, ctc_loss=0.2678, cr_loss=0.4188, over 11803.00 frames. ], tot_loss[loss=0.2683, ctc_loss=0.19, cr_loss=0.3916, over 3282081.45 frames. ], batch size: 125, lr: 1.53e-02, grad_scale: 32.0 2024-09-22 23:53:12,329 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=130769.33333333333, ans=0.125 2024-09-22 23:53:17,622 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=9.13 vs. limit=15.0 2024-09-22 23:53:26,381 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=130816.0, ans=0.125 2024-09-22 23:53:31,972 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=6.42 vs. limit=15.0 2024-09-22 23:53:47,359 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=130862.66666666667, ans=10.0 2024-09-22 23:53:50,472 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=130862.66666666667, ans=0.125 2024-09-22 23:53:51,328 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=6.36 vs. 
limit=15.0 2024-09-22 23:53:53,835 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=130862.66666666667, ans=0.0 2024-09-22 23:54:03,450 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=130909.33333333333, ans=0.2 2024-09-22 23:54:16,146 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=3.81 vs. limit=10.0 2024-09-22 23:54:33,329 INFO [train.py:1198] (3/4) Epoch 8, batch 800, loss[loss=0.3036, ctc_loss=0.2221, cr_loss=0.4076, over 16997.00 frames. ], tot_loss[loss=0.2677, ctc_loss=0.1895, cr_loss=0.3908, over 3293982.04 frames. ], batch size: 53, lr: 1.53e-02, grad_scale: 32.0 2024-09-22 23:54:43,459 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=5.43 vs. limit=10.0 2024-09-22 23:54:44,532 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=131002.66666666667, ans=0.1 2024-09-22 23:55:09,338 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.const_attention_rate, batch_count=131096.0, ans=0.025 2024-09-22 23:55:46,395 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.69 vs. limit=15.0 2024-09-22 23:55:51,738 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=131189.33333333334, ans=0.125 2024-09-22 23:55:53,459 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=131189.33333333334, ans=0.0 2024-09-22 23:55:54,936 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.155e+02 1.424e+02 1.641e+02 1.911e+02 2.980e+02, threshold=3.282e+02, percent-clipped=0.0 2024-09-22 23:55:58,177 INFO [train.py:1198] (3/4) Epoch 8, batch 850, loss[loss=0.2737, ctc_loss=0.1929, cr_loss=0.4041, over 17032.00 frames. ], tot_loss[loss=0.2674, ctc_loss=0.1893, cr_loss=0.3905, over 3304918.26 frames. ], batch size: 51, lr: 1.53e-02, grad_scale: 32.0 2024-09-22 23:56:16,221 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=131282.66666666666, ans=0.125 2024-09-22 23:56:27,828 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.47 vs. limit=15.0 2024-09-22 23:56:30,500 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=131282.66666666666, ans=0.2 2024-09-22 23:56:41,841 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=131329.33333333334, ans=0.0 2024-09-22 23:57:21,043 INFO [train.py:1198] (3/4) Epoch 8, batch 900, loss[loss=0.2632, ctc_loss=0.1854, cr_loss=0.3891, over 17137.00 frames. ], tot_loss[loss=0.2683, ctc_loss=0.1899, cr_loss=0.3924, over 3318786.34 frames. 
], batch size: 48, lr: 1.53e-02, grad_scale: 32.0 2024-09-22 23:57:35,708 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=131516.0, ans=0.0 2024-09-22 23:57:48,510 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=131516.0, ans=0.0 2024-09-22 23:57:51,505 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=131562.66666666666, ans=0.0 2024-09-22 23:58:04,448 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=131562.66666666666, ans=0.1 2024-09-22 23:58:14,201 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=131609.33333333334, ans=0.0 2024-09-22 23:58:24,062 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=131656.0, ans=0.0 2024-09-22 23:58:35,029 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=131656.0, ans=0.07 2024-09-22 23:58:37,889 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.169e+02 1.417e+02 1.553e+02 1.784e+02 2.618e+02, threshold=3.106e+02, percent-clipped=0.0 2024-09-22 23:58:41,090 INFO [train.py:1198] (3/4) Epoch 8, batch 950, loss[loss=0.2627, ctc_loss=0.1875, cr_loss=0.3757, over 17303.00 frames. ], tot_loss[loss=0.2687, ctc_loss=0.1902, cr_loss=0.3922, over 3317809.54 frames. ], batch size: 46, lr: 1.53e-02, grad_scale: 32.0 2024-09-22 23:59:09,960 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.92 vs. limit=15.0 2024-09-22 23:59:11,112 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=131749.33333333334, ans=0.1 2024-09-22 23:59:24,099 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=4.76 vs. limit=15.0 2024-09-22 23:59:29,806 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-22 23:59:39,630 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.45 vs. limit=15.0 2024-09-22 23:59:59,304 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-09-23 00:00:05,374 INFO [train.py:1198] (3/4) Epoch 8, batch 1000, loss[loss=0.2986, ctc_loss=0.212, cr_loss=0.4333, over 15898.00 frames. ], tot_loss[loss=0.2673, ctc_loss=0.189, cr_loss=0.3916, over 3337077.17 frames. 
], batch size: 74, lr: 1.52e-02, grad_scale: 32.0 2024-09-23 00:00:05,632 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=131936.0, ans=0.1 2024-09-23 00:00:31,150 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=131982.66666666666, ans=0.2 2024-09-23 00:01:13,348 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=132122.66666666666, ans=0.125 2024-09-23 00:01:22,309 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=132122.66666666666, ans=0.125 2024-09-23 00:01:26,819 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.157e+02 1.410e+02 1.544e+02 1.747e+02 2.558e+02, threshold=3.087e+02, percent-clipped=0.0 2024-09-23 00:01:29,993 INFO [train.py:1198] (3/4) Epoch 8, batch 1050, loss[loss=0.225, ctc_loss=0.1556, cr_loss=0.3471, over 17246.00 frames. ], tot_loss[loss=0.2674, ctc_loss=0.1891, cr_loss=0.3916, over 3341562.28 frames. ], batch size: 42, lr: 1.52e-02, grad_scale: 32.0 2024-09-23 00:01:36,724 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=132169.33333333334, ans=0.0 2024-09-23 00:01:50,853 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=132216.0, ans=0.0 2024-09-23 00:01:52,651 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=132216.0, ans=0.07 2024-09-23 00:02:03,810 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=132262.66666666666, ans=0.0 2024-09-23 00:02:13,531 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.18 vs. limit=22.5 2024-09-23 00:02:49,160 INFO [train.py:1198] (3/4) Epoch 8, batch 1100, loss[loss=0.2744, ctc_loss=0.1987, cr_loss=0.3784, over 17031.00 frames. ], tot_loss[loss=0.269, ctc_loss=0.1903, cr_loss=0.3932, over 3342640.98 frames. ], batch size: 56, lr: 1.52e-02, grad_scale: 32.0 2024-09-23 00:03:05,532 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=132449.33333333334, ans=0.2 2024-09-23 00:03:08,704 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=132449.33333333334, ans=0.1 2024-09-23 00:03:46,075 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=10.83 vs. 
2024-09-23 00:03:46,075 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=10.83 vs. limit=15.0
2024-09-23 00:03:48,368 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=132542.66666666666, ans=0.0
2024-09-23 00:04:05,481 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=132589.33333333334, ans=0.125
2024-09-23 00:04:05,530 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=132589.33333333334, ans=0.0
2024-09-23 00:04:08,242 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.144e+02 1.381e+02 1.495e+02 1.697e+02 2.213e+02, threshold=2.991e+02, percent-clipped=0.0
2024-09-23 00:04:11,437 INFO [train.py:1198] (3/4) Epoch 8, batch 1150, loss[loss=0.2503, ctc_loss=0.1789, cr_loss=0.3569, over 17020.00 frames. ], tot_loss[loss=0.2691, ctc_loss=0.1905, cr_loss=0.3933, over 3355282.42 frames. ], batch size: 51, lr: 1.52e-02, grad_scale: 32.0
2024-09-23 00:04:21,310 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=132636.0, ans=0.2
2024-09-23 00:04:29,368 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=132682.66666666666, ans=0.04949747468305833
2024-09-23 00:04:29,659 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=7.45 vs. limit=15.0
2024-09-23 00:04:34,164 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=132682.66666666666, ans=0.05
2024-09-23 00:05:30,761 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=132822.66666666666, ans=0.0
2024-09-23 00:05:33,700 INFO [train.py:1198] (3/4) Epoch 8, batch 1200, loss[loss=0.2697, ctc_loss=0.1938, cr_loss=0.3795, over 17304.00 frames. ], tot_loss[loss=0.2693, ctc_loss=0.1906, cr_loss=0.3934, over 3344359.13 frames. ], batch size: 51, lr: 1.52e-02, grad_scale: 32.0
2024-09-23 00:05:34,075 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=132869.33333333334, ans=0.2
2024-09-23 00:05:42,906 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=132869.33333333334, ans=0.0
2024-09-23 00:06:03,641 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=132916.0, ans=0.125
2024-09-23 00:06:41,792 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=133056.0, ans=0.0
2024-09-23 00:06:57,397 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.195e+02 1.352e+02 1.465e+02 1.619e+02 2.564e+02, threshold=2.930e+02, percent-clipped=0.0
2024-09-23 00:06:58,993 INFO [train.py:1198] (3/4) Epoch 8, batch 1250, loss[loss=0.2656, ctc_loss=0.1844, cr_loss=0.4061, over 17211.00 frames. ], tot_loss[loss=0.2683, ctc_loss=0.1898, cr_loss=0.3925, over 3348122.28 frames. ], batch size: 50, lr: 1.52e-02, grad_scale: 32.0
2024-09-23 00:07:07,316 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=133102.66666666666, ans=0.1
2024-09-23 00:07:23,102 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=133149.33333333334, ans=0.04949747468305833
2024-09-23 00:07:53,423 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-09-23 00:08:07,551 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=133289.33333333334, ans=0.125
2024-09-23 00:08:18,297 INFO [train.py:1198] (3/4) Epoch 8, batch 1300, loss[loss=0.2844, ctc_loss=0.2025, cr_loss=0.4095, over 17187.00 frames. ], tot_loss[loss=0.2685, ctc_loss=0.1899, cr_loss=0.3932, over 3350119.66 frames. ], batch size: 55, lr: 1.52e-02, grad_scale: 32.0
2024-09-23 00:09:38,133 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.229e+02 1.339e+02 1.481e+02 1.642e+02 2.207e+02, threshold=2.961e+02, percent-clipped=0.0
2024-09-23 00:09:39,799 INFO [train.py:1198] (3/4) Epoch 8, batch 1350, loss[loss=0.2527, ctc_loss=0.1756, cr_loss=0.3859, over 17218.00 frames. ], tot_loss[loss=0.2679, ctc_loss=0.1895, cr_loss=0.3918, over 3331326.50 frames. ], batch size: 50, lr: 1.52e-02, grad_scale: 32.0
2024-09-23 00:09:41,611 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.const_attention_rate, batch_count=133569.33333333334, ans=0.025
2024-09-23 00:09:55,281 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=133569.33333333334, ans=0.125
2024-09-23 00:10:03,136 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=133616.0, ans=0.125
2024-09-23 00:10:19,288 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=133662.66666666666, ans=0.0
2024-09-23 00:10:45,395 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=133709.33333333334, ans=0.125
2024-09-23 00:11:06,823 INFO [train.py:1198] (3/4) Epoch 8, batch 1400, loss[loss=0.2579, ctc_loss=0.1824, cr_loss=0.3774, over 17280.00 frames. ], tot_loss[loss=0.2681, ctc_loss=0.1897, cr_loss=0.3923, over 3326493.65 frames. ], batch size: 42, lr: 1.51e-02, grad_scale: 32.0
2024-09-23 00:11:19,844 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=133802.66666666666, ans=0.0
2024-09-23 00:11:40,476 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=133896.0, ans=0.1
2024-09-23 00:11:46,725 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=133896.0, ans=0.1
2024-09-23 00:11:51,502 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=133896.0, ans=0.05
2024-09-23 00:12:05,812 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=133942.66666666666, ans=0.1
2024-09-23 00:12:07,136 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=133942.66666666666, ans=0.125
2024-09-23 00:12:23,179 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=133989.33333333334, ans=0.2
2024-09-23 00:12:24,402 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.123e+02 1.304e+02 1.414e+02 1.596e+02 2.535e+02, threshold=2.828e+02, percent-clipped=0.0
2024-09-23 00:12:25,993 INFO [train.py:1198] (3/4) Epoch 8, batch 1450, loss[loss=0.2466, ctc_loss=0.1709, cr_loss=0.3789, over 17314.00 frames. ], tot_loss[loss=0.2686, ctc_loss=0.19, cr_loss=0.393, over 3335929.15 frames. ], batch size: 46, lr: 1.51e-02, grad_scale: 32.0
2024-09-23 00:13:48,012 INFO [train.py:1198] (3/4) Epoch 8, batch 1500, loss[loss=0.2517, ctc_loss=0.1758, cr_loss=0.3794, over 17272.00 frames. ], tot_loss[loss=0.2685, ctc_loss=0.1898, cr_loss=0.3935, over 3341214.30 frames. ], batch size: 42, lr: 1.51e-02, grad_scale: 32.0
2024-09-23 00:13:50,028 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=134269.33333333334, ans=0.125
2024-09-23 00:14:17,216 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.min_abs, batch_count=134316.0, ans=0.5
2024-09-23 00:14:58,963 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=5.16 vs. limit=15.0
2024-09-23 00:15:09,201 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.117e+02 1.316e+02 1.415e+02 1.558e+02 2.036e+02, threshold=2.830e+02, percent-clipped=0.0
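Each optim.py WARNING above reports running quartiles (min, 25%, 50%, 75%, max) of recent gradient norms, a clipping threshold, and the fraction of updates actually clipped; in these entries the threshold equals Clipping_scale times the logged median (e.g. 2.0 x 1.553e+02 = 3.106e+02). Below is a minimal sketch of that kind of statistics-based clipping; it illustrates the logged quantities and is not the actual logic in icefall's optim.py.

# Sketch: norm clipping with threshold = clipping_scale * running median
# of recent grad norms. Hypothetical helper, not the real optimizer code.
from collections import deque
import torch

class QuartileClipper:
    def __init__(self, clipping_scale: float = 2.0, history: int = 200):
        self.clipping_scale = clipping_scale
        self.norms = deque(maxlen=history)

    def clip_(self, params) -> float:
        params = [p for p in params if p.grad is not None]
        norm = torch.norm(torch.stack([p.grad.norm() for p in params])).item()
        self.norms.append(norm)
        qs = torch.quantile(
            torch.tensor(list(self.norms)),
            torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0]),
        )  # the five "grad-norm quartiles" printed in the log
        threshold = self.clipping_scale * qs[2].item()
        if norm > threshold:  # such updates are counted as "percent-clipped"
            for p in params:
                p.grad.mul_(threshold / norm)
        return norm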
2024-09-23 00:15:10,822 INFO [train.py:1198] (3/4) Epoch 8, batch 1550, loss[loss=0.2566, ctc_loss=0.1829, cr_loss=0.3684, over 17086.00 frames. ], tot_loss[loss=0.2678, ctc_loss=0.1893, cr_loss=0.3927, over 3346571.45 frames. ], batch size: 43, lr: 1.51e-02, grad_scale: 32.0
2024-09-23 00:15:11,079 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=134502.66666666666, ans=0.0
2024-09-23 00:15:14,382 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=134502.66666666666, ans=0.125
2024-09-23 00:16:24,966 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=134689.33333333334, ans=0.04949747468305833
2024-09-23 00:16:35,828 INFO [train.py:1198] (3/4) Epoch 8, batch 1600, loss[loss=0.2697, ctc_loss=0.1859, cr_loss=0.4188, over 17057.00 frames. ], tot_loss[loss=0.2675, ctc_loss=0.1889, cr_loss=0.3927, over 3349134.77 frames. ], batch size: 46, lr: 1.51e-02, grad_scale: 32.0
2024-09-23 00:16:45,611 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=134736.0, ans=0.0
2024-09-23 00:16:50,378 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=134782.66666666666, ans=0.1
2024-09-23 00:16:56,527 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.const_attention_rate, batch_count=134782.66666666666, ans=0.025
2024-09-23 00:17:04,586 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=134782.66666666666, ans=0.0
2024-09-23 00:17:06,350 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=134829.33333333334, ans=0.125
2024-09-23 00:17:07,805 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=134829.33333333334, ans=0.04949747468305833
2024-09-23 00:17:38,589 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=134922.66666666666, ans=0.0
2024-09-23 00:17:46,657 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=134922.66666666666, ans=0.0
2024-09-23 00:17:54,116 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.144e+02 1.357e+02 1.515e+02 1.820e+02 2.997e+02, threshold=3.030e+02, percent-clipped=2.0
2024-09-23 00:17:55,799 INFO [train.py:1198] (3/4) Epoch 8, batch 1650, loss[loss=0.2362, ctc_loss=0.1656, cr_loss=0.3528, over 17017.00 frames. ], tot_loss[loss=0.2671, ctc_loss=0.1887, cr_loss=0.3923, over 3350903.75 frames. ], batch size: 44, lr: 1.51e-02, grad_scale: 32.0
2024-09-23 00:17:59,349 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=134969.33333333334, ans=0.125
2024-09-23 00:18:08,533 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=134969.33333333334, ans=0.125
2024-09-23 00:18:13,481 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=135016.0, ans=0.0
2024-09-23 00:18:14,966 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=135016.0, ans=0.125
2024-09-23 00:18:33,061 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=3.75 vs. limit=12.0
2024-09-23 00:18:52,292 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=135109.33333333334, ans=0.0
2024-09-23 00:19:17,539 INFO [train.py:1198] (3/4) Epoch 8, batch 1700, loss[loss=0.2808, ctc_loss=0.2032, cr_loss=0.3879, over 16959.00 frames. ], tot_loss[loss=0.2672, ctc_loss=0.1889, cr_loss=0.3918, over 3351571.68 frames. ], batch size: 58, lr: 1.51e-02, grad_scale: 32.0
2024-09-23 00:19:36,470 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=135249.33333333334, ans=0.2
2024-09-23 00:20:32,790 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.const_attention_rate, batch_count=135389.33333333334, ans=0.025
2024-09-23 00:20:34,312 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=135389.33333333334, ans=0.1
2024-09-23 00:20:40,292 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.117e+02 1.382e+02 1.518e+02 1.704e+02 3.525e+02, threshold=3.037e+02, percent-clipped=1.0
2024-09-23 00:20:41,979 INFO [train.py:1198] (3/4) Epoch 8, batch 1750, loss[loss=0.2918, ctc_loss=0.205, cr_loss=0.4342, over 17291.00 frames. ], tot_loss[loss=0.267, ctc_loss=0.1887, cr_loss=0.3914, over 3348759.02 frames. ], batch size: 49, lr: 1.51e-02, grad_scale: 32.0
2024-09-23 00:20:48,469 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=135436.0, ans=0.125
2024-09-23 00:20:51,745 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=135436.0, ans=0.0
2024-09-23 00:20:58,064 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=135482.66666666666, ans=0.1
2024-09-23 00:21:07,105 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=135482.66666666666, ans=0.125
2024-09-23 00:21:12,147 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=11.23 vs. limit=15.0
2024-09-23 00:21:16,555 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=135529.33333333334, ans=0.07
2024-09-23 00:21:31,802 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=11.62 vs. limit=22.5
2024-09-23 00:21:45,245 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=135576.0, ans=0.07
2024-09-23 00:21:45,754 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=21.24 vs. limit=22.5
2024-09-23 00:21:59,637 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=135622.66666666666, ans=0.125
2024-09-23 00:22:03,313 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.20 vs. limit=22.5
2024-09-23 00:22:03,865 INFO [train.py:1198] (3/4) Epoch 8, batch 1800, loss[loss=0.2709, ctc_loss=0.1903, cr_loss=0.4031, over 17225.00 frames. ], tot_loss[loss=0.266, ctc_loss=0.188, cr_loss=0.3902, over 3345818.34 frames. ], batch size: 55, lr: 1.50e-02, grad_scale: 32.0
2024-09-23 00:22:34,372 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=135762.66666666666, ans=0.125
2024-09-23 00:22:53,307 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=135809.33333333334, ans=0.0
2024-09-23 00:22:58,051 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=135809.33333333334, ans=0.125
2024-09-23 00:23:21,450 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.146e+02 1.352e+02 1.514e+02 1.696e+02 3.020e+02, threshold=3.028e+02, percent-clipped=0.0
2024-09-23 00:23:23,113 INFO [train.py:1198] (3/4) Epoch 8, batch 1850, loss[loss=0.2639, ctc_loss=0.1853, cr_loss=0.3927, over 17011.00 frames. ], tot_loss[loss=0.266, ctc_loss=0.188, cr_loss=0.3903, over 3346516.71 frames. ], batch size: 53, lr: 1.50e-02, grad_scale: 32.0
2024-09-23 00:23:44,767 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=6.74 vs. limit=15.0
2024-09-23 00:23:47,584 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.52 vs. limit=15.0
2024-09-23 00:23:48,601 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=135949.33333333334, ans=0.125
2024-09-23 00:24:02,925 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=135996.0, ans=0.0
2024-09-23 00:24:20,010 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=136042.66666666666, ans=0.125
2024-09-23 00:24:33,043 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=15.26 vs. limit=15.0
2024-09-23 00:24:34,343 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=136089.33333333334, ans=0.125
2024-09-23 00:24:39,887 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=136089.33333333334, ans=0.125
2024-09-23 00:24:47,782 INFO [train.py:1198] (3/4) Epoch 8, batch 1900, loss[loss=0.2777, ctc_loss=0.1998, cr_loss=0.3895, over 16897.00 frames. ], tot_loss[loss=0.2662, ctc_loss=0.188, cr_loss=0.391, over 3342984.49 frames. ], batch size: 58, lr: 1.50e-02, grad_scale: 32.0
2024-09-23 00:25:11,054 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=5.04 vs. limit=15.0
2024-09-23 00:25:18,547 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=136229.33333333334, ans=0.125
2024-09-23 00:25:31,649 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.59 vs. limit=6.0
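The Whitening lines compare a per-module metric against a limit (6.0, 12.0, 15.0, 22.5 above); while the metric stays under the limit the module's features are considered sufficiently decorrelated. One plausible way to quantify "non-whiteness", sketched below under the assumption that the metric is an eigenvalue-dispersion ratio (the real Whiten module in scaling.py applies a related criterion as a differentiable penalty and may differ in detail), is the mean squared eigenvalue of the feature covariance divided by the squared mean eigenvalue, which equals 1.0 for perfectly white features and grows as the spectrum becomes lopsided.

# Sketch of a whiteness metric; illustrative assumption, not the exact
# formula used by scaling.py.
import torch

def whitening_metric(x: torch.Tensor, num_groups: int = 1) -> float:
    # x: (num_frames, num_channels), channels split into num_groups.
    n, c = x.shape
    assert c % num_groups == 0
    x = x.reshape(n, num_groups, c // num_groups).transpose(0, 1)
    x = x - x.mean(dim=1, keepdim=True)
    cov = torch.matmul(x.transpose(1, 2), x) / n  # (groups, d, d)
    eigs = torch.linalg.eigvalsh(cov)             # (groups, d), ascending
    ratio = (eigs ** 2).mean(dim=1) / eigs.mean(dim=1) ** 2
    return ratio.mean().item()

print(whitening_metric(torch.randn(1000, 384)))  # close to 1.0 for white noise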
2024-09-23 00:25:57,559 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=5.52 vs. limit=12.0
2024-09-23 00:26:01,617 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=136322.66666666666, ans=0.125
2024-09-23 00:26:11,059 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.199e+02 1.376e+02 1.564e+02 1.775e+02 2.559e+02, threshold=3.128e+02, percent-clipped=0.0
2024-09-23 00:26:12,648 INFO [train.py:1198] (3/4) Epoch 8, batch 1950, loss[loss=0.2549, ctc_loss=0.1742, cr_loss=0.4038, over 17308.00 frames. ], tot_loss[loss=0.2672, ctc_loss=0.1887, cr_loss=0.3926, over 3353229.33 frames. ], batch size: 46, lr: 1.50e-02, grad_scale: 32.0
2024-09-23 00:26:14,714 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=136369.33333333334, ans=0.2
2024-09-23 00:26:23,100 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.07 vs. limit=15.0
2024-09-23 00:26:28,861 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=136416.0, ans=0.125
2024-09-23 00:26:46,278 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=136462.66666666666, ans=0.125
2024-09-23 00:27:10,544 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=136509.33333333334, ans=0.0
2024-09-23 00:27:32,303 INFO [train.py:1198] (3/4) Epoch 8, batch 2000, loss[loss=0.2385, ctc_loss=0.1666, cr_loss=0.3593, over 17295.00 frames. ], tot_loss[loss=0.2679, ctc_loss=0.1894, cr_loss=0.3926, over 3355991.34 frames. ], batch size: 42, lr: 1.50e-02, grad_scale: 32.0
2024-09-23 00:27:51,875 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=136649.33333333334, ans=0.2
2024-09-23 00:27:52,036 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=136649.33333333334, ans=0.0
2024-09-23 00:28:01,820 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00
2024-09-23 00:28:01,866 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=136649.33333333334, ans=0.125
2024-09-23 00:28:22,732 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=136742.66666666666, ans=0.1
2024-09-23 00:28:33,557 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=136742.66666666666, ans=0.125
2024-09-23 00:28:53,824 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.191e+02 1.381e+02 1.478e+02 1.758e+02 2.844e+02, threshold=2.957e+02, percent-clipped=0.0
2024-09-23 00:28:55,423 INFO [train.py:1198] (3/4) Epoch 8, batch 2050, loss[loss=0.2949, ctc_loss=0.2069, cr_loss=0.4399, over 17057.00 frames. ], tot_loss[loss=0.2669, ctc_loss=0.1887, cr_loss=0.3912, over 3358972.82 frames. ], batch size: 56, lr: 1.50e-02, grad_scale: 32.0
2024-09-23 00:29:11,521 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=136882.66666666666, ans=0.125
2024-09-23 00:29:16,335 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=136882.66666666666, ans=0.125
2024-09-23 00:29:45,692 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=136976.0, ans=0.125
2024-09-23 00:29:58,336 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.const_attention_rate, batch_count=136976.0, ans=0.025
2024-09-23 00:29:58,442 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=136976.0, ans=0.0
2024-09-23 00:30:16,955 INFO [train.py:1198] (3/4) Epoch 8, batch 2100, loss[loss=0.2224, ctc_loss=0.1577, cr_loss=0.3236, over 17277.00 frames. ], tot_loss[loss=0.267, ctc_loss=0.1886, cr_loss=0.3918, over 3363603.51 frames. ], batch size: 42, lr: 1.50e-02, grad_scale: 32.0
2024-09-23 00:30:23,079 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=137069.33333333334, ans=0.04949747468305833
2024-09-23 00:31:21,783 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=137209.33333333334, ans=0.125
2024-09-23 00:31:40,396 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.168e+02 1.375e+02 1.520e+02 1.695e+02 2.810e+02, threshold=3.041e+02, percent-clipped=0.0
2024-09-23 00:31:41,917 INFO [train.py:1198] (3/4) Epoch 8, batch 2150, loss[loss=0.2788, ctc_loss=0.2042, cr_loss=0.373, over 16479.00 frames. ], tot_loss[loss=0.2648, ctc_loss=0.1869, cr_loss=0.3895, over 3365720.19 frames. ], batch size: 66, lr: 1.50e-02, grad_scale: 32.0
2024-09-23 00:32:00,072 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=7.35 vs. limit=12.0
2024-09-23 00:32:24,499 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=3.51 vs. limit=15.0
2024-09-23 00:32:30,361 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=137442.66666666666, ans=0.125
2024-09-23 00:32:48,392 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.23 vs. limit=22.5
2024-09-23 00:32:56,382 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=5.15 vs. limit=10.0
2024-09-23 00:33:01,806 INFO [train.py:1198] (3/4) Epoch 8, batch 2200, loss[loss=0.2826, ctc_loss=0.2026, cr_loss=0.3998, over 17152.00 frames. ], tot_loss[loss=0.2646, ctc_loss=0.1868, cr_loss=0.3889, over 3371738.86 frames. ], batch size: 48, lr: 1.49e-02, grad_scale: 32.0
2024-09-23 00:33:28,317 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=137582.66666666666, ans=0.0
2024-09-23 00:33:39,224 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=137629.33333333334, ans=0.1
2024-09-23 00:33:53,886 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=137676.0, ans=0.125
2024-09-23 00:34:00,368 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=137676.0, ans=0.125
2024-09-23 00:34:03,369 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=137676.0, ans=0.2
2024-09-23 00:34:10,715 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.50 vs. limit=6.0
2024-09-23 00:34:17,679 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.const_attention_rate, batch_count=137722.66666666666, ans=0.025
2024-09-23 00:34:21,856 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.166e+02 1.429e+02 1.621e+02 1.891e+02 2.462e+02, threshold=3.242e+02, percent-clipped=0.0
2024-09-23 00:34:22,260 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=137769.33333333334, ans=0.09899494936611666
2024-09-23 00:34:23,499 INFO [train.py:1198] (3/4) Epoch 8, batch 2250, loss[loss=0.2716, ctc_loss=0.1911, cr_loss=0.4027, over 17303.00 frames. ], tot_loss[loss=0.2644, ctc_loss=0.1866, cr_loss=0.389, over 3371545.85 frames. ], batch size: 49, lr: 1.49e-02, grad_scale: 32.0
2024-09-23 00:34:39,186 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=137769.33333333334, ans=0.125
2024-09-23 00:34:40,899 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=137816.0, ans=0.1
2024-09-23 00:34:56,849 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=137862.66666666666, ans=0.125
2024-09-23 00:35:08,038 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=137862.66666666666, ans=0.0
2024-09-23 00:35:37,615 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=137956.0, ans=0.125
2024-09-23 00:35:50,813 INFO [train.py:1198] (3/4) Epoch 8, batch 2300, loss[loss=0.293, ctc_loss=0.2085, cr_loss=0.4225, over 16648.00 frames. ], tot_loss[loss=0.2674, ctc_loss=0.189, cr_loss=0.3917, over 3362167.44 frames. ], batch size: 66, lr: 1.49e-02, grad_scale: 32.0
2024-09-23 00:35:59,282 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=138002.66666666666, ans=0.2
2024-09-23 00:36:07,345 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=138049.33333333334, ans=0.125
2024-09-23 00:36:08,729 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=138049.33333333334, ans=0.2
2024-09-23 00:36:16,837 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=138049.33333333334, ans=0.2
2024-09-23 00:36:23,229 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=138096.0, ans=0.1
2024-09-23 00:36:37,832 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=138142.66666666666, ans=0.125
2024-09-23 00:36:45,460 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=138142.66666666666, ans=0.125
2024-09-23 00:37:09,110 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.147e+02 1.383e+02 1.498e+02 1.682e+02 2.493e+02, threshold=2.995e+02, percent-clipped=0.0
2024-09-23 00:37:10,796 INFO [train.py:1198] (3/4) Epoch 8, batch 2350, loss[loss=0.2386, ctc_loss=0.164, cr_loss=0.3733, over 15954.00 frames. ], tot_loss[loss=0.2673, ctc_loss=0.1888, cr_loss=0.3927, over 3361260.03 frames. ], batch size: 35, lr: 1.49e-02, grad_scale: 32.0
2024-09-23 00:37:11,019 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=138236.0, ans=0.1
2024-09-23 00:37:17,487 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.const_attention_rate, batch_count=138236.0, ans=0.025
2024-09-23 00:37:55,826 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=138329.33333333334, ans=0.125
2024-09-23 00:37:59,098 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=138376.0, ans=0.1
2024-09-23 00:38:00,894 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=138376.0, ans=0.2
2024-09-23 00:38:10,339 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=138376.0, ans=0.0
2024-09-23 00:38:33,072 INFO [train.py:1198] (3/4) Epoch 8, batch 2400, loss[loss=0.2319, ctc_loss=0.1612, cr_loss=0.3534, over 17075.00 frames. ], tot_loss[loss=0.2675, ctc_loss=0.189, cr_loss=0.3921, over 3351811.95 frames. ], batch size: 43, lr: 1.49e-02, grad_scale: 32.0
2024-09-23 00:38:36,675 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=138469.33333333334, ans=0.2
2024-09-23 00:38:46,069 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=138469.33333333334, ans=0.0
2024-09-23 00:39:11,567 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=138562.66666666666, ans=0.5
2024-09-23 00:39:13,049 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=138562.66666666666, ans=0.0
2024-09-23 00:39:29,313 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=138609.33333333334, ans=0.1
2024-09-23 00:39:41,262 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=138656.0, ans=0.125
2024-09-23 00:39:53,487 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.149e+02 1.369e+02 1.492e+02 1.774e+02 2.629e+02, threshold=2.985e+02, percent-clipped=0.0
2024-09-23 00:39:55,153 INFO [train.py:1198] (3/4) Epoch 8, batch 2450, loss[loss=0.2845, ctc_loss=0.203, cr_loss=0.4075, over 17233.00 frames. ], tot_loss[loss=0.2672, ctc_loss=0.1886, cr_loss=0.3927, over 3362088.37 frames. ], batch size: 55, lr: 1.49e-02, grad_scale: 32.0
2024-09-23 00:39:55,832 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=4.20 vs. limit=10.0
2024-09-23 00:40:06,646 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=138702.66666666666, ans=0.0
2024-09-23 00:40:23,744 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=138749.33333333334, ans=0.125
2024-09-23 00:40:43,909 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=138796.0, ans=0.125
2024-09-23 00:40:50,681 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=9.10 vs. limit=15.0
2024-09-23 00:41:20,431 INFO [train.py:1198] (3/4) Epoch 8, batch 2500, loss[loss=0.2753, ctc_loss=0.1949, cr_loss=0.4017, over 17114.00 frames. ], tot_loss[loss=0.2675, ctc_loss=0.1889, cr_loss=0.3929, over 3363245.77 frames. ], batch size: 49, lr: 1.49e-02, grad_scale: 32.0
2024-09-23 00:41:33,455 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=138936.0, ans=0.0
2024-09-23 00:41:52,755 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=139029.33333333334, ans=0.125
2024-09-23 00:41:59,158 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=139029.33333333334, ans=0.1
2024-09-23 00:42:21,345 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=139076.0, ans=0.025
2024-09-23 00:42:40,014 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.162e+02 1.309e+02 1.454e+02 1.673e+02 2.570e+02, threshold=2.909e+02, percent-clipped=0.0
2024-09-23 00:42:40,039 INFO [train.py:1198] (3/4) Epoch 8, batch 2550, loss[loss=0.314, ctc_loss=0.2246, cr_loss=0.4466, over 16879.00 frames. ], tot_loss[loss=0.2682, ctc_loss=0.1894, cr_loss=0.394, over 3369314.02 frames. ], batch size: 58, lr: 1.49e-02, grad_scale: 16.0
2024-09-23 00:42:40,417 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=139169.33333333334, ans=0.125
2024-09-23 00:42:45,130 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=139169.33333333334, ans=0.035
2024-09-23 00:43:04,488 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=139216.0, ans=0.0
2024-09-23 00:43:12,293 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=139262.66666666666, ans=0.0
2024-09-23 00:43:19,529 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=139262.66666666666, ans=0.0
2024-09-23 00:43:20,148 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.46 vs. limit=6.0
2024-09-23 00:43:37,252 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=139309.33333333334, ans=0.125
2024-09-23 00:44:02,587 INFO [train.py:1198] (3/4) Epoch 8, batch 2600, loss[loss=0.2404, ctc_loss=0.1655, cr_loss=0.3743, over 17096.00 frames. ], tot_loss[loss=0.2671, ctc_loss=0.1886, cr_loss=0.3927, over 3369553.99 frames. ], batch size: 49, lr: 1.48e-02, grad_scale: 16.0
2024-09-23 00:44:09,146 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=139402.66666666666, ans=0.07
2024-09-23 00:44:21,998 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=139449.33333333334, ans=0.0
2024-09-23 00:44:23,628 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=139449.33333333334, ans=0.125
2024-09-23 00:45:26,762 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=6.49 vs. limit=15.0
2024-09-23 00:45:27,488 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.164e+02 1.349e+02 1.476e+02 1.822e+02 2.637e+02, threshold=2.952e+02, percent-clipped=0.0
2024-09-23 00:45:27,513 INFO [train.py:1198] (3/4) Epoch 8, batch 2650, loss[loss=0.2608, ctc_loss=0.185, cr_loss=0.3789, over 17015.00 frames. ], tot_loss[loss=0.2671, ctc_loss=0.1886, cr_loss=0.3925, over 3361300.05 frames. ], batch size: 44, lr: 1.48e-02, grad_scale: 16.0
2024-09-23 00:46:48,869 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.34 vs. limit=15.0
2024-09-23 00:46:49,530 INFO [train.py:1198] (3/4) Epoch 8, batch 2700, loss[loss=0.2757, ctc_loss=0.1968, cr_loss=0.3945, over 17345.00 frames. ], tot_loss[loss=0.2653, ctc_loss=0.1871, cr_loss=0.391, over 3369371.29 frames. ], batch size: 48, lr: 1.48e-02, grad_scale: 16.0
2024-09-23 00:47:23,208 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.const_attention_rate, batch_count=139962.66666666666, ans=0.025
2024-09-23 00:48:11,967 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.163e+02 1.374e+02 1.576e+02 1.752e+02 3.056e+02, threshold=3.153e+02, percent-clipped=1.0
2024-09-23 00:48:11,992 INFO [train.py:1198] (3/4) Epoch 8, batch 2750, loss[loss=0.258, ctc_loss=0.1776, cr_loss=0.4022, over 17200.00 frames. ], tot_loss[loss=0.2653, ctc_loss=0.1872, cr_loss=0.3908, over 3367446.38 frames. ], batch size: 41, lr: 1.48e-02, grad_scale: 16.0
2024-09-23 00:48:15,417 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=140102.66666666666, ans=0.2
2024-09-23 00:48:29,833 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=140149.33333333334, ans=0.0
2024-09-23 00:48:42,545 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=140196.0, ans=0.125
2024-09-23 00:48:44,135 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.min_positive, batch_count=140196.0, ans=0.025
2024-09-23 00:48:45,659 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=140196.0, ans=0.125
2024-09-23 00:48:47,165 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=140196.0, ans=0.0
2024-09-23 00:49:22,280 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=140289.33333333334, ans=0.04949747468305833
2024-09-23 00:49:34,169 INFO [train.py:1198] (3/4) Epoch 8, batch 2800, loss[loss=0.2973, ctc_loss=0.2105, cr_loss=0.4341, over 16605.00 frames. ], tot_loss[loss=0.2652, ctc_loss=0.1871, cr_loss=0.3905, over 3364971.22 frames. ], batch size: 66, lr: 1.48e-02, grad_scale: 32.0
2024-09-23 00:49:55,585 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=140382.66666666666, ans=0.2
2024-09-23 00:50:58,773 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.146e+02 1.406e+02 1.667e+02 1.997e+02 3.785e+02, threshold=3.334e+02, percent-clipped=1.0
2024-09-23 00:50:58,798 INFO [train.py:1198] (3/4) Epoch 8, batch 2850, loss[loss=0.2466, ctc_loss=0.1698, cr_loss=0.3836, over 17024.00 frames. ], tot_loss[loss=0.2658, ctc_loss=0.1877, cr_loss=0.3903, over 3346727.99 frames. ], batch size: 44, lr: 1.48e-02, grad_scale: 32.0
2024-09-23 00:50:59,196 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=140569.33333333334, ans=0.125
2024-09-23 00:51:19,962 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=140616.0, ans=0.0
2024-09-23 00:51:50,045 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=140709.33333333334, ans=0.1
2024-09-23 00:51:54,913 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.max_abs, batch_count=140709.33333333334, ans=10.0
2024-09-23 00:52:18,843 INFO [train.py:1198] (3/4) Epoch 8, batch 2900, loss[loss=0.2425, ctc_loss=0.1685, cr_loss=0.3702, over 17175.00 frames. ], tot_loss[loss=0.2634, ctc_loss=0.1859, cr_loss=0.3874, over 3352870.80 frames. ], batch size: 45, lr: 1.48e-02, grad_scale: 32.0
2024-09-23 00:52:28,742 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=140802.66666666666, ans=0.125
2024-09-23 00:52:40,100 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=140849.33333333334, ans=0.1
2024-09-23 00:52:41,679 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=140849.33333333334, ans=0.0
2024-09-23 00:52:41,764 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=140849.33333333334, ans=0.125
2024-09-23 00:52:46,375 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=140849.33333333334, ans=0.125
2024-09-23 00:52:52,922 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=140896.0, ans=0.1
2024-09-23 00:53:13,175 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=140942.66666666666, ans=0.125
2024-09-23 00:53:37,227 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=4.11 vs. limit=10.0
2024-09-23 00:53:41,170 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.147e+02 1.334e+02 1.488e+02 1.595e+02 2.196e+02, threshold=2.977e+02, percent-clipped=0.0
2024-09-23 00:53:41,195 INFO [train.py:1198] (3/4) Epoch 8, batch 2950, loss[loss=0.246, ctc_loss=0.1745, cr_loss=0.3576, over 16950.00 frames. ], tot_loss[loss=0.263, ctc_loss=0.1856, cr_loss=0.387, over 3354187.74 frames. ], batch size: 42, lr: 1.48e-02, grad_scale: 32.0
2024-09-23 00:53:43,203 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=141036.0, ans=0.1
2024-09-23 00:54:44,267 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=141176.0, ans=0.1
2024-09-23 00:55:02,758 INFO [train.py:1198] (3/4) Epoch 8, batch 3000, loss[loss=0.2215, ctc_loss=0.1547, cr_loss=0.334, over 17248.00 frames. ], tot_loss[loss=0.2632, ctc_loss=0.1857, cr_loss=0.3876, over 3357991.47 frames. ], batch size: 44, lr: 1.48e-02, grad_scale: 32.0
2024-09-23 00:55:02,758 INFO [train.py:1221] (3/4) Computing validation loss
2024-09-23 00:55:18,807 INFO [train.py:1230] (3/4) Epoch 8, validation: loss=0.05304, ctc_loss=0.05304, cr_loss=7.247e-15, over 944034.00 frames.
2024-09-23 00:55:18,808 INFO [train.py:1231] (3/4) Maximum memory allocated so far is 21238MB
2024-09-23 00:55:36,341 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=141316.0, ans=0.125
2024-09-23 00:56:02,278 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=13.27 vs. limit=22.5
2024-09-23 00:56:32,446 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=141456.0, ans=0.0
2024-09-23 00:56:37,027 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=141456.0, ans=0.1
2024-09-23 00:56:39,936 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.129e+02 1.352e+02 1.499e+02 1.786e+02 3.736e+02, threshold=2.997e+02, percent-clipped=4.0
2024-09-23 00:56:39,962 INFO [train.py:1198] (3/4) Epoch 8, batch 3050, loss[loss=0.2544, ctc_loss=0.1766, cr_loss=0.389, over 17060.00 frames. ], tot_loss[loss=0.2623, ctc_loss=0.1849, cr_loss=0.3868, over 3361011.81 frames. ], batch size: 46, lr: 1.47e-02, grad_scale: 32.0
2024-09-23 00:57:16,554 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=141596.0, ans=0.0
2024-09-23 00:57:41,904 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=141689.33333333334, ans=0.125
2024-09-23 00:57:44,015 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.59 vs. limit=22.5
2024-09-23 00:57:44,216 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=13.07 vs. limit=22.5
2024-09-23 00:57:48,132 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=141689.33333333334, ans=0.09899494936611666
2024-09-23 00:57:49,741 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=141689.33333333334, ans=0.2
2024-09-23 00:57:58,667 INFO [train.py:1198] (3/4) Epoch 8, batch 3100, loss[loss=0.2417, ctc_loss=0.1712, cr_loss=0.3527, over 16960.00 frames. ], tot_loss[loss=0.2632, ctc_loss=0.1856, cr_loss=0.3878, over 3362808.27 frames. ], batch size: 42, lr: 1.47e-02, grad_scale: 16.0
2024-09-23 00:58:02,107 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=141736.0, ans=0.1
2024-09-23 00:59:02,790 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=141922.66666666666, ans=0.04949747468305833
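The training records above report a combined loss alongside ctc_loss and cr_loss, while the validation record shows cr_loss=7.247e-15, i.e. essentially zero: the consistency-regularization term vanishes when there are not two differently-augmented views of each utterance to compare. As a sketch of how such an objective could be assembled (the exact CR-CTC formulation lives in the recipe's train.py; the weight cr_scale and the symmetric KL below are illustrative placeholders, not the confirmed implementation):

# Sketch: CTC on two augmented views plus a consistency term between
# their frame-level output distributions. Assumed form for illustration.
import torch.nn.functional as F

def cr_ctc_loss(log_probs_a, log_probs_b, targets,
                input_lengths, target_lengths, cr_scale=0.2):
    # log_probs_*: (T, N, V) log-softmax outputs for the two views.
    ctc = 0.5 * (
        F.ctc_loss(log_probs_a, targets, input_lengths, target_lengths)
        + F.ctc_loss(log_probs_b, targets, input_lengths, target_lengths)
    )
    cr = 0.5 * (
        F.kl_div(log_probs_a, log_probs_b, reduction="batchmean", log_target=True)
        + F.kl_div(log_probs_b, log_probs_a, reduction="batchmean", log_target=True)
    )
    return ctc + cr_scale * cr, ctc, cr

With a single unaugmented view at validation time, the consistency term is computed between identical distributions and collapses to zero, which matches the logged validation cr_loss.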
], batch size: 39, lr: 1.47e-02, grad_scale: 16.0 2024-09-23 00:59:17,877 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.137e+02 1.325e+02 1.474e+02 1.672e+02 3.223e+02, threshold=2.948e+02, percent-clipped=1.0 2024-09-23 00:59:27,628 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=141969.33333333334, ans=0.1 2024-09-23 00:59:29,111 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=141969.33333333334, ans=0.125 2024-09-23 00:59:29,261 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=141969.33333333334, ans=0.125 2024-09-23 00:59:30,882 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-23 00:59:44,830 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=142016.0, ans=0.2 2024-09-23 00:59:52,707 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=142062.66666666666, ans=0.125 2024-09-23 00:59:55,898 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=142062.66666666666, ans=0.2 2024-09-23 01:00:19,083 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=142156.0, ans=0.07 2024-09-23 01:00:32,370 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=9.49 vs. limit=10.0 2024-09-23 01:00:34,387 INFO [train.py:1198] (3/4) Epoch 8, batch 3200, loss[loss=0.2155, ctc_loss=0.1507, cr_loss=0.3242, over 16274.00 frames. ], tot_loss[loss=0.2632, ctc_loss=0.1855, cr_loss=0.3884, over 3353360.96 frames. ], batch size: 36, lr: 1.47e-02, grad_scale: 32.0 2024-09-23 01:00:48,727 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=142249.33333333334, ans=0.0 2024-09-23 01:00:53,334 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=142249.33333333334, ans=0.2 2024-09-23 01:01:27,821 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=142342.66666666666, ans=0.0 2024-09-23 01:01:37,116 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=142389.33333333334, ans=0.125 2024-09-23 01:01:53,403 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=142436.0, ans=0.125 2024-09-23 01:01:54,766 INFO [train.py:1198] (3/4) Epoch 8, batch 3250, loss[loss=0.2852, ctc_loss=0.2036, cr_loss=0.4081, over 17023.00 frames. ], tot_loss[loss=0.2641, ctc_loss=0.1862, cr_loss=0.3892, over 3348961.97 frames. 
], batch size: 56, lr: 1.47e-02, grad_scale: 32.0 2024-09-23 01:01:56,370 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.140e+02 1.501e+02 1.666e+02 1.949e+02 2.835e+02, threshold=3.332e+02, percent-clipped=0.0 2024-09-23 01:01:56,711 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=142436.0, ans=0.125 2024-09-23 01:02:01,336 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-09-23 01:02:35,200 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=142529.33333333334, ans=0.0 2024-09-23 01:03:12,273 INFO [train.py:1198] (3/4) Epoch 8, batch 3300, loss[loss=0.2506, ctc_loss=0.1796, cr_loss=0.3546, over 17216.00 frames. ], tot_loss[loss=0.264, ctc_loss=0.1862, cr_loss=0.3891, over 3352397.34 frames. ], batch size: 50, lr: 1.47e-02, grad_scale: 32.0 2024-09-23 01:03:43,802 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=142762.66666666666, ans=0.0 2024-09-23 01:04:05,966 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=142809.33333333334, ans=0.125 2024-09-23 01:04:30,463 INFO [train.py:1198] (3/4) Epoch 8, batch 3350, loss[loss=0.2129, ctc_loss=0.1439, cr_loss=0.3448, over 16972.00 frames. ], tot_loss[loss=0.2646, ctc_loss=0.1868, cr_loss=0.389, over 3340207.95 frames. ], batch size: 42, lr: 1.47e-02, grad_scale: 16.0 2024-09-23 01:04:30,691 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=142902.66666666666, ans=0.0 2024-09-23 01:04:33,559 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.197e+02 1.393e+02 1.569e+02 1.774e+02 3.394e+02, threshold=3.137e+02, percent-clipped=1.0 2024-09-23 01:04:42,241 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=10.20 vs. limit=15.0 2024-09-23 01:04:43,947 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=7.87 vs. limit=15.0 2024-09-23 01:04:45,023 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=142949.33333333334, ans=0.1 2024-09-23 01:04:47,987 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=142949.33333333334, ans=0.2 2024-09-23 01:05:00,551 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=142996.0, ans=0.125 2024-09-23 01:05:03,954 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.12 vs. limit=6.0 2024-09-23 01:05:05,352 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=142996.0, ans=0.0 2024-09-23 01:05:52,731 INFO [train.py:1198] (3/4) Epoch 8, batch 3400, loss[loss=0.2324, ctc_loss=0.1624, cr_loss=0.3497, over 17182.00 frames. ], tot_loss[loss=0.2648, ctc_loss=0.1869, cr_loss=0.3897, over 3341610.61 frames. 
], batch size: 41, lr: 1.47e-02, grad_scale: 16.0 2024-09-23 01:06:06,879 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=143182.66666666666, ans=0.2 2024-09-23 01:06:25,898 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=4.84 vs. limit=12.0 2024-09-23 01:06:33,382 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=143229.33333333334, ans=0.0 2024-09-23 01:06:36,669 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=143229.33333333334, ans=0.0 2024-09-23 01:06:49,458 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.max_positive, batch_count=143276.0, ans=0.95 2024-09-23 01:06:52,594 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.const_attention_rate, batch_count=143276.0, ans=0.025 2024-09-23 01:07:12,539 INFO [train.py:1198] (3/4) Epoch 8, batch 3450, loss[loss=0.2457, ctc_loss=0.175, cr_loss=0.3535, over 16752.00 frames. ], tot_loss[loss=0.265, ctc_loss=0.1869, cr_loss=0.3904, over 3346457.64 frames. ], batch size: 61, lr: 1.47e-02, grad_scale: 16.0 2024-09-23 01:07:13,180 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=4.05 vs. limit=15.0 2024-09-23 01:07:15,656 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.137e+02 1.386e+02 1.521e+02 1.778e+02 2.541e+02, threshold=3.041e+02, percent-clipped=0.0 2024-09-23 01:07:30,114 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=143416.0, ans=0.2 2024-09-23 01:08:04,330 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=143509.33333333334, ans=0.125 2024-09-23 01:08:30,382 INFO [train.py:1198] (3/4) Epoch 8, batch 3500, loss[loss=0.2652, ctc_loss=0.1858, cr_loss=0.3972, over 17146.00 frames. ], tot_loss[loss=0.2657, ctc_loss=0.1874, cr_loss=0.3916, over 3353515.28 frames. ], batch size: 48, lr: 1.46e-02, grad_scale: 16.0 2024-09-23 01:08:41,532 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=143602.66666666666, ans=0.0 2024-09-23 01:08:44,623 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=143649.33333333334, ans=0.0 2024-09-23 01:09:44,287 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=143789.33333333334, ans=0.125 2024-09-23 01:09:48,669 INFO [train.py:1198] (3/4) Epoch 8, batch 3550, loss[loss=0.2847, ctc_loss=0.1986, cr_loss=0.4304, over 17232.00 frames. ], tot_loss[loss=0.2656, ctc_loss=0.1873, cr_loss=0.3915, over 3354260.53 frames. 
2024-09-23 01:09:48,769 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=143836.0, ans=0.125
2024-09-23 01:09:51,761 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.159e+02 1.323e+02 1.423e+02 1.598e+02 2.580e+02, threshold=2.846e+02, percent-clipped=0.0
2024-09-23 01:10:04,239 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=143882.66666666666, ans=0.015
2024-09-23 01:10:37,057 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=143976.0, ans=0.09899494936611666
2024-09-23 01:10:57,091 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=144022.66666666666, ans=0.125
2024-09-23 01:11:06,359 INFO [train.py:1198] (3/4) Epoch 8, batch 3600, loss[loss=0.2555, ctc_loss=0.18, cr_loss=0.3778, over 17012.00 frames. ], tot_loss[loss=0.2663, ctc_loss=0.1878, cr_loss=0.3925, over 3353272.94 frames. ], batch size: 44, lr: 1.46e-02, grad_scale: 32.0
2024-09-23 01:11:11,313 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=144069.33333333334, ans=0.125
2024-09-23 01:12:10,191 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=144256.0, ans=0.0
2024-09-23 01:12:27,279 INFO [train.py:1198] (3/4) Epoch 8, batch 3650, loss[loss=0.2407, ctc_loss=0.1702, cr_loss=0.3525, over 17046.00 frames. ], tot_loss[loss=0.2667, ctc_loss=0.1881, cr_loss=0.3933, over 3358864.39 frames. ], batch size: 39, lr: 1.46e-02, grad_scale: 32.0
2024-09-23 01:12:30,480 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.148e+02 1.372e+02 1.519e+02 1.739e+02 2.361e+02, threshold=3.037e+02, percent-clipped=0.0
2024-09-23 01:12:35,292 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=144302.66666666666, ans=0.1
2024-09-23 01:12:36,953 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=144302.66666666666, ans=0.1
2024-09-23 01:12:38,552 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2024-09-23 01:12:48,000 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-23 01:13:02,554 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=9.88 vs. limit=15.0
2024-09-23 01:13:03,684 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=12.82 vs. limit=15.0
2024-09-23 01:13:42,962 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.63 vs. limit=15.0
2024-09-23 01:13:45,450 INFO [train.py:1198] (3/4) Epoch 8, batch 3700, loss[loss=0.2847, ctc_loss=0.2012, cr_loss=0.4176, over 16749.00 frames. ], tot_loss[loss=0.2668, ctc_loss=0.1881, cr_loss=0.3935, over 3359920.04 frames. ], batch size: 61, lr: 1.46e-02, grad_scale: 32.0
2024-09-23 01:13:50,605 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=144536.0, ans=0.125
2024-09-23 01:13:58,438 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=144536.0, ans=0.125
2024-09-23 01:14:05,953 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=144582.66666666666, ans=0.07
2024-09-23 01:14:11,983 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=144582.66666666666, ans=0.0
2024-09-23 01:14:19,735 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2024-09-23 01:14:30,855 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=144676.0, ans=0.2
2024-09-23 01:14:40,319 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.min_abs, batch_count=144676.0, ans=0.5
2024-09-23 01:14:59,501 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=144722.66666666666, ans=0.2
2024-09-23 01:15:03,785 INFO [train.py:1198] (3/4) Epoch 8, batch 3750, loss[loss=0.3119, ctc_loss=0.2271, cr_loss=0.424, over 15128.00 frames. ], tot_loss[loss=0.2654, ctc_loss=0.1871, cr_loss=0.3915, over 3355603.04 frames. ], batch size: 89, lr: 1.46e-02, grad_scale: 32.0
2024-09-23 01:15:07,863 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.153e+02 1.371e+02 1.582e+02 1.824e+02 4.757e+02, threshold=3.165e+02, percent-clipped=1.0
2024-09-23 01:15:36,341 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=144862.66666666666, ans=0.2
2024-09-23 01:15:39,335 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=144862.66666666666, ans=0.0
2024-09-23 01:16:23,491 INFO [train.py:1198] (3/4) Epoch 8, batch 3800, loss[loss=0.3125, ctc_loss=0.2271, cr_loss=0.4267, over 17342.00 frames. ], tot_loss[loss=0.2657, ctc_loss=0.1874, cr_loss=0.3916, over 3348120.39 frames. ], batch size: 52, lr: 1.46e-02, grad_scale: 32.0
2024-09-23 01:16:32,039 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=7.18 vs. limit=15.0
2024-09-23 01:16:51,641 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=145049.33333333334, ans=0.125
2024-09-23 01:16:53,102 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=145096.0, ans=0.125
2024-09-23 01:17:04,593 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=13.21 vs. limit=22.5
2024-09-23 01:17:05,820 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=145096.0, ans=0.125
2024-09-23 01:17:41,070 INFO [train.py:1198] (3/4) Epoch 8, batch 3850, loss[loss=0.346, ctc_loss=0.2681, cr_loss=0.3894, over 11136.00 frames. ], tot_loss[loss=0.269, ctc_loss=0.1905, cr_loss=0.3924, over 3292301.21 frames. ], batch size: 123, lr: 1.46e-02, grad_scale: 16.0
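The grad_scale field in the per-batch lines is the fp16 loss-scaling factor, and its movement between 32.0 and 16.0 in this stretch is ordinary dynamic-scaling behaviour: halve when a step overflows, grow back after a run of clean steps. A sketch with illustrative constants (the real growth interval is whatever the scaler in use configures):

class DynamicGradScaler:
    # Minimal sketch of dynamic fp16 loss scaling, not icefall's implementation.
    def __init__(self, init_scale=32.0, growth_interval=2000):
        self.scale = init_scale
        self.growth_interval = growth_interval
        self.good_steps = 0

    def update(self, found_inf: bool):
        if found_inf:
            self.scale /= 2.0   # overflow: halve the scale and skip this step
            self.good_steps = 0
        else:
            self.good_steps += 1
            if self.good_steps % self.growth_interval == 0:
                self.scale *= 2.0  # stable for a while: try a larger scale again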
2024-09-23 01:17:45,603 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.152e+02 1.439e+02 1.582e+02 1.881e+02 4.076e+02, threshold=3.165e+02, percent-clipped=1.0
2024-09-23 01:17:50,757 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=9.04 vs. limit=15.0
2024-09-23 01:18:02,731 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=145282.66666666666, ans=0.125
2024-09-23 01:18:21,505 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=8.02 vs. limit=15.0
2024-09-23 01:18:31,741 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=145376.0, ans=0.04949747468305833
2024-09-23 01:18:32,481 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=12.08 vs. limit=15.0
2024-09-23 01:18:45,383 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=145422.66666666666, ans=0.125
2024-09-23 01:19:42,988 INFO [train.py:1198] (3/4) Epoch 9, batch 0, loss[loss=0.284, ctc_loss=0.2021, cr_loss=0.4095, over 16655.00 frames. ], tot_loss[loss=0.284, ctc_loss=0.2021, cr_loss=0.4095, over 16655.00 frames. ], batch size: 61, lr: 1.38e-02, grad_scale: 32.0
2024-09-23 01:19:42,988 INFO [train.py:1221] (3/4) Computing validation loss
2024-09-23 01:19:58,859 INFO [train.py:1230] (3/4) Epoch 9, validation: loss=0.05451, ctc_loss=0.05451, cr_loss=7.076e-15, over 944034.00 frames.
2024-09-23 01:19:58,859 INFO [train.py:1231] (3/4) Maximum memory allocated so far is 21238MB
2024-09-23 01:20:04,895 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=145450.66666666666, ans=0.125
2024-09-23 01:20:28,206 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=145497.33333333334, ans=0.125
2024-09-23 01:20:40,528 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=145544.0, ans=0.125
2024-09-23 01:20:42,140 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=145544.0, ans=0.025
2024-09-23 01:21:06,648 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=145637.33333333334, ans=0.05
2024-09-23 01:21:23,248 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=145637.33333333334, ans=0.0
2024-09-23 01:21:26,166 INFO [train.py:1198] (3/4) Epoch 9, batch 50, loss[loss=0.2544, ctc_loss=0.1773, cr_loss=0.3854, over 17308.00 frames. ], tot_loss[loss=0.2623, ctc_loss=0.1853, cr_loss=0.3848, over 755223.32 frames. ], batch size: 51, lr: 1.38e-02, grad_scale: 32.0
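The loss fields in the train.py lines are mutually consistent: in every batch in this log, loss = ctc_loss + 0.2 * cr_loss (for example 0.1796 + 0.2 x 0.3546 = 0.2506 in batch 3300 above), so the reported loss is a CTC term plus a consistency-regularization (CR) term scaled by 0.2. That also explains the validation line just above, where cr_loss=7.076e-15: with no masking applied at validation time the two views of each utterance coincide and the CR term vanishes to rounding error. A hedged sketch of the combination, assuming the CR term is a symmetric KL between the per-frame posteriors of two differently-masked forward passes (the recipe's exact CR definition may differ):

import torch.nn.functional as F

def cr_ctc_loss(log_probs_a, log_probs_b, ctc_loss_a, ctc_loss_b, cr_loss_scale=0.2):
    # Symmetric KL divergence between the two views' per-frame posteriors.
    cr = 0.5 * (
        F.kl_div(log_probs_a, log_probs_b.exp(), reduction="batchmean")
        + F.kl_div(log_probs_b, log_probs_a.exp(), reduction="batchmean")
    )
    ctc = 0.5 * (ctc_loss_a + ctc_loss_b)
    # Matches the logged relation: loss = ctc_loss + 0.2 * cr_loss.
    return ctc + cr_loss_scale * cr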
2024-09-23 01:21:37,361 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.229e+02 1.512e+02 1.709e+02 2.026e+02 3.260e+02, threshold=3.417e+02, percent-clipped=2.0
2024-09-23 01:21:45,657 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.const_attention_rate, batch_count=145730.66666666666, ans=0.025
2024-09-23 01:22:06,315 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=145777.33333333334, ans=0.125
2024-09-23 01:22:11,424 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=4.29 vs. limit=10.0
2024-09-23 01:22:12,480 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=145824.0, ans=0.125
2024-09-23 01:22:48,405 INFO [train.py:1198] (3/4) Epoch 9, batch 100, loss[loss=0.2729, ctc_loss=0.1929, cr_loss=0.4001, over 17148.00 frames. ], tot_loss[loss=0.2626, ctc_loss=0.1849, cr_loss=0.3886, over 1346186.69 frames. ], batch size: 48, lr: 1.38e-02, grad_scale: 32.0
2024-09-23 01:22:54,914 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=145917.33333333334, ans=0.125
2024-09-23 01:23:01,251 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=145917.33333333334, ans=0.1
2024-09-23 01:23:02,903 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=145964.0, ans=0.2
2024-09-23 01:23:04,478 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=145964.0, ans=0.0
2024-09-23 01:23:22,273 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=146010.66666666666, ans=0.125
2024-09-23 01:23:34,712 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=146057.33333333334, ans=0.2
2024-09-23 01:23:57,144 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=146104.0, ans=0.1
2024-09-23 01:24:08,183 INFO [train.py:1198] (3/4) Epoch 9, batch 150, loss[loss=0.2858, ctc_loss=0.2106, cr_loss=0.3764, over 16788.00 frames. ], tot_loss[loss=0.2601, ctc_loss=0.1827, cr_loss=0.3869, over 1801262.94 frames. ], batch size: 61, lr: 1.37e-02, grad_scale: 32.0
2024-09-23 01:24:19,530 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.178e+02 1.303e+02 1.427e+02 1.679e+02 2.380e+02, threshold=2.853e+02, percent-clipped=0.0
2024-09-23 01:24:29,473 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=146197.33333333334, ans=0.09899494936611666
2024-09-23 01:24:41,927 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=146244.0, ans=0.125
2024-09-23 01:24:53,107 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=146244.0, ans=0.125
2024-09-23 01:25:08,557 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=146290.66666666666, ans=0.0
2024-09-23 01:25:08,590 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=146290.66666666666, ans=0.125
2024-09-23 01:25:26,951 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.const_attention_rate, batch_count=146337.33333333334, ans=0.025
2024-09-23 01:25:30,988 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=26.75 vs. limit=22.5
2024-09-23 01:25:33,202 INFO [train.py:1198] (3/4) Epoch 9, batch 200, loss[loss=0.2744, ctc_loss=0.191, cr_loss=0.4171, over 17011.00 frames. ], tot_loss[loss=0.2615, ctc_loss=0.1839, cr_loss=0.3884, over 2139721.96 frames. ], batch size: 44, lr: 1.37e-02, grad_scale: 32.0
2024-09-23 01:25:41,445 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2024-09-23 01:25:44,939 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=7.18 vs. limit=15.0
2024-09-23 01:25:46,237 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=13.09 vs. limit=22.5
2024-09-23 01:26:00,487 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=146430.66666666666, ans=0.0
2024-09-23 01:26:03,623 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=146477.33333333334, ans=0.125
2024-09-23 01:26:16,259 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=146477.33333333334, ans=0.2
2024-09-23 01:26:38,637 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=146570.66666666666, ans=0.0
2024-09-23 01:26:52,965 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=146570.66666666666, ans=0.125
2024-09-23 01:26:55,832 INFO [train.py:1198] (3/4) Epoch 9, batch 250, loss[loss=0.2417, ctc_loss=0.1686, cr_loss=0.3654, over 17355.00 frames. ], tot_loss[loss=0.2603, ctc_loss=0.1828, cr_loss=0.3876, over 2410319.06 frames. ], batch size: 48, lr: 1.37e-02, grad_scale: 32.0
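The [scaling.py:1024] Whitening lines compare a measured "whiteness" metric for a module's activations against that module's limit; entries such as self_attn1.whiten at metric=26.75 vs. limit=22.5 above are the ones where a corrective gradient would plausibly kick in. A sketch of one metric with the right properties, assuming it is essentially a normalized Frobenius norm of the feature covariance, equal to 1.0 when the covariance is proportional to the identity and growing as it departs from that (the recipe's exact formula may differ):

import torch

def whitening_metric(x: torch.Tensor) -> torch.Tensor:
    # x: (num_frames, num_channels). Returns 1.0 for perfectly "white"
    # features and larger values for ill-conditioned covariances.
    x = x - x.mean(dim=0)
    c = (x.t() @ x) / x.shape[0]          # covariance estimate
    d = c.shape[0]
    return (c * c).sum() / (d * c.diag().mean() ** 2)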
2024-09-23 01:26:56,060 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=146617.33333333334, ans=0.0
2024-09-23 01:27:01,322 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=10.20 vs. limit=15.0
2024-09-23 01:27:05,619 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=146617.33333333334, ans=0.125
2024-09-23 01:27:06,849 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.134e+02 1.339e+02 1.502e+02 1.713e+02 2.875e+02, threshold=3.003e+02, percent-clipped=1.0
2024-09-23 01:27:29,169 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=146710.66666666666, ans=0.125
2024-09-23 01:27:42,704 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=146710.66666666666, ans=0.125
2024-09-23 01:27:49,616 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=9.43 vs. limit=15.0
2024-09-23 01:27:50,819 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=146757.33333333334, ans=0.0
2024-09-23 01:27:52,304 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=146757.33333333334, ans=0.1
2024-09-23 01:28:06,543 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=146804.0, ans=0.125
2024-09-23 01:28:17,489 INFO [train.py:1198] (3/4) Epoch 9, batch 300, loss[loss=0.3463, ctc_loss=0.2585, cr_loss=0.4391, over 11656.00 frames. ], tot_loss[loss=0.2615, ctc_loss=0.1838, cr_loss=0.3886, over 2618278.53 frames. ], batch size: 123, lr: 1.37e-02, grad_scale: 32.0
2024-09-23 01:29:12,363 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=146990.66666666666, ans=0.2
2024-09-23 01:29:31,355 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=147037.33333333334, ans=0.125
2024-09-23 01:29:37,418 INFO [train.py:1198] (3/4) Epoch 9, batch 350, loss[loss=0.2549, ctc_loss=0.1784, cr_loss=0.3826, over 17346.00 frames. ], tot_loss[loss=0.2617, ctc_loss=0.1841, cr_loss=0.3879, over 2773890.39 frames. ], batch size: 48, lr: 1.37e-02, grad_scale: 32.0
2024-09-23 01:29:48,845 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.129e+02 1.338e+02 1.473e+02 1.743e+02 2.948e+02, threshold=2.946e+02, percent-clipped=0.0
2024-09-23 01:30:02,712 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.const_attention_rate, batch_count=147130.66666666666, ans=0.025
2024-09-23 01:30:24,692 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=147177.33333333334, ans=0.1
2024-09-23 01:30:35,970 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=147224.0, ans=0.125
2024-09-23 01:30:37,659 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.min_abs, batch_count=147224.0, ans=0.5
2024-09-23 01:30:42,374 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.const_attention_rate, batch_count=147224.0, ans=0.025
2024-09-23 01:31:02,659 INFO [train.py:1198] (3/4) Epoch 9, batch 400, loss[loss=0.227, ctc_loss=0.1583, cr_loss=0.3434, over 17249.00 frames. ], tot_loss[loss=0.2603, ctc_loss=0.183, cr_loss=0.3868, over 2908295.19 frames. ], batch size: 42, lr: 1.37e-02, grad_scale: 32.0
2024-09-23 01:31:35,740 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=147410.66666666666, ans=0.1
2024-09-23 01:31:39,146 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=147410.66666666666, ans=0.04949747468305833
2024-09-23 01:31:43,733 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-09-23 01:32:03,081 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=147457.33333333334, ans=0.025
2024-09-23 01:32:14,928 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.74 vs. limit=15.0
2024-09-23 01:32:20,436 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.const_attention_rate, batch_count=147504.0, ans=0.025
2024-09-23 01:32:25,079 INFO [train.py:1198] (3/4) Epoch 9, batch 450, loss[loss=0.2373, ctc_loss=0.1632, cr_loss=0.3708, over 17215.00 frames. ], tot_loss[loss=0.2593, ctc_loss=0.1822, cr_loss=0.3857, over 3010368.44 frames. ], batch size: 50, lr: 1.37e-02, grad_scale: 32.0
2024-09-23 01:32:38,952 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.069e+02 1.318e+02 1.484e+02 1.790e+02 2.979e+02, threshold=2.969e+02, percent-clipped=1.0
2024-09-23 01:32:39,190 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=147550.66666666666, ans=0.125
2024-09-23 01:32:47,672 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.whiten.whitening_limit, batch_count=147597.33333333334, ans=12.0
2024-09-23 01:33:14,527 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=147690.66666666666, ans=0.1
2024-09-23 01:33:41,552 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=147737.33333333334, ans=0.0
2024-09-23 01:33:47,564 INFO [train.py:1198] (3/4) Epoch 9, batch 500, loss[loss=0.2709, ctc_loss=0.1884, cr_loss=0.4123, over 16916.00 frames. ], tot_loss[loss=0.2597, ctc_loss=0.1823, cr_loss=0.3869, over 3090774.99 frames. ], batch size: 58, lr: 1.37e-02, grad_scale: 32.0
2024-09-23 01:33:51,080 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=147784.0, ans=0.125
2024-09-23 01:33:59,121 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-09-23 01:34:02,238 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.min_positive, batch_count=147830.66666666666, ans=0.05
2024-09-23 01:34:13,955 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=9.70 vs. limit=15.0
2024-09-23 01:34:17,147 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=4.71 vs. limit=10.0
2024-09-23 01:34:34,101 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=147924.0, ans=0.125
2024-09-23 01:34:39,448 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=10.74 vs. limit=15.0
2024-09-23 01:34:40,441 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=147924.0, ans=0.125
2024-09-23 01:34:43,771 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=147924.0, ans=0.0
2024-09-23 01:34:45,442 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=147924.0, ans=0.1
2024-09-23 01:34:57,494 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=147970.66666666666, ans=0.0
2024-09-23 01:35:09,831 INFO [train.py:1198] (3/4) Epoch 9, batch 550, loss[loss=0.2606, ctc_loss=0.1816, cr_loss=0.3947, over 17202.00 frames. ], tot_loss[loss=0.2605, ctc_loss=0.1829, cr_loss=0.3883, over 3148783.41 frames. ], batch size: 47, lr: 1.37e-02, grad_scale: 32.0
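The many balancer entries in these lines (min_positive, max_positive, min_abs, max_abs, prob) belong to activation balancers that keep per-channel statistics inside configured bounds. A loose sketch of the constraint they enforce; all names and constants here are illustrative assumptions, not the recipe's implementation:

import torch

def balancer_penalty(x: torch.Tensor, min_positive=0.05, max_positive=0.95,
                     min_abs=0.2, max_abs=100.0) -> torch.Tensor:
    # x: (num_frames, num_channels). Penalize channels whose fraction of
    # positive activations, or mean absolute value, drifts outside bounds.
    frac_positive = (x > 0).float().mean(dim=0)
    mean_abs = x.abs().mean(dim=0)
    return (
        torch.relu(min_positive - frac_positive).sum()
        + torch.relu(frac_positive - max_positive).sum()
        + torch.relu(min_abs - mean_abs).sum()
        + torch.relu(mean_abs - max_abs).sum()
    )  # added to the objective with a small weight, applied with probability prob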
2024-09-23 01:35:10,239 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=148017.33333333334, ans=0.125
2024-09-23 01:35:23,370 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.127e+02 1.335e+02 1.443e+02 1.626e+02 2.688e+02, threshold=2.885e+02, percent-clipped=0.0
2024-09-23 01:35:51,076 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=9.27 vs. limit=15.0
2024-09-23 01:36:04,745 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.const_attention_rate, batch_count=148157.33333333334, ans=0.025
2024-09-23 01:36:34,063 INFO [train.py:1198] (3/4) Epoch 9, batch 600, loss[loss=0.2701, ctc_loss=0.1881, cr_loss=0.4096, over 17096.00 frames. ], tot_loss[loss=0.2591, ctc_loss=0.1815, cr_loss=0.3878, over 3205807.90 frames. ], batch size: 49, lr: 1.37e-02, grad_scale: 32.0
2024-09-23 01:36:39,725 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.79 vs. limit=15.0
2024-09-23 01:36:43,931 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=148250.66666666666, ans=0.125
2024-09-23 01:36:53,548 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-23 01:37:01,295 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=148297.33333333334, ans=0.125
2024-09-23 01:37:30,381 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=148390.66666666666, ans=0.125
2024-09-23 01:37:56,746 INFO [train.py:1198] (3/4) Epoch 9, batch 650, loss[loss=0.2598, ctc_loss=0.1822, cr_loss=0.388, over 17102.00 frames. ], tot_loss[loss=0.2595, ctc_loss=0.1819, cr_loss=0.3879, over 3243309.47 frames. ], batch size: 49, lr: 1.36e-02, grad_scale: 16.0
2024-09-23 01:38:03,393 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=148484.0, ans=0.0
2024-09-23 01:38:07,139 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=10.70 vs. limit=15.0
2024-09-23 01:38:09,416 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.167e+02 1.361e+02 1.468e+02 1.616e+02 2.267e+02, threshold=2.935e+02, percent-clipped=0.0
2024-09-23 01:38:36,718 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=148577.33333333334, ans=0.125
2024-09-23 01:38:46,002 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=148624.0, ans=0.125
2024-09-23 01:38:52,372 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=148624.0, ans=0.125
2024-09-23 01:39:08,261 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=148670.66666666666, ans=0.125
2024-09-23 01:39:15,866 INFO [train.py:1198] (3/4) Epoch 9, batch 700, loss[loss=0.2882, ctc_loss=0.2052, cr_loss=0.415, over 16016.00 frames. ], tot_loss[loss=0.2585, ctc_loss=0.1811, cr_loss=0.3866, over 3271916.83 frames. ], batch size: 74, lr: 1.36e-02, grad_scale: 16.0
2024-09-23 01:39:55,470 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=5.16 vs. limit=10.0
2024-09-23 01:40:04,339 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=148857.33333333334, ans=0.125
2024-09-23 01:40:18,113 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=148857.33333333334, ans=0.0
2024-09-23 01:40:40,042 INFO [train.py:1198] (3/4) Epoch 9, batch 750, loss[loss=0.293, ctc_loss=0.2018, cr_loss=0.4559, over 17039.00 frames. ], tot_loss[loss=0.2592, ctc_loss=0.1817, cr_loss=0.3874, over 3287259.72 frames. ], batch size: 52, lr: 1.36e-02, grad_scale: 16.0
2024-09-23 01:40:52,737 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.115e+02 1.365e+02 1.514e+02 1.779e+02 2.634e+02, threshold=3.027e+02, percent-clipped=0.0
2024-09-23 01:41:24,942 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-23 01:42:02,476 INFO [train.py:1198] (3/4) Epoch 9, batch 800, loss[loss=0.2644, ctc_loss=0.183, cr_loss=0.4069, over 17013.00 frames. ], tot_loss[loss=0.2602, ctc_loss=0.1824, cr_loss=0.3889, over 3301384.10 frames. ], batch size: 53, lr: 1.36e-02, grad_scale: 32.0
2024-09-23 01:42:18,982 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=149230.66666666666, ans=0.125
2024-09-23 01:42:20,661 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=149230.66666666666, ans=0.125
2024-09-23 01:42:22,220 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=149230.66666666666, ans=0.2
2024-09-23 01:42:29,738 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.max_abs, batch_count=149230.66666666666, ans=10.0
2024-09-23 01:42:36,028 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=149277.33333333334, ans=0.0
2024-09-23 01:43:09,500 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=10.61 vs. limit=15.0
2024-09-23 01:43:10,569 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.83 vs. limit=15.0
2024-09-23 01:43:27,284 INFO [train.py:1198] (3/4) Epoch 9, batch 850, loss[loss=0.2797, ctc_loss=0.1975, cr_loss=0.4109, over 17289.00 frames. ], tot_loss[loss=0.2606, ctc_loss=0.1828, cr_loss=0.3887, over 3319911.61 frames. ], batch size: 51, lr: 1.36e-02, grad_scale: 32.0
2024-09-23 01:43:30,743 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=149417.33333333334, ans=0.125
2024-09-23 01:43:32,340 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=149417.33333333334, ans=0.0
2024-09-23 01:43:37,071 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=149417.33333333334, ans=0.04949747468305833
2024-09-23 01:43:39,947 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.141e+02 1.394e+02 1.570e+02 1.849e+02 3.011e+02, threshold=3.140e+02, percent-clipped=0.0
2024-09-23 01:44:21,941 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=7.61 vs. limit=15.0
2024-09-23 01:44:23,019 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=149557.33333333334, ans=0.2
2024-09-23 01:44:49,132 INFO [train.py:1198] (3/4) Epoch 9, batch 900, loss[loss=0.24, ctc_loss=0.1647, cr_loss=0.3768, over 17083.00 frames. ], tot_loss[loss=0.26, ctc_loss=0.1823, cr_loss=0.3887, over 3331936.13 frames. ], batch size: 49, lr: 1.36e-02, grad_scale: 32.0
2024-09-23 01:44:56,727 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.18 vs. limit=6.0
2024-09-23 01:44:57,731 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.min_positive, batch_count=149650.66666666666, ans=0.025
2024-09-23 01:45:05,181 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=149650.66666666666, ans=0.2
2024-09-23 01:45:32,397 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=149744.0, ans=0.125
2024-09-23 01:45:34,070 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=149744.0, ans=0.125
2024-09-23 01:45:42,630 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=12.51 vs. limit=22.5
2024-09-23 01:46:14,335 INFO [train.py:1198] (3/4) Epoch 9, batch 950, loss[loss=0.2319, ctc_loss=0.1584, cr_loss=0.3678, over 16941.00 frames. ], tot_loss[loss=0.2592, ctc_loss=0.1817, cr_loss=0.3877, over 3337807.86 frames. ], batch size: 42, lr: 1.36e-02, grad_scale: 32.0
2024-09-23 01:46:27,022 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.118e+02 1.338e+02 1.417e+02 1.570e+02 2.385e+02, threshold=2.834e+02, percent-clipped=0.0
2024-09-23 01:46:41,651 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=149930.66666666666, ans=0.0
2024-09-23 01:46:50,228 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=10.12 vs. limit=15.0
2024-09-23 01:47:12,368 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=150024.0, ans=0.2
2024-09-23 01:47:32,792 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=150070.66666666666, ans=0.125
2024-09-23 01:47:37,174 INFO [train.py:1198] (3/4) Epoch 9, batch 1000, loss[loss=0.2871, ctc_loss=0.2034, cr_loss=0.4185, over 17012.00 frames. ], tot_loss[loss=0.2611, ctc_loss=0.1832, cr_loss=0.3894, over 3338694.99 frames. ], batch size: 53, lr: 1.36e-02, grad_scale: 32.0
2024-09-23 01:47:49,358 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=9.80 vs. limit=15.0
2024-09-23 01:47:52,438 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.84 vs. limit=15.0
2024-09-23 01:48:05,997 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=150164.0, ans=0.125
2024-09-23 01:48:15,628 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=150210.66666666666, ans=0.125
2024-09-23 01:48:16,490 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.81 vs. limit=6.0
2024-09-23 01:48:17,268 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=150210.66666666666, ans=0.125
2024-09-23 01:48:23,515 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=150257.33333333334, ans=0.125
2024-09-23 01:48:30,066 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=150257.33333333334, ans=0.05
2024-09-23 01:48:46,844 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.98 vs. limit=6.0
2024-09-23 01:48:49,218 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=150304.0, ans=0.125
2024-09-23 01:48:57,105 INFO [train.py:1198] (3/4) Epoch 9, batch 1050, loss[loss=0.2163, ctc_loss=0.1515, cr_loss=0.3239, over 16664.00 frames. ], tot_loss[loss=0.2608, ctc_loss=0.183, cr_loss=0.3891, over 3340776.03 frames. ], batch size: 37, lr: 1.36e-02, grad_scale: 32.0
2024-09-23 01:49:01,008 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=7.41 vs. limit=15.0
2024-09-23 01:49:08,683 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=150350.66666666666, ans=0.125
2024-09-23 01:49:09,787 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.138e+02 1.329e+02 1.425e+02 1.705e+02 2.859e+02, threshold=2.851e+02, percent-clipped=1.0
2024-09-23 01:49:58,107 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=150490.66666666666, ans=0.1
2024-09-23 01:50:05,341 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=150537.33333333334, ans=0.125
2024-09-23 01:50:06,389 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=11.34 vs. limit=22.5
2024-09-23 01:50:07,005 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.min_positive, batch_count=150537.33333333334, ans=0.025
2024-09-23 01:50:22,530 INFO [train.py:1198] (3/4) Epoch 9, batch 1100, loss[loss=0.2356, ctc_loss=0.1626, cr_loss=0.3654, over 17250.00 frames. ], tot_loss[loss=0.2614, ctc_loss=0.1836, cr_loss=0.3889, over 3332214.96 frames. ], batch size: 42, lr: 1.36e-02, grad_scale: 32.0
2024-09-23 01:50:29,259 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=150584.0, ans=0.125
2024-09-23 01:50:38,589 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=150630.66666666666, ans=0.1
2024-09-23 01:50:48,425 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=150630.66666666666, ans=0.0
2024-09-23 01:51:00,422 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=4.13 vs. limit=10.0
2024-09-23 01:51:00,651 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.30 vs. limit=22.5
2024-09-23 01:51:21,403 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.const_attention_rate, batch_count=150724.0, ans=0.025
2024-09-23 01:51:45,051 INFO [train.py:1198] (3/4) Epoch 9, batch 1150, loss[loss=0.268, ctc_loss=0.1833, cr_loss=0.4238, over 17259.00 frames. ], tot_loss[loss=0.2615, ctc_loss=0.1836, cr_loss=0.3893, over 3340327.02 frames. ], batch size: 44, lr: 1.35e-02, grad_scale: 32.0
2024-09-23 01:51:53,347 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=150817.33333333334, ans=0.125
2024-09-23 01:51:57,743 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.135e+02 1.320e+02 1.454e+02 1.675e+02 2.569e+02, threshold=2.907e+02, percent-clipped=0.0
2024-09-23 01:52:03,061 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=150864.0, ans=0.125
2024-09-23 01:52:18,749 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=150910.66666666666, ans=0.05
2024-09-23 01:52:54,709 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=151004.0, ans=0.0
2024-09-23 01:53:07,107 INFO [train.py:1198] (3/4) Epoch 9, batch 1200, loss[loss=0.2732, ctc_loss=0.1914, cr_loss=0.4089, over 17093.00 frames. ], tot_loss[loss=0.2616, ctc_loss=0.1838, cr_loss=0.3891, over 3339097.82 frames. ], batch size: 49, lr: 1.35e-02, grad_scale: 32.0
2024-09-23 01:53:31,505 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=151097.33333333334, ans=0.1
2024-09-23 01:53:44,840 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.75 vs. limit=10.0
2024-09-23 01:53:47,840 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=151144.0, ans=0.125
2024-09-23 01:54:00,853 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=151190.66666666666, ans=0.2
2024-09-23 01:54:10,201 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=151237.33333333334, ans=0.0
2024-09-23 01:54:27,158 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=8.86 vs. limit=10.0
2024-09-23 01:54:27,530 INFO [train.py:1198] (3/4) Epoch 9, batch 1250, loss[loss=0.2657, ctc_loss=0.1895, cr_loss=0.3809, over 16538.00 frames. ], tot_loss[loss=0.262, ctc_loss=0.1839, cr_loss=0.3902, over 3350395.25 frames. ], batch size: 66, lr: 1.35e-02, grad_scale: 32.0
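Since the interesting curves (tot_loss, ctc_loss, cr_loss, learning rate) are buried inside these train.py:1198 lines, here is a small helper to extract them for plotting. It is not part of the recipe; the regex simply mirrors the line format seen throughout this log:

import re

PATTERN = re.compile(
    r"Epoch (\d+), batch (\d+),.*?tot_loss\[loss=([\d.]+), ctc_loss=([\d.]+), "
    r"cr_loss=([\d.]+),.*?lr: ([\d.e-]+)"
)

def parse_log(path):
    # Returns (epoch, batch, tot_loss, ctc_loss, cr_loss, lr) tuples.
    rows = []
    with open(path) as f:
        for line in f:
            m = PATTERN.search(line)
            if m:
                epoch, batch = int(m.group(1)), int(m.group(2))
                loss, ctc, cr = map(float, m.group(3, 4, 5))
                rows.append((epoch, batch, loss, ctc, cr, float(m.group(6))))
    return rows

Validation lines ("Epoch 9, validation: ...") intentionally do not match, since they carry no batch index.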
2024-09-23 01:54:29,390 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=151284.0, ans=0.07
2024-09-23 01:54:42,634 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.137e+02 1.349e+02 1.478e+02 1.607e+02 2.970e+02, threshold=2.955e+02, percent-clipped=1.0
2024-09-23 01:54:45,090 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.31 vs. limit=15.0
2024-09-23 01:54:49,111 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=151330.66666666666, ans=0.1
2024-09-23 01:54:54,270 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=7.60 vs. limit=12.0
2024-09-23 01:54:58,934 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=151330.66666666666, ans=0.0
2024-09-23 01:55:09,096 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.const_attention_rate, batch_count=151377.33333333334, ans=0.025
2024-09-23 01:55:41,080 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=4.15 vs. limit=10.0
2024-09-23 01:55:41,645 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.38 vs. limit=6.0
2024-09-23 01:55:51,563 INFO [train.py:1198] (3/4) Epoch 9, batch 1300, loss[loss=0.2669, ctc_loss=0.1871, cr_loss=0.3989, over 17031.00 frames. ], tot_loss[loss=0.2609, ctc_loss=0.1831, cr_loss=0.3889, over 3353075.37 frames. ], batch size: 51, lr: 1.35e-02, grad_scale: 32.0
2024-09-23 01:56:21,608 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=151564.0, ans=0.125
2024-09-23 01:56:44,894 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.whiten.whitening_limit, batch_count=151657.33333333334, ans=12.0
2024-09-23 01:56:50,530 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.const_attention_rate, batch_count=151657.33333333334, ans=0.025
2024-09-23 01:57:13,860 INFO [train.py:1198] (3/4) Epoch 9, batch 1350, loss[loss=0.2495, ctc_loss=0.175, cr_loss=0.3724, over 17185.00 frames. ], tot_loss[loss=0.2618, ctc_loss=0.1838, cr_loss=0.3896, over 3353220.19 frames. ], batch size: 41, lr: 1.35e-02, grad_scale: 32.0
2024-09-23 01:57:14,099 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=151750.66666666666, ans=0.125
2024-09-23 01:57:29,087 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.108e+02 1.345e+02 1.485e+02 1.651e+02 2.569e+02, threshold=2.970e+02, percent-clipped=0.0
2024-09-23 01:57:53,558 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer_ff3.min_abs, batch_count=151844.0, ans=0.2
2024-09-23 01:58:01,952 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=4.03 vs. limit=15.0
2024-09-23 01:58:03,040 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=151890.66666666666, ans=0.0
2024-09-23 01:58:35,902 INFO [train.py:1198] (3/4) Epoch 9, batch 1400, loss[loss=0.311, ctc_loss=0.2353, cr_loss=0.3787, over 12081.00 frames. ], tot_loss[loss=0.2625, ctc_loss=0.1845, cr_loss=0.3901, over 3347252.01 frames. ], batch size: 124, lr: 1.35e-02, grad_scale: 32.0
2024-09-23 01:58:47,420 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.92 vs. limit=10.0
2024-09-23 01:58:52,433 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.85 vs. limit=6.0
2024-09-23 01:59:14,440 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer_na.min_abs, batch_count=152077.33333333334, ans=0.02
2024-09-23 01:59:36,394 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=152124.0, ans=0.125
2024-09-23 01:59:46,455 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=4.66 vs. limit=12.0
2024-09-23 01:59:51,660 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=7.14 vs. limit=15.0
2024-09-23 01:59:54,077 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=152170.66666666666, ans=0.125
2024-09-23 02:00:01,049 INFO [train.py:1198] (3/4) Epoch 9, batch 1450, loss[loss=0.2148, ctc_loss=0.1493, cr_loss=0.3277, over 16981.00 frames. ], tot_loss[loss=0.2606, ctc_loss=0.1829, cr_loss=0.3883, over 3349769.49 frames. ], batch size: 42, lr: 1.35e-02, grad_scale: 32.0
2024-09-23 02:00:07,848 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=152217.33333333334, ans=0.125
2024-09-23 02:00:09,395 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=152217.33333333334, ans=0.2
2024-09-23 02:00:13,884 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.066e+02 1.414e+02 1.556e+02 1.752e+02 2.841e+02, threshold=3.113e+02, percent-clipped=0.0
2024-09-23 02:00:15,760 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=152264.0, ans=0.125
2024-09-23 02:00:17,389 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=152264.0, ans=0.2
2024-09-23 02:00:46,623 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=4.89 vs. limit=12.0
2024-09-23 02:00:55,572 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=17.70 vs. limit=22.5
2024-09-23 02:00:56,486 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=152357.33333333334, ans=0.1
2024-09-23 02:01:04,611 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=152357.33333333334, ans=0.0
2024-09-23 02:01:17,363 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=152404.0, ans=0.0
2024-09-23 02:01:20,869 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.40 vs. limit=15.0
2024-09-23 02:01:23,506 INFO [train.py:1198] (3/4) Epoch 9, batch 1500, loss[loss=0.2929, ctc_loss=0.2088, cr_loss=0.4205, over 17019.00 frames. ], tot_loss[loss=0.2582, ctc_loss=0.181, cr_loss=0.3863, over 3356844.54 frames. ], batch size: 53, lr: 1.35e-02, grad_scale: 32.0
2024-09-23 02:01:31,618 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=152450.66666666666, ans=0.125
2024-09-23 02:01:31,624 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=152450.66666666666, ans=0.2
2024-09-23 02:01:37,831 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=152497.33333333334, ans=0.5
2024-09-23 02:01:53,752 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=152544.0, ans=0.04949747468305833
2024-09-23 02:02:24,965 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=152590.66666666666, ans=0.125
2024-09-23 02:02:29,588 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=152637.33333333334, ans=0.125
2024-09-23 02:02:45,184 INFO [train.py:1198] (3/4) Epoch 9, batch 1550, loss[loss=0.2638, ctc_loss=0.1844, cr_loss=0.397, over 17113.00 frames. ], tot_loss[loss=0.2597, ctc_loss=0.1819, cr_loss=0.3887, over 3361451.01 frames. ], batch size: 40, lr: 1.35e-02, grad_scale: 32.0
2024-09-23 02:02:57,972 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.113e+02 1.340e+02 1.461e+02 1.652e+02 2.342e+02, threshold=2.922e+02, percent-clipped=0.0
2024-09-23 02:03:02,952 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=152730.66666666666, ans=0.0
2024-09-23 02:03:06,211 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=152730.66666666666, ans=0.125
2024-09-23 02:03:08,215 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=4.98 vs. limit=12.0
2024-09-23 02:03:09,360 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=152730.66666666666, ans=0.125
2024-09-23 02:04:04,958 INFO [train.py:1198] (3/4) Epoch 9, batch 1600, loss[loss=0.2553, ctc_loss=0.1754, cr_loss=0.3994, over 17039.00 frames. ], tot_loss[loss=0.2593, ctc_loss=0.1817, cr_loss=0.3879, over 3357686.12 frames. ], batch size: 51, lr: 1.35e-02, grad_scale: 32.0
2024-09-23 02:04:15,096 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=152917.33333333334, ans=0.09899494936611666
2024-09-23 02:04:30,633 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.19 vs. limit=15.0
2024-09-23 02:05:23,261 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=23.04 vs. limit=22.5
2024-09-23 02:05:30,268 INFO [train.py:1198] (3/4) Epoch 9, batch 1650, loss[loss=0.2546, ctc_loss=0.1782, cr_loss=0.3823, over 17004.00 frames. ], tot_loss[loss=0.2593, ctc_loss=0.1816, cr_loss=0.3884, over 3365216.65 frames. ], batch size: 56, lr: 1.34e-02, grad_scale: 32.0
2024-09-23 02:05:42,928 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.152e+02 1.315e+02 1.418e+02 1.618e+02 2.447e+02, threshold=2.836e+02, percent-clipped=0.0
2024-09-23 02:05:57,003 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=153197.33333333334, ans=0.0
2024-09-23 02:06:06,398 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=153244.0, ans=0.125
2024-09-23 02:06:12,844 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=153244.0, ans=0.0
2024-09-23 02:06:27,167 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=153290.66666666666, ans=0.125
2024-09-23 02:06:38,371 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=153337.33333333334, ans=0.1
2024-09-23 02:06:51,375 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=153384.0, ans=0.125
2024-09-23 02:06:52,590 INFO [train.py:1198] (3/4) Epoch 9, batch 1700, loss[loss=0.2902, ctc_loss=0.2098, cr_loss=0.4021, over 16080.00 frames. ], tot_loss[loss=0.2598, ctc_loss=0.1821, cr_loss=0.3889, over 3360681.61 frames. ], batch size: 74, lr: 1.34e-02, grad_scale: 32.0
2024-09-23 02:07:13,073 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=153430.66666666666, ans=0.1
2024-09-23 02:07:13,117 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=153430.66666666666, ans=0.125
2024-09-23 02:07:19,557 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2024-09-23 02:07:21,004 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.min_abs, batch_count=153430.66666666666, ans=0.5
2024-09-23 02:07:30,413 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=153477.33333333334, ans=0.125
2024-09-23 02:08:10,996 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=6.39 vs. limit=10.0
2024-09-23 02:08:14,979 INFO [train.py:1198] (3/4) Epoch 9, batch 1750, loss[loss=0.2542, ctc_loss=0.1764, cr_loss=0.3888, over 17242.00 frames. ], tot_loss[loss=0.2605, ctc_loss=0.1825, cr_loss=0.3897, over 3357195.85 frames. ], batch size: 44, lr: 1.34e-02, grad_scale: 32.0
2024-09-23 02:08:15,323 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=153617.33333333334, ans=0.125
2024-09-23 02:08:15,395 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.const_attention_rate, batch_count=153617.33333333334, ans=0.025
2024-09-23 02:08:27,631 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.096e+02 1.306e+02 1.435e+02 1.601e+02 2.532e+02, threshold=2.871e+02, percent-clipped=0.0
2024-09-23 02:08:47,608 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=4.56 vs. limit=12.0
2024-09-23 02:09:31,228 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=7.35 vs. limit=12.0
2024-09-23 02:09:35,852 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=153850.66666666666, ans=0.125
2024-09-23 02:09:37,155 INFO [train.py:1198] (3/4) Epoch 9, batch 1800, loss[loss=0.2704, ctc_loss=0.1903, cr_loss=0.4005, over 17345.00 frames. ], tot_loss[loss=0.2593, ctc_loss=0.1817, cr_loss=0.388, over 3358663.30 frames. ], batch size: 48, lr: 1.34e-02, grad_scale: 32.0
2024-09-23 02:09:43,890 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=153850.66666666666, ans=0.125
2024-09-23 02:10:15,266 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=153944.0, ans=0.0
2024-09-23 02:10:31,399 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=153990.66666666666, ans=0.1
2024-09-23 02:10:57,870 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=154037.33333333334, ans=0.1
2024-09-23 02:11:02,434 INFO [train.py:1198] (3/4) Epoch 9, batch 1850, loss[loss=0.2282, ctc_loss=0.1554, cr_loss=0.3644, over 17118.00 frames. ], tot_loss[loss=0.2585, ctc_loss=0.181, cr_loss=0.3875, over 3363644.07 frames. ], batch size: 40, lr: 1.34e-02, grad_scale: 32.0
2024-09-23 02:11:11,397 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.22 vs. limit=15.0
2024-09-23 02:11:15,289 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.166e+02 1.311e+02 1.455e+02 1.599e+02 2.363e+02, threshold=2.909e+02, percent-clipped=0.0
2024-09-23 02:11:15,530 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=154084.0, ans=0.125
2024-09-23 02:11:20,418 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=154130.66666666666, ans=0.125
2024-09-23 02:11:32,077 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.72 vs. limit=10.0
limit=10.0 2024-09-23 02:11:37,792 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=154177.33333333334, ans=0.125 2024-09-23 02:12:16,681 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=154270.66666666666, ans=0.1 2024-09-23 02:12:24,534 INFO [train.py:1198] (3/4) Epoch 9, batch 1900, loss[loss=0.2573, ctc_loss=0.1824, cr_loss=0.3745, over 17269.00 frames. ], tot_loss[loss=0.2586, ctc_loss=0.1812, cr_loss=0.3873, over 3348504.58 frames. ], batch size: 44, lr: 1.34e-02, grad_scale: 32.0 2024-09-23 02:12:25,358 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.36 vs. limit=10.0 2024-09-23 02:12:26,560 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=154317.33333333334, ans=0.1 2024-09-23 02:12:35,957 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=154317.33333333334, ans=0.05 2024-09-23 02:12:47,278 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.38 vs. limit=22.5 2024-09-23 02:12:54,756 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=154410.66666666666, ans=0.125 2024-09-23 02:13:29,822 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.min_abs, batch_count=154504.0, ans=0.5 2024-09-23 02:13:43,683 INFO [train.py:1198] (3/4) Epoch 9, batch 1950, loss[loss=0.2325, ctc_loss=0.162, cr_loss=0.3528, over 17257.00 frames. ], tot_loss[loss=0.2581, ctc_loss=0.1807, cr_loss=0.387, over 3352495.32 frames. ], batch size: 44, lr: 1.34e-02, grad_scale: 32.0 2024-09-23 02:13:51,922 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=154550.66666666666, ans=0.0 2024-09-23 02:13:56,416 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.124e+02 1.387e+02 1.540e+02 1.769e+02 3.177e+02, threshold=3.081e+02, percent-clipped=1.0 2024-09-23 02:14:17,587 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=154644.0, ans=0.2 2024-09-23 02:14:23,338 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=154644.0, ans=0.1 2024-09-23 02:14:29,883 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=154644.0, ans=0.125 2024-09-23 02:14:36,631 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.05 vs. 
limit=15.0 2024-09-23 02:14:47,568 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=154690.66666666666, ans=0.0 2024-09-23 02:14:52,332 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=154737.33333333334, ans=0.125 2024-09-23 02:15:08,445 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=9.14 vs. limit=15.0 2024-09-23 02:15:09,134 INFO [train.py:1198] (3/4) Epoch 9, batch 2000, loss[loss=0.3358, ctc_loss=0.2478, cr_loss=0.4404, over 14756.00 frames. ], tot_loss[loss=0.2583, ctc_loss=0.1809, cr_loss=0.3874, over 3358430.71 frames. ], batch size: 89, lr: 1.34e-02, grad_scale: 32.0 2024-09-23 02:15:09,368 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=154784.0, ans=0.2 2024-09-23 02:15:09,373 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=154784.0, ans=0.125 2024-09-23 02:15:34,780 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=154830.66666666666, ans=0.125 2024-09-23 02:15:34,903 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=154830.66666666666, ans=0.0 2024-09-23 02:15:42,713 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=154877.33333333334, ans=0.5 2024-09-23 02:15:47,038 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=154877.33333333334, ans=0.125 2024-09-23 02:15:50,272 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=154877.33333333334, ans=0.0 2024-09-23 02:15:59,830 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=154924.0, ans=0.07 2024-09-23 02:16:26,722 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=154970.66666666666, ans=0.125 2024-09-23 02:16:31,231 INFO [train.py:1198] (3/4) Epoch 9, batch 2050, loss[loss=0.2644, ctc_loss=0.1823, cr_loss=0.4104, over 17358.00 frames. ], tot_loss[loss=0.2589, ctc_loss=0.1814, cr_loss=0.3875, over 3352295.40 frames. ], batch size: 48, lr: 1.34e-02, grad_scale: 32.0 2024-09-23 02:16:31,510 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=155017.33333333334, ans=0.2 2024-09-23 02:16:42,994 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=10.33 vs. limit=15.0 2024-09-23 02:16:43,967 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.064e+02 1.371e+02 1.510e+02 1.685e+02 3.292e+02, threshold=3.020e+02, percent-clipped=1.0 2024-09-23 02:17:01,921 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.51 vs. 
limit=15.0 2024-09-23 02:17:14,132 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=155110.66666666666, ans=0.125 2024-09-23 02:17:18,877 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=155110.66666666666, ans=0.0 2024-09-23 02:17:49,957 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.25 vs. limit=6.0 2024-09-23 02:17:53,735 INFO [train.py:1198] (3/4) Epoch 9, batch 2100, loss[loss=0.2149, ctc_loss=0.1497, cr_loss=0.3259, over 17060.00 frames. ], tot_loss[loss=0.2582, ctc_loss=0.1808, cr_loss=0.3868, over 3356088.50 frames. ], batch size: 39, lr: 1.34e-02, grad_scale: 32.0 2024-09-23 02:18:00,397 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=155250.66666666666, ans=0.0 2024-09-23 02:18:13,695 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.19 vs. limit=15.0 2024-09-23 02:18:19,616 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=155297.33333333334, ans=0.125 2024-09-23 02:19:00,699 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=155437.33333333334, ans=0.125 2024-09-23 02:19:10,025 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=155437.33333333334, ans=0.0 2024-09-23 02:19:11,724 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=155484.0, ans=0.125 2024-09-23 02:19:12,974 INFO [train.py:1198] (3/4) Epoch 9, batch 2150, loss[loss=0.2604, ctc_loss=0.181, cr_loss=0.3973, over 17242.00 frames. ], tot_loss[loss=0.2586, ctc_loss=0.1811, cr_loss=0.3873, over 3363690.72 frames. ], batch size: 55, lr: 1.34e-02, grad_scale: 32.0 2024-09-23 02:19:28,282 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.177e+02 1.379e+02 1.514e+02 1.800e+02 2.768e+02, threshold=3.028e+02, percent-clipped=0.0 2024-09-23 02:19:33,331 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=155530.66666666666, ans=0.125 2024-09-23 02:20:11,220 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=155624.0, ans=0.1 2024-09-23 02:20:38,261 INFO [train.py:1198] (3/4) Epoch 9, batch 2200, loss[loss=0.2444, ctc_loss=0.1716, cr_loss=0.3641, over 17271.00 frames. ], tot_loss[loss=0.257, ctc_loss=0.1799, cr_loss=0.3855, over 3367230.34 frames. ], batch size: 44, lr: 1.33e-02, grad_scale: 32.0 2024-09-23 02:20:57,281 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=155764.0, ans=0.5 2024-09-23 02:21:19,763 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=155810.66666666666, ans=0.0 2024-09-23 02:22:03,747 INFO [train.py:1198] (3/4) Epoch 9, batch 2250, loss[loss=0.264, ctc_loss=0.183, cr_loss=0.4049, over 17213.00 frames. 
], tot_loss[loss=0.2573, ctc_loss=0.1803, cr_loss=0.3852, over 3358518.40 frames. ], batch size: 47, lr: 1.33e-02, grad_scale: 32.0 2024-09-23 02:22:09,690 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=4.11 vs. limit=12.0 2024-09-23 02:22:16,469 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.118e+02 1.340e+02 1.524e+02 1.728e+02 3.121e+02, threshold=3.047e+02, percent-clipped=1.0 2024-09-23 02:22:30,062 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=6.07 vs. limit=15.0 2024-09-23 02:22:34,282 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=156044.0, ans=0.0 2024-09-23 02:22:39,008 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=156044.0, ans=0.0 2024-09-23 02:23:04,541 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=156090.66666666666, ans=0.0 2024-09-23 02:23:23,328 INFO [train.py:1198] (3/4) Epoch 9, batch 2300, loss[loss=0.2584, ctc_loss=0.1844, cr_loss=0.3701, over 17009.00 frames. ], tot_loss[loss=0.2565, ctc_loss=0.1796, cr_loss=0.3841, over 3358408.08 frames. ], batch size: 51, lr: 1.33e-02, grad_scale: 32.0 2024-09-23 02:23:53,484 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=14.14 vs. limit=22.5 2024-09-23 02:23:59,550 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.01 vs. limit=15.0 2024-09-23 02:24:08,521 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=156277.33333333334, ans=0.1 2024-09-23 02:24:16,000 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=156324.0, ans=0.2 2024-09-23 02:24:27,322 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=8.31 vs. limit=15.0 2024-09-23 02:24:33,504 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=156370.66666666666, ans=0.1 2024-09-23 02:24:48,597 INFO [train.py:1198] (3/4) Epoch 9, batch 2350, loss[loss=0.2857, ctc_loss=0.2028, cr_loss=0.4145, over 16733.00 frames. ], tot_loss[loss=0.2559, ctc_loss=0.1791, cr_loss=0.3839, over 3359149.78 frames. 
], batch size: 61, lr: 1.33e-02, grad_scale: 32.0 2024-09-23 02:25:01,204 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.187e+02 1.341e+02 1.440e+02 1.582e+02 2.935e+02, threshold=2.879e+02, percent-clipped=0.0 2024-09-23 02:25:17,411 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=156464.0, ans=0.125 2024-09-23 02:25:20,402 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=156510.66666666666, ans=0.0 2024-09-23 02:25:56,098 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=156604.0, ans=0.2 2024-09-23 02:26:10,109 INFO [train.py:1198] (3/4) Epoch 9, batch 2400, loss[loss=0.2822, ctc_loss=0.2004, cr_loss=0.4088, over 16519.00 frames. ], tot_loss[loss=0.2563, ctc_loss=0.1795, cr_loss=0.3842, over 3360034.23 frames. ], batch size: 66, lr: 1.33e-02, grad_scale: 32.0 2024-09-23 02:26:13,692 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=156650.66666666666, ans=0.125 2024-09-23 02:26:39,141 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=156697.33333333334, ans=0.1 2024-09-23 02:27:04,026 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=156790.66666666666, ans=0.125 2024-09-23 02:27:10,601 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-23 02:27:32,765 INFO [train.py:1198] (3/4) Epoch 9, batch 2450, loss[loss=0.2402, ctc_loss=0.1657, cr_loss=0.3729, over 17021.00 frames. ], tot_loss[loss=0.2561, ctc_loss=0.1794, cr_loss=0.3834, over 3356343.88 frames. ], batch size: 44, lr: 1.33e-02, grad_scale: 32.0 2024-09-23 02:27:45,477 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.155e+02 1.502e+02 1.626e+02 1.858e+02 2.761e+02, threshold=3.252e+02, percent-clipped=0.0 2024-09-23 02:28:08,238 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.const_attention_rate, batch_count=156977.33333333334, ans=0.025 2024-09-23 02:28:13,124 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=156977.33333333334, ans=0.125 2024-09-23 02:28:16,172 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=156977.33333333334, ans=0.0 2024-09-23 02:28:17,773 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=156977.33333333334, ans=0.0 2024-09-23 02:28:34,167 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=14.16 vs. limit=15.0 2024-09-23 02:28:52,844 INFO [train.py:1198] (3/4) Epoch 9, batch 2500, loss[loss=0.2471, ctc_loss=0.1719, cr_loss=0.3757, over 17119.00 frames. ], tot_loss[loss=0.2569, ctc_loss=0.18, cr_loss=0.3846, over 3357221.48 frames. ], batch size: 49, lr: 1.33e-02, grad_scale: 32.0 2024-09-23 02:29:05,043 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.36 vs. 
limit=22.5 2024-09-23 02:29:05,930 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=157117.33333333334, ans=0.2 2024-09-23 02:29:07,574 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=157164.0, ans=0.0 2024-09-23 02:29:23,305 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=5.84 vs. limit=15.0 2024-09-23 02:29:55,839 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=157257.33333333334, ans=0.0 2024-09-23 02:29:55,875 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=157257.33333333334, ans=0.2 2024-09-23 02:29:56,210 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=10.92 vs. limit=22.5 2024-09-23 02:29:57,423 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-23 02:30:18,045 INFO [train.py:1198] (3/4) Epoch 9, batch 2550, loss[loss=0.2742, ctc_loss=0.1906, cr_loss=0.4181, over 17209.00 frames. ], tot_loss[loss=0.2568, ctc_loss=0.1798, cr_loss=0.3851, over 3358604.64 frames. ], batch size: 55, lr: 1.33e-02, grad_scale: 32.0 2024-09-23 02:30:24,718 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=157350.66666666666, ans=0.125 2024-09-23 02:30:30,730 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.078e+02 1.377e+02 1.519e+02 1.754e+02 2.605e+02, threshold=3.038e+02, percent-clipped=0.0 2024-09-23 02:30:48,670 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=5.51 vs. limit=15.0 2024-09-23 02:31:05,613 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=157444.0, ans=0.1 2024-09-23 02:31:20,602 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=4.77 vs. limit=12.0 2024-09-23 02:31:39,578 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.const_attention_rate, batch_count=157584.0, ans=0.025 2024-09-23 02:31:40,956 INFO [train.py:1198] (3/4) Epoch 9, batch 2600, loss[loss=0.2691, ctc_loss=0.1855, cr_loss=0.4183, over 16921.00 frames. ], tot_loss[loss=0.2564, ctc_loss=0.1795, cr_loss=0.3844, over 3356090.89 frames. ], batch size: 58, lr: 1.33e-02, grad_scale: 32.0 2024-09-23 02:31:41,336 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=157584.0, ans=0.125 2024-09-23 02:31:44,684 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=157584.0, ans=0.125 2024-09-23 02:31:48,853 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=157584.0, ans=0.1 2024-09-23 02:32:10,234 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.60 vs. 
limit=6.0 2024-09-23 02:32:23,873 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=157677.33333333334, ans=0.125 2024-09-23 02:32:31,789 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=157724.0, ans=0.0 2024-09-23 02:32:31,872 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=157724.0, ans=0.125 2024-09-23 02:33:04,042 INFO [train.py:1198] (3/4) Epoch 9, batch 2650, loss[loss=0.2694, ctc_loss=0.1847, cr_loss=0.4234, over 17099.00 frames. ], tot_loss[loss=0.2565, ctc_loss=0.1796, cr_loss=0.3842, over 3346769.36 frames. ], batch size: 49, lr: 1.33e-02, grad_scale: 64.0 2024-09-23 02:33:12,387 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=157817.33333333334, ans=0.0 2024-09-23 02:33:16,814 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.086e+02 1.311e+02 1.423e+02 1.645e+02 2.492e+02, threshold=2.847e+02, percent-clipped=0.0 2024-09-23 02:33:38,132 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.05 vs. limit=15.0 2024-09-23 02:33:42,466 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=157910.66666666666, ans=0.0 2024-09-23 02:33:44,152 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=157910.66666666666, ans=0.1 2024-09-23 02:34:15,213 INFO [scaling.py:1024] (3/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.33 vs. limit=5.0 2024-09-23 02:34:22,261 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.const_attention_rate, batch_count=158004.0, ans=0.025 2024-09-23 02:34:26,732 INFO [train.py:1198] (3/4) Epoch 9, batch 2700, loss[loss=0.2338, ctc_loss=0.1599, cr_loss=0.3699, over 17031.00 frames. ], tot_loss[loss=0.2558, ctc_loss=0.1789, cr_loss=0.3844, over 3356564.81 frames. ], batch size: 52, lr: 1.32e-02, grad_scale: 64.0 2024-09-23 02:34:50,497 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.78 vs. limit=10.0 2024-09-23 02:34:57,781 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=6.81 vs. limit=15.0 2024-09-23 02:35:07,789 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=158144.0, ans=0.125 2024-09-23 02:35:14,733 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=5.87 vs. limit=15.0 2024-09-23 02:35:24,316 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=3.57 vs. limit=15.0 2024-09-23 02:35:51,952 INFO [train.py:1198] (3/4) Epoch 9, batch 2750, loss[loss=0.2291, ctc_loss=0.1588, cr_loss=0.3518, over 17175.00 frames. ], tot_loss[loss=0.2554, ctc_loss=0.1785, cr_loss=0.3846, over 3360459.42 frames. 
], batch size: 41, lr: 1.32e-02, grad_scale: 32.0 2024-09-23 02:36:06,256 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.124e+02 1.343e+02 1.485e+02 1.822e+02 3.173e+02, threshold=2.970e+02, percent-clipped=1.0 2024-09-23 02:36:38,426 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=158424.0, ans=0.0 2024-09-23 02:36:49,122 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.max_abs, batch_count=158424.0, ans=10.0 2024-09-23 02:37:00,537 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=158470.66666666666, ans=0.1 2024-09-23 02:37:08,712 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=5.84 vs. limit=15.0 2024-09-23 02:37:14,490 INFO [train.py:1198] (3/4) Epoch 9, batch 2800, loss[loss=0.2457, ctc_loss=0.1715, cr_loss=0.3709, over 17115.00 frames. ], tot_loss[loss=0.2564, ctc_loss=0.1793, cr_loss=0.3857, over 3365154.21 frames. ], batch size: 49, lr: 1.32e-02, grad_scale: 32.0 2024-09-23 02:37:14,895 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=158517.33333333334, ans=0.1 2024-09-23 02:37:21,878 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=9.80 vs. limit=15.0 2024-09-23 02:37:42,214 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.const_attention_rate, batch_count=158564.0, ans=0.025 2024-09-23 02:37:47,694 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.01 vs. limit=6.0 2024-09-23 02:37:48,619 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=158610.66666666666, ans=0.0 2024-09-23 02:37:58,169 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=158610.66666666666, ans=0.125 2024-09-23 02:38:17,363 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=158704.0, ans=0.1 2024-09-23 02:38:26,920 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=158704.0, ans=0.125 2024-09-23 02:38:34,674 INFO [train.py:1198] (3/4) Epoch 9, batch 2850, loss[loss=0.3118, ctc_loss=0.2319, cr_loss=0.3998, over 11787.00 frames. ], tot_loss[loss=0.2567, ctc_loss=0.1796, cr_loss=0.3858, over 3357957.80 frames. 
], batch size: 123, lr: 1.32e-02, grad_scale: 16.0 2024-09-23 02:38:36,574 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=158750.66666666666, ans=0.125 2024-09-23 02:38:50,464 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.152e+02 1.314e+02 1.411e+02 1.546e+02 2.607e+02, threshold=2.821e+02, percent-clipped=0.0 2024-09-23 02:39:09,465 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=158844.0, ans=0.0 2024-09-23 02:39:23,724 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=158890.66666666666, ans=0.125 2024-09-23 02:39:48,573 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=158937.33333333334, ans=0.2 2024-09-23 02:39:59,461 INFO [train.py:1198] (3/4) Epoch 9, batch 2900, loss[loss=0.2908, ctc_loss=0.2053, cr_loss=0.4275, over 17033.00 frames. ], tot_loss[loss=0.2578, ctc_loss=0.1804, cr_loss=0.387, over 3357684.40 frames. ], batch size: 56, lr: 1.32e-02, grad_scale: 16.0 2024-09-23 02:39:59,870 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-09-23 02:40:07,602 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.const_attention_rate, batch_count=158984.0, ans=0.025 2024-09-23 02:40:07,717 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=158984.0, ans=0.1 2024-09-23 02:40:22,721 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.83 vs. limit=15.0 2024-09-23 02:40:30,571 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=159030.66666666666, ans=0.015 2024-09-23 02:40:46,045 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=4.02 vs. limit=12.0 2024-09-23 02:40:55,119 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=159124.0, ans=0.125 2024-09-23 02:41:21,663 INFO [train.py:1198] (3/4) Epoch 9, batch 2950, loss[loss=0.2521, ctc_loss=0.1748, cr_loss=0.3866, over 17310.00 frames. ], tot_loss[loss=0.2577, ctc_loss=0.1803, cr_loss=0.3871, over 3355694.58 frames. ], batch size: 46, lr: 1.32e-02, grad_scale: 16.0 2024-09-23 02:41:40,281 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.146e+02 1.351e+02 1.516e+02 1.686e+02 2.482e+02, threshold=3.032e+02, percent-clipped=0.0 2024-09-23 02:42:03,153 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.51 vs. 
limit=15.0 2024-09-23 02:42:09,261 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=159310.66666666666, ans=0.0 2024-09-23 02:42:21,743 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=159357.33333333334, ans=0.125 2024-09-23 02:42:35,878 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=159404.0, ans=0.0 2024-09-23 02:42:39,096 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=159404.0, ans=0.2 2024-09-23 02:42:39,544 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=6.66 vs. limit=15.0 2024-09-23 02:42:43,438 INFO [train.py:1198] (3/4) Epoch 9, batch 3000, loss[loss=0.2327, ctc_loss=0.1561, cr_loss=0.3829, over 17285.00 frames. ], tot_loss[loss=0.2578, ctc_loss=0.1804, cr_loss=0.3869, over 3355467.86 frames. ], batch size: 46, lr: 1.32e-02, grad_scale: 16.0 2024-09-23 02:42:43,439 INFO [train.py:1221] (3/4) Computing validation loss 2024-09-23 02:42:59,049 INFO [train.py:1230] (3/4) Epoch 9, validation: loss=0.05024, ctc_loss=0.05024, cr_loss=7.059e-15, over 944034.00 frames. 2024-09-23 02:42:59,051 INFO [train.py:1231] (3/4) Maximum memory allocated so far is 21238MB 2024-09-23 02:43:59,838 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.62 vs. limit=10.0 2024-09-23 02:44:00,777 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=159637.33333333334, ans=0.125 2024-09-23 02:44:11,189 INFO [scaling.py:1024] (3/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=5.05 vs. limit=5.0 2024-09-23 02:44:17,777 INFO [train.py:1198] (3/4) Epoch 9, batch 3050, loss[loss=0.2688, ctc_loss=0.1873, cr_loss=0.4077, over 17061.00 frames. ], tot_loss[loss=0.2579, ctc_loss=0.1805, cr_loss=0.3868, over 3358100.09 frames. ], batch size: 46, lr: 1.32e-02, grad_scale: 16.0 2024-09-23 02:44:19,645 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=159684.0, ans=0.0 2024-09-23 02:44:21,268 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=159684.0, ans=0.125 2024-09-23 02:44:33,651 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.146e+02 1.314e+02 1.416e+02 1.662e+02 2.316e+02, threshold=2.832e+02, percent-clipped=0.0 2024-09-23 02:45:13,173 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=13.37 vs. limit=22.5 2024-09-23 02:45:23,686 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=159870.66666666666, ans=0.0 2024-09-23 02:45:35,933 INFO [train.py:1198] (3/4) Epoch 9, batch 3100, loss[loss=0.258, ctc_loss=0.1815, cr_loss=0.3821, over 17026.00 frames. ], tot_loss[loss=0.2577, ctc_loss=0.1805, cr_loss=0.3862, over 3355586.11 frames. 
], batch size: 51, lr: 1.32e-02, grad_scale: 16.0 2024-09-23 02:45:42,617 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=159917.33333333334, ans=0.0 2024-09-23 02:45:46,344 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=159917.33333333334, ans=0.125 2024-09-23 02:45:50,991 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=159917.33333333334, ans=0.0 2024-09-23 02:45:55,913 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=159964.0, ans=0.07 2024-09-23 02:45:56,212 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=13.52 vs. limit=22.5 2024-09-23 02:46:08,318 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=160010.66666666666, ans=0.125 2024-09-23 02:46:21,101 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-23 02:46:51,383 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=12.55 vs. limit=22.5 2024-09-23 02:46:59,412 INFO [train.py:1198] (3/4) Epoch 9, batch 3150, loss[loss=0.3453, ctc_loss=0.2604, cr_loss=0.4246, over 11612.00 frames. ], tot_loss[loss=0.2579, ctc_loss=0.1806, cr_loss=0.3862, over 3351541.43 frames. ], batch size: 123, lr: 1.32e-02, grad_scale: 16.0 2024-09-23 02:47:15,045 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.206e+02 1.398e+02 1.504e+02 1.697e+02 2.388e+02, threshold=3.008e+02, percent-clipped=0.0 2024-09-23 02:47:19,858 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=160197.33333333334, ans=0.125 2024-09-23 02:47:40,508 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.74 vs. limit=15.0 2024-09-23 02:47:43,211 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.const_attention_rate, batch_count=160244.0, ans=0.025 2024-09-23 02:47:46,591 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=160290.66666666666, ans=0.0 2024-09-23 02:47:49,492 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=160290.66666666666, ans=0.125 2024-09-23 02:47:50,163 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.84 vs. limit=6.0 2024-09-23 02:48:17,651 INFO [train.py:1198] (3/4) Epoch 9, batch 3200, loss[loss=0.2346, ctc_loss=0.1615, cr_loss=0.3654, over 16990.00 frames. ], tot_loss[loss=0.2576, ctc_loss=0.1804, cr_loss=0.3863, over 3355293.34 frames. 
], batch size: 53, lr: 1.32e-02, grad_scale: 32.0 2024-09-23 02:48:17,874 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=160384.0, ans=0.125 2024-09-23 02:48:25,648 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=160384.0, ans=0.0 2024-09-23 02:48:33,998 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.96 vs. limit=15.0 2024-09-23 02:48:48,958 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=5.96 vs. limit=15.0 2024-09-23 02:48:53,120 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=160477.33333333334, ans=0.0 2024-09-23 02:49:01,213 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=160477.33333333334, ans=0.125 2024-09-23 02:49:13,717 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=160524.0, ans=0.2 2024-09-23 02:49:26,436 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=160570.66666666666, ans=0.125 2024-09-23 02:49:38,715 INFO [train.py:1198] (3/4) Epoch 9, batch 3250, loss[loss=0.2682, ctc_loss=0.1882, cr_loss=0.4, over 17226.00 frames. ], tot_loss[loss=0.2569, ctc_loss=0.1799, cr_loss=0.3855, over 3359435.71 frames. ], batch size: 55, lr: 1.31e-02, grad_scale: 32.0 2024-09-23 02:49:43,834 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=160617.33333333334, ans=0.2 2024-09-23 02:49:51,894 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=13.30 vs. limit=22.5 2024-09-23 02:49:54,282 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.170e+02 1.347e+02 1.464e+02 1.659e+02 3.194e+02, threshold=2.929e+02, percent-clipped=1.0 2024-09-23 02:50:02,505 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=160664.0, ans=0.1 2024-09-23 02:50:12,533 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.23 vs. 
limit=15.0 2024-09-23 02:50:19,729 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=160710.66666666666, ans=0.0 2024-09-23 02:50:24,242 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=160757.33333333334, ans=0.0 2024-09-23 02:50:33,570 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.const_attention_rate, batch_count=160757.33333333334, ans=0.025 2024-09-23 02:50:42,857 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=160804.0, ans=0.09899494936611666 2024-09-23 02:50:47,559 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=160804.0, ans=0.07 2024-09-23 02:50:56,581 INFO [train.py:1198] (3/4) Epoch 9, batch 3300, loss[loss=0.2764, ctc_loss=0.1925, cr_loss=0.4195, over 17165.00 frames. ], tot_loss[loss=0.2565, ctc_loss=0.1795, cr_loss=0.3851, over 3355132.34 frames. ], batch size: 45, lr: 1.31e-02, grad_scale: 32.0 2024-09-23 02:51:16,703 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=5.18 vs. limit=10.0 2024-09-23 02:51:42,209 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=160944.0, ans=0.0 2024-09-23 02:51:43,814 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=160990.66666666666, ans=0.0 2024-09-23 02:51:46,839 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=160990.66666666666, ans=0.125 2024-09-23 02:51:56,042 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=160990.66666666666, ans=0.125 2024-09-23 02:52:03,828 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=161037.33333333334, ans=0.125 2024-09-23 02:52:15,711 INFO [train.py:1198] (3/4) Epoch 9, batch 3350, loss[loss=0.2847, ctc_loss=0.1991, cr_loss=0.4279, over 17032.00 frames. ], tot_loss[loss=0.2581, ctc_loss=0.1807, cr_loss=0.3866, over 3336720.81 frames. ], batch size: 52, lr: 1.31e-02, grad_scale: 32.0 2024-09-23 02:52:31,268 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.145e+02 1.345e+02 1.546e+02 1.825e+02 3.271e+02, threshold=3.093e+02, percent-clipped=1.0 2024-09-23 02:52:31,665 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=161130.66666666666, ans=0.125 2024-09-23 02:52:43,979 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=161130.66666666666, ans=0.1 2024-09-23 02:53:02,862 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=161224.0, ans=0.0 2024-09-23 02:53:12,625 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=12.83 vs. limit=22.5 2024-09-23 02:53:20,442 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.08 vs. 
limit=6.0 2024-09-23 02:53:27,934 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=161270.66666666666, ans=0.125 2024-09-23 02:53:32,594 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=161317.33333333334, ans=0.125 2024-09-23 02:53:33,971 INFO [train.py:1198] (3/4) Epoch 9, batch 3400, loss[loss=0.1992, ctc_loss=0.1355, cr_loss=0.3183, over 16717.00 frames. ], tot_loss[loss=0.2577, ctc_loss=0.1804, cr_loss=0.3868, over 3345942.90 frames. ], batch size: 37, lr: 1.31e-02, grad_scale: 32.0 2024-09-23 02:53:52,806 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=161364.0, ans=0.0 2024-09-23 02:53:57,491 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.const_attention_rate, batch_count=161364.0, ans=0.025 2024-09-23 02:54:16,083 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=161410.66666666666, ans=0.125 2024-09-23 02:54:17,827 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=12.88 vs. limit=22.5 2024-09-23 02:54:19,101 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=161457.33333333334, ans=10.0 2024-09-23 02:54:33,505 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=161457.33333333334, ans=0.125 2024-09-23 02:54:45,873 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=161504.0, ans=0.125 2024-09-23 02:54:51,711 INFO [train.py:1198] (3/4) Epoch 9, batch 3450, loss[loss=0.2381, ctc_loss=0.1661, cr_loss=0.3597, over 17119.00 frames. ], tot_loss[loss=0.2579, ctc_loss=0.1804, cr_loss=0.3874, over 3346709.15 frames. ], batch size: 40, lr: 1.31e-02, grad_scale: 16.0 2024-09-23 02:55:08,998 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.145e+02 1.381e+02 1.554e+02 1.871e+02 2.951e+02, threshold=3.107e+02, percent-clipped=0.0 2024-09-23 02:55:30,799 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.const_attention_rate, batch_count=161644.0, ans=0.025 2024-09-23 02:55:38,923 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=161690.66666666666, ans=0.125 2024-09-23 02:56:10,973 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=161784.0, ans=0.125 2024-09-23 02:56:11,095 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=161784.0, ans=0.125 2024-09-23 02:56:12,452 INFO [train.py:1198] (3/4) Epoch 9, batch 3500, loss[loss=0.2504, ctc_loss=0.1735, cr_loss=0.3846, over 16915.00 frames. ], tot_loss[loss=0.2565, ctc_loss=0.1793, cr_loss=0.3857, over 3347326.94 frames. 
], batch size: 58, lr: 1.31e-02, grad_scale: 16.0 2024-09-23 02:56:37,902 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=161830.66666666666, ans=0.125 2024-09-23 02:56:40,994 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=161830.66666666666, ans=0.125 2024-09-23 02:56:51,559 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=161877.33333333334, ans=0.0 2024-09-23 02:56:58,464 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=161877.33333333334, ans=0.07 2024-09-23 02:57:03,588 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=1.99 vs. limit=15.0 2024-09-23 02:57:08,544 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.37 vs. limit=15.0 2024-09-23 02:57:15,487 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=161970.66666666666, ans=0.04949747468305833 2024-09-23 02:57:32,421 INFO [train.py:1198] (3/4) Epoch 9, batch 3550, loss[loss=0.2498, ctc_loss=0.1747, cr_loss=0.3755, over 17096.00 frames. ], tot_loss[loss=0.257, ctc_loss=0.1797, cr_loss=0.3861, over 3346442.67 frames. ], batch size: 49, lr: 1.31e-02, grad_scale: 16.0 2024-09-23 02:57:46,848 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=162064.0, ans=0.1 2024-09-23 02:57:49,779 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.148e+02 1.397e+02 1.544e+02 1.862e+02 4.630e+02, threshold=3.088e+02, percent-clipped=2.0 2024-09-23 02:58:10,928 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=22.10 vs. limit=22.5 2024-09-23 02:58:12,398 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.27 vs. limit=22.5 2024-09-23 02:58:13,406 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=162110.66666666666, ans=0.2 2024-09-23 02:58:13,439 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=162110.66666666666, ans=0.125 2024-09-23 02:58:14,980 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=162110.66666666666, ans=0.125 2024-09-23 02:58:30,667 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=162157.33333333334, ans=10.0 2024-09-23 02:58:42,269 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.32 vs. limit=15.0 2024-09-23 02:58:52,214 INFO [train.py:1198] (3/4) Epoch 9, batch 3600, loss[loss=0.2592, ctc_loss=0.1794, cr_loss=0.3988, over 17168.00 frames. ], tot_loss[loss=0.2572, ctc_loss=0.1798, cr_loss=0.387, over 3354653.75 frames. 
], batch size: 45, lr: 1.31e-02, grad_scale: 32.0 2024-09-23 02:58:58,849 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=162250.66666666666, ans=0.125 2024-09-23 02:58:58,884 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=162250.66666666666, ans=0.125 2024-09-23 02:59:06,979 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=7.67 vs. limit=15.0 2024-09-23 02:59:33,105 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=162344.0, ans=0.09899494936611666 2024-09-23 02:59:39,212 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=162390.66666666666, ans=0.125 2024-09-23 02:59:45,539 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=162390.66666666666, ans=0.125 2024-09-23 03:00:10,092 INFO [train.py:1198] (3/4) Epoch 9, batch 3650, loss[loss=0.2621, ctc_loss=0.1815, cr_loss=0.4032, over 17088.00 frames. ], tot_loss[loss=0.2574, ctc_loss=0.18, cr_loss=0.387, over 3354844.74 frames. ], batch size: 46, lr: 1.31e-02, grad_scale: 32.0 2024-09-23 03:00:27,444 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.156e+02 1.388e+02 1.522e+02 1.765e+02 2.573e+02, threshold=3.044e+02, percent-clipped=0.0 2024-09-23 03:00:37,210 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-09-23 03:01:29,596 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=162717.33333333334, ans=0.0 2024-09-23 03:01:30,908 INFO [train.py:1198] (3/4) Epoch 9, batch 3700, loss[loss=0.2805, ctc_loss=0.2, cr_loss=0.4023, over 16868.00 frames. ], tot_loss[loss=0.2575, ctc_loss=0.1799, cr_loss=0.3877, over 3355949.50 frames. ], batch size: 58, lr: 1.31e-02, grad_scale: 32.0 2024-09-23 03:01:31,205 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=162717.33333333334, ans=0.125 2024-09-23 03:01:38,967 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=162717.33333333334, ans=0.125 2024-09-23 03:01:58,936 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=162764.0, ans=0.125 2024-09-23 03:02:09,961 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=162810.66666666666, ans=0.2 2024-09-23 03:02:30,991 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.08 vs. limit=6.0 2024-09-23 03:02:48,724 INFO [train.py:1198] (3/4) Epoch 9, batch 3750, loss[loss=0.2642, ctc_loss=0.1844, cr_loss=0.3991, over 16039.00 frames. ], tot_loss[loss=0.2584, ctc_loss=0.1807, cr_loss=0.3884, over 3335835.35 frames. 
], batch size: 74, lr: 1.31e-02, grad_scale: 32.0 2024-09-23 03:03:05,821 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.180e+02 1.286e+02 1.440e+02 1.620e+02 2.372e+02, threshold=2.880e+02, percent-clipped=0.0 2024-09-23 03:03:43,950 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=163090.66666666666, ans=0.0 2024-09-23 03:04:02,632 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=163137.33333333334, ans=0.2 2024-09-23 03:04:07,106 INFO [train.py:1198] (3/4) Epoch 9, batch 3800, loss[loss=0.2342, ctc_loss=0.1651, cr_loss=0.3451, over 17008.00 frames. ], tot_loss[loss=0.2591, ctc_loss=0.1814, cr_loss=0.3886, over 3313547.15 frames. ], batch size: 51, lr: 1.30e-02, grad_scale: 32.0 2024-09-23 03:04:24,573 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=163230.66666666666, ans=0.1 2024-09-23 03:04:44,982 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=163277.33333333334, ans=0.125 2024-09-23 03:04:51,216 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=163277.33333333334, ans=0.1 2024-09-23 03:04:56,607 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=3.87 vs. limit=12.0 2024-09-23 03:05:25,697 INFO [train.py:1198] (3/4) Epoch 9, batch 3850, loss[loss=0.2138, ctc_loss=0.1454, cr_loss=0.3416, over 16936.00 frames. ], tot_loss[loss=0.2613, ctc_loss=0.1834, cr_loss=0.3896, over 3274967.41 frames. ], batch size: 42, lr: 1.30e-02, grad_scale: 32.0 2024-09-23 03:05:36,778 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=163417.33333333334, ans=0.0 2024-09-23 03:05:42,524 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.149e+02 1.391e+02 1.511e+02 1.701e+02 2.274e+02, threshold=3.022e+02, percent-clipped=0.0 2024-09-23 03:05:50,483 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=163464.0, ans=0.1 2024-09-23 03:06:00,518 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=6.99 vs. limit=15.0 2024-09-23 03:06:12,626 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=5.39 vs. limit=15.0 2024-09-23 03:06:19,576 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=163557.33333333334, ans=0.125 2024-09-23 03:06:25,571 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=163604.0, ans=0.125 2024-09-23 03:06:31,534 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=163604.0, ans=0.2 2024-09-23 03:07:26,804 INFO [train.py:1198] (3/4) Epoch 10, batch 0, loss[loss=0.2838, ctc_loss=0.2022, cr_loss=0.4078, over 17021.00 frames. ], tot_loss[loss=0.2838, ctc_loss=0.2022, cr_loss=0.4078, over 17021.00 frames. 
], batch size: 53, lr: 1.24e-02, grad_scale: 32.0 2024-09-23 03:07:26,805 INFO [train.py:1221] (3/4) Computing validation loss 2024-09-23 03:07:41,773 INFO [train.py:1230] (3/4) Epoch 10, validation: loss=0.05143, ctc_loss=0.05143, cr_loss=7.705e-15, over 944034.00 frames. 2024-09-23 03:07:41,773 INFO [train.py:1231] (3/4) Maximum memory allocated so far is 21238MB 2024-09-23 03:07:56,357 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer_ff2.min_abs, batch_count=163678.66666666666, ans=0.1 2024-09-23 03:08:02,726 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=163678.66666666666, ans=0.125 2024-09-23 03:08:09,661 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=7.89 vs. limit=15.0 2024-09-23 03:08:14,046 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=163725.33333333334, ans=0.04949747468305833 2024-09-23 03:08:16,087 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=12.95 vs. limit=22.5 2024-09-23 03:08:18,939 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=163725.33333333334, ans=0.1 2024-09-23 03:09:05,466 INFO [train.py:1198] (3/4) Epoch 10, batch 50, loss[loss=0.2621, ctc_loss=0.1827, cr_loss=0.3973, over 17094.00 frames. ], tot_loss[loss=0.2561, ctc_loss=0.1781, cr_loss=0.3903, over 758690.86 frames. ], batch size: 49, lr: 1.24e-02, grad_scale: 32.0 2024-09-23 03:09:05,868 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=163865.33333333334, ans=0.2 2024-09-23 03:09:13,715 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=163865.33333333334, ans=0.125 2024-09-23 03:09:13,722 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=163865.33333333334, ans=0.125 2024-09-23 03:09:16,978 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=163865.33333333334, ans=0.0 2024-09-23 03:09:21,633 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=163912.0, ans=0.0 2024-09-23 03:09:29,247 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.138e+02 1.322e+02 1.460e+02 1.757e+02 2.503e+02, threshold=2.921e+02, percent-clipped=0.0 2024-09-23 03:09:34,365 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer_ff3.min_abs, batch_count=163912.0, ans=0.2 2024-09-23 03:09:39,163 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=163958.66666666666, ans=0.125 2024-09-23 03:09:42,897 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=6.49 vs. 
limit=12.0 2024-09-23 03:09:51,699 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=164005.33333333334, ans=0.0 2024-09-23 03:10:13,893 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=164052.0, ans=0.125 2024-09-23 03:10:24,754 INFO [train.py:1198] (3/4) Epoch 10, batch 100, loss[loss=0.2106, ctc_loss=0.1457, cr_loss=0.3245, over 17169.00 frames. ], tot_loss[loss=0.2578, ctc_loss=0.1799, cr_loss=0.3898, over 1328251.83 frames. ], batch size: 41, lr: 1.24e-02, grad_scale: 32.0 2024-09-23 03:10:46,357 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.57 vs. limit=10.0 2024-09-23 03:10:51,038 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=164145.33333333334, ans=0.125 2024-09-23 03:11:03,147 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.49 vs. limit=6.0 2024-09-23 03:11:07,193 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=164192.0, ans=0.1 2024-09-23 03:11:21,281 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=164238.66666666666, ans=0.1 2024-09-23 03:11:46,744 INFO [train.py:1198] (3/4) Epoch 10, batch 150, loss[loss=0.2592, ctc_loss=0.1811, cr_loss=0.3901, over 17061.00 frames. ], tot_loss[loss=0.2581, ctc_loss=0.1802, cr_loss=0.3893, over 1771405.17 frames. ], batch size: 46, lr: 1.24e-02, grad_scale: 32.0 2024-09-23 03:12:13,436 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.090e+02 1.322e+02 1.418e+02 1.649e+02 2.765e+02, threshold=2.835e+02, percent-clipped=0.0 2024-09-23 03:12:16,951 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=164378.66666666666, ans=0.1 2024-09-23 03:12:16,989 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=164378.66666666666, ans=0.0 2024-09-23 03:12:28,231 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=164425.33333333334, ans=0.05 2024-09-23 03:12:34,577 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.const_attention_rate, batch_count=164425.33333333334, ans=0.025 2024-09-23 03:12:37,893 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=10.90 vs. limit=12.0 2024-09-23 03:12:46,091 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=4.78 vs. limit=12.0 2024-09-23 03:12:50,464 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=164472.0, ans=0.0 2024-09-23 03:13:01,543 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=164518.66666666666, ans=0.125 2024-09-23 03:13:09,148 INFO [train.py:1198] (3/4) Epoch 10, batch 200, loss[loss=0.3096, ctc_loss=0.221, cr_loss=0.443, over 16511.00 frames. 
], tot_loss[loss=0.2575, ctc_loss=0.1797, cr_loss=0.3886, over 2115803.56 frames. ], batch size: 66, lr: 1.24e-02, grad_scale: 32.0 2024-09-23 03:13:12,668 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=164565.33333333334, ans=0.0 2024-09-23 03:13:27,028 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=164612.0, ans=0.125 2024-09-23 03:13:32,952 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=164612.0, ans=0.0 2024-09-23 03:13:39,079 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=164612.0, ans=0.2 2024-09-23 03:13:51,686 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=164658.66666666666, ans=0.125 2024-09-23 03:14:00,165 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.42 vs. limit=15.0 2024-09-23 03:14:02,723 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.min_abs, batch_count=164705.33333333334, ans=0.5 2024-09-23 03:14:02,725 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=164705.33333333334, ans=0.125 2024-09-23 03:14:24,514 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=164752.0, ans=0.0 2024-09-23 03:14:33,532 INFO [train.py:1198] (3/4) Epoch 10, batch 250, loss[loss=0.25, ctc_loss=0.171, cr_loss=0.3947, over 17053.00 frames. ], tot_loss[loss=0.256, ctc_loss=0.1786, cr_loss=0.3872, over 2401619.07 frames. ], batch size: 46, lr: 1.23e-02, grad_scale: 32.0 2024-09-23 03:14:33,801 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=164798.66666666666, ans=0.0 2024-09-23 03:14:43,512 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=15.25 vs. limit=22.5 2024-09-23 03:14:49,534 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=164845.33333333334, ans=0.0 2024-09-23 03:14:57,019 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.087e+02 1.299e+02 1.403e+02 1.575e+02 2.434e+02, threshold=2.805e+02, percent-clipped=0.0 2024-09-23 03:15:05,332 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=164892.0, ans=0.125 2024-09-23 03:15:10,335 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.95 vs. limit=15.0 2024-09-23 03:15:17,907 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=164892.0, ans=0.125 2024-09-23 03:15:44,593 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=164985.33333333334, ans=0.0 2024-09-23 03:15:55,552 INFO [train.py:1198] (3/4) Epoch 10, batch 300, loss[loss=0.2239, ctc_loss=0.1547, cr_loss=0.3456, over 17279.00 frames. ], tot_loss[loss=0.2561, ctc_loss=0.1787, cr_loss=0.3868, over 2608782.66 frames. 
], batch size: 42, lr: 1.23e-02, grad_scale: 32.0 2024-09-23 03:15:57,301 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=165032.0, ans=0.125 2024-09-23 03:16:00,475 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=165032.0, ans=0.1 2024-09-23 03:16:07,421 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=10.01 vs. limit=15.0 2024-09-23 03:16:14,477 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=165078.66666666666, ans=0.05 2024-09-23 03:16:22,573 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=165078.66666666666, ans=0.125 2024-09-23 03:16:26,580 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=6.44 vs. limit=15.0 2024-09-23 03:16:58,975 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=165218.66666666666, ans=0.0 2024-09-23 03:17:05,422 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=165218.66666666666, ans=0.125 2024-09-23 03:17:16,553 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=165265.33333333334, ans=0.125 2024-09-23 03:17:17,895 INFO [train.py:1198] (3/4) Epoch 10, batch 350, loss[loss=0.23, ctc_loss=0.1593, cr_loss=0.3536, over 17360.00 frames. ], tot_loss[loss=0.2573, ctc_loss=0.1799, cr_loss=0.387, over 2754947.05 frames. ], batch size: 48, lr: 1.23e-02, grad_scale: 32.0 2024-09-23 03:17:23,195 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=165265.33333333334, ans=0.125 2024-09-23 03:17:42,143 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.177e+02 1.372e+02 1.514e+02 1.677e+02 2.269e+02, threshold=3.028e+02, percent-clipped=0.0 2024-09-23 03:17:50,902 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=12.92 vs. limit=15.0 2024-09-23 03:17:59,717 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=165358.66666666666, ans=0.0 2024-09-23 03:18:04,513 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.max_positive, batch_count=165405.33333333334, ans=0.95 2024-09-23 03:18:43,378 INFO [train.py:1198] (3/4) Epoch 10, batch 400, loss[loss=0.2375, ctc_loss=0.1626, cr_loss=0.3746, over 17154.00 frames. ], tot_loss[loss=0.2576, ctc_loss=0.1802, cr_loss=0.3871, over 2881195.36 frames. 
], batch size: 45, lr: 1.23e-02, grad_scale: 32.0 2024-09-23 03:19:10,616 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=165545.33333333334, ans=0.1 2024-09-23 03:19:45,689 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=165685.33333333334, ans=0.2 2024-09-23 03:19:57,013 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=165685.33333333334, ans=0.1 2024-09-23 03:19:59,305 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=5.86 vs. limit=15.0 2024-09-23 03:20:03,262 INFO [train.py:1198] (3/4) Epoch 10, batch 450, loss[loss=0.2376, ctc_loss=0.1634, cr_loss=0.3708, over 17294.00 frames. ], tot_loss[loss=0.2562, ctc_loss=0.179, cr_loss=0.3861, over 2992556.37 frames. ], batch size: 49, lr: 1.23e-02, grad_scale: 32.0 2024-09-23 03:20:05,277 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=165732.0, ans=0.1 2024-09-23 03:20:18,061 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=165778.66666666666, ans=0.2 2024-09-23 03:20:26,970 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.153e+02 1.328e+02 1.495e+02 1.704e+02 3.618e+02, threshold=2.990e+02, percent-clipped=1.0 2024-09-23 03:20:28,979 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=165778.66666666666, ans=0.125 2024-09-23 03:20:36,542 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=165825.33333333334, ans=0.125 2024-09-23 03:20:41,169 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=165825.33333333334, ans=0.125 2024-09-23 03:21:25,284 INFO [train.py:1198] (3/4) Epoch 10, batch 500, loss[loss=0.2417, ctc_loss=0.1686, cr_loss=0.3653, over 17012.00 frames. ], tot_loss[loss=0.255, ctc_loss=0.1781, cr_loss=0.3848, over 3075884.66 frames. ], batch size: 44, lr: 1.23e-02, grad_scale: 32.0 2024-09-23 03:22:11,047 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=166058.66666666666, ans=0.125 2024-09-23 03:22:24,870 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=5.53 vs. limit=15.0 2024-09-23 03:22:30,801 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=166152.0, ans=0.125 2024-09-23 03:22:40,402 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=166152.0, ans=0.125 2024-09-23 03:22:41,941 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=166152.0, ans=0.0 2024-09-23 03:22:45,837 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.98 vs. limit=15.0 2024-09-23 03:22:47,903 INFO [train.py:1198] (3/4) Epoch 10, batch 550, loss[loss=0.2846, ctc_loss=0.203, cr_loss=0.4081, over 17060.00 frames. 
], tot_loss[loss=0.2531, ctc_loss=0.1764, cr_loss=0.3832, over 3145344.25 frames. ], batch size: 52, lr: 1.23e-02, grad_scale: 32.0 2024-09-23 03:23:11,791 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.067e+02 1.308e+02 1.376e+02 1.532e+02 2.311e+02, threshold=2.752e+02, percent-clipped=0.0 2024-09-23 03:23:57,486 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=166385.33333333334, ans=0.0 2024-09-23 03:24:13,176 INFO [train.py:1198] (3/4) Epoch 10, batch 600, loss[loss=0.2686, ctc_loss=0.1871, cr_loss=0.4076, over 17023.00 frames. ], tot_loss[loss=0.2535, ctc_loss=0.1767, cr_loss=0.3838, over 3197902.95 frames. ], batch size: 53, lr: 1.23e-02, grad_scale: 32.0 2024-09-23 03:24:27,781 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=166478.66666666666, ans=0.125 2024-09-23 03:24:45,456 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=166525.33333333334, ans=0.125 2024-09-23 03:24:51,998 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=166525.33333333334, ans=0.125 2024-09-23 03:25:20,405 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=166618.66666666666, ans=0.05 2024-09-23 03:25:24,105 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=13.83 vs. limit=22.5 2024-09-23 03:25:26,678 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=166618.66666666666, ans=0.2 2024-09-23 03:25:32,730 INFO [train.py:1198] (3/4) Epoch 10, batch 650, loss[loss=0.258, ctc_loss=0.1792, cr_loss=0.3937, over 17007.00 frames. ], tot_loss[loss=0.2543, ctc_loss=0.1772, cr_loss=0.3856, over 3234817.22 frames. ], batch size: 53, lr: 1.23e-02, grad_scale: 32.0 2024-09-23 03:25:47,112 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=166665.33333333334, ans=0.2 2024-09-23 03:25:59,418 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.059e+02 1.346e+02 1.493e+02 1.797e+02 2.927e+02, threshold=2.987e+02, percent-clipped=1.0 2024-09-23 03:26:36,727 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=166805.33333333334, ans=0.125 2024-09-23 03:26:51,040 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=166852.0, ans=0.125 2024-09-23 03:26:54,041 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=166898.66666666666, ans=0.125 2024-09-23 03:26:55,507 INFO [train.py:1198] (3/4) Epoch 10, batch 700, loss[loss=0.2117, ctc_loss=0.1454, cr_loss=0.3314, over 17286.00 frames. ], tot_loss[loss=0.2539, ctc_loss=0.1769, cr_loss=0.3852, over 3255875.72 frames. 
], batch size: 42, lr: 1.23e-02, grad_scale: 32.0 2024-09-23 03:26:57,453 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=166898.66666666666, ans=0.125 2024-09-23 03:27:12,885 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=166945.33333333334, ans=0.0 2024-09-23 03:27:35,326 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=166992.0, ans=0.09899494936611666 2024-09-23 03:27:56,389 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.const_attention_rate, batch_count=167038.66666666666, ans=0.025 2024-09-23 03:28:00,376 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=6.06 vs. limit=12.0 2024-09-23 03:28:01,271 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=167085.33333333334, ans=0.1 2024-09-23 03:28:07,839 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=167085.33333333334, ans=0.0 2024-09-23 03:28:20,940 INFO [train.py:1198] (3/4) Epoch 10, batch 750, loss[loss=0.2136, ctc_loss=0.1485, cr_loss=0.3256, over 16938.00 frames. ], tot_loss[loss=0.2549, ctc_loss=0.1775, cr_loss=0.3869, over 3284840.52 frames. ], batch size: 42, lr: 1.23e-02, grad_scale: 32.0 2024-09-23 03:28:47,712 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.124e+02 1.321e+02 1.498e+02 1.811e+02 2.765e+02, threshold=2.996e+02, percent-clipped=0.0 2024-09-23 03:29:00,799 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=167225.33333333334, ans=0.0 2024-09-23 03:29:05,538 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=167225.33333333334, ans=0.125 2024-09-23 03:29:38,061 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.69 vs. limit=6.0 2024-09-23 03:29:43,241 INFO [train.py:1198] (3/4) Epoch 10, batch 800, loss[loss=0.2376, ctc_loss=0.1614, cr_loss=0.381, over 17150.00 frames. ], tot_loss[loss=0.2523, ctc_loss=0.1756, cr_loss=0.3839, over 3312409.05 frames. ], batch size: 45, lr: 1.23e-02, grad_scale: 32.0 2024-09-23 03:29:48,474 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=167365.33333333334, ans=0.125 2024-09-23 03:30:01,130 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=167412.0, ans=0.125 2024-09-23 03:30:52,803 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=167552.0, ans=0.1 2024-09-23 03:31:02,466 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=167552.0, ans=0.1 2024-09-23 03:31:05,264 INFO [train.py:1198] (3/4) Epoch 10, batch 850, loss[loss=0.2283, ctc_loss=0.1614, cr_loss=0.3343, over 17091.00 frames. ], tot_loss[loss=0.2522, ctc_loss=0.1755, cr_loss=0.3839, over 3329674.64 frames. 
], batch size: 43, lr: 1.22e-02, grad_scale: 32.0 2024-09-23 03:31:11,987 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=167598.66666666666, ans=0.1 2024-09-23 03:31:29,265 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.131e+02 1.338e+02 1.474e+02 1.669e+02 2.399e+02, threshold=2.948e+02, percent-clipped=0.0 2024-09-23 03:31:35,723 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=167692.0, ans=0.0 2024-09-23 03:31:35,865 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=167692.0, ans=0.0 2024-09-23 03:31:39,005 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=167692.0, ans=0.0 2024-09-23 03:32:01,633 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=167738.66666666666, ans=0.125 2024-09-23 03:32:20,027 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=167785.33333333334, ans=0.0 2024-09-23 03:32:24,913 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=167785.33333333334, ans=0.125 2024-09-23 03:32:27,742 INFO [train.py:1198] (3/4) Epoch 10, batch 900, loss[loss=0.2791, ctc_loss=0.1952, cr_loss=0.4197, over 17040.00 frames. ], tot_loss[loss=0.252, ctc_loss=0.1754, cr_loss=0.3831, over 3338880.28 frames. ], batch size: 52, lr: 1.22e-02, grad_scale: 32.0 2024-09-23 03:32:31,253 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=167832.0, ans=0.04949747468305833 2024-09-23 03:32:32,841 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-23 03:32:36,070 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=167832.0, ans=0.0 2024-09-23 03:33:26,541 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=167972.0, ans=0.2 2024-09-23 03:33:44,327 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=168018.66666666666, ans=0.0 2024-09-23 03:33:55,327 INFO [train.py:1198] (3/4) Epoch 10, batch 950, loss[loss=0.2754, ctc_loss=0.1969, cr_loss=0.3926, over 17211.00 frames. ], tot_loss[loss=0.2524, ctc_loss=0.1757, cr_loss=0.3836, over 3342332.43 frames. ], batch size: 55, lr: 1.22e-02, grad_scale: 32.0 2024-09-23 03:33:55,555 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=168065.33333333334, ans=0.125 2024-09-23 03:34:06,737 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=168065.33333333334, ans=0.0 2024-09-23 03:34:18,980 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.083e+02 1.276e+02 1.450e+02 1.704e+02 3.070e+02, threshold=2.900e+02, percent-clipped=1.0 2024-09-23 03:34:25,960 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=14.52 vs. 
limit=22.5 2024-09-23 03:34:43,103 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-09-23 03:35:02,042 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=168252.0, ans=0.2 2024-09-23 03:35:10,409 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=6.17 vs. limit=15.0 2024-09-23 03:35:11,684 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=168252.0, ans=0.0 2024-09-23 03:35:14,463 INFO [train.py:1198] (3/4) Epoch 10, batch 1000, loss[loss=0.2378, ctc_loss=0.1641, cr_loss=0.3685, over 17029.00 frames. ], tot_loss[loss=0.2537, ctc_loss=0.1766, cr_loss=0.3855, over 3348150.83 frames. ], batch size: 44, lr: 1.22e-02, grad_scale: 32.0 2024-09-23 03:35:55,554 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=168392.0, ans=0.0 2024-09-23 03:35:57,173 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=168392.0, ans=0.125 2024-09-23 03:36:30,390 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=168485.33333333334, ans=0.1 2024-09-23 03:36:36,365 INFO [train.py:1198] (3/4) Epoch 10, batch 1050, loss[loss=0.2677, ctc_loss=0.1838, cr_loss=0.4196, over 16723.00 frames. ], tot_loss[loss=0.2518, ctc_loss=0.1751, cr_loss=0.3835, over 3353728.80 frames. ], batch size: 61, lr: 1.22e-02, grad_scale: 32.0 2024-09-23 03:36:43,177 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=168532.0, ans=0.0 2024-09-23 03:37:00,706 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.091e+02 1.301e+02 1.448e+02 1.658e+02 2.854e+02, threshold=2.897e+02, percent-clipped=0.0 2024-09-23 03:37:05,863 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=168578.66666666666, ans=0.0 2024-09-23 03:37:33,840 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=168672.0, ans=0.0 2024-09-23 03:37:58,858 INFO [train.py:1198] (3/4) Epoch 10, batch 1100, loss[loss=0.2839, ctc_loss=0.1989, cr_loss=0.4254, over 17010.00 frames. ], tot_loss[loss=0.2531, ctc_loss=0.1761, cr_loss=0.3852, over 3352661.56 frames. ], batch size: 51, lr: 1.22e-02, grad_scale: 32.0 2024-09-23 03:37:59,156 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=168765.33333333334, ans=0.07 2024-09-23 03:38:19,613 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.30 vs. limit=15.0 2024-09-23 03:38:22,777 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.88 vs. 
limit=6.0 2024-09-23 03:38:44,115 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=168858.66666666666, ans=0.125 2024-09-23 03:38:51,858 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=168905.33333333334, ans=0.0 2024-09-23 03:38:52,027 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=168905.33333333334, ans=0.125 2024-09-23 03:39:15,950 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=168952.0, ans=0.125 2024-09-23 03:39:23,801 INFO [train.py:1198] (3/4) Epoch 10, batch 1150, loss[loss=0.2769, ctc_loss=0.1957, cr_loss=0.4055, over 16980.00 frames. ], tot_loss[loss=0.2531, ctc_loss=0.176, cr_loss=0.3854, over 3362174.46 frames. ], batch size: 56, lr: 1.22e-02, grad_scale: 32.0 2024-09-23 03:39:41,801 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=169045.33333333334, ans=0.0 2024-09-23 03:39:47,599 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.191e+02 1.373e+02 1.514e+02 1.720e+02 2.403e+02, threshold=3.028e+02, percent-clipped=0.0 2024-09-23 03:39:52,794 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=169045.33333333334, ans=0.125 2024-09-23 03:40:16,534 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=169138.66666666666, ans=0.1 2024-09-23 03:40:31,894 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=169185.33333333334, ans=0.0 2024-09-23 03:40:38,224 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=169185.33333333334, ans=0.0 2024-09-23 03:40:45,949 INFO [train.py:1198] (3/4) Epoch 10, batch 1200, loss[loss=0.2211, ctc_loss=0.152, cr_loss=0.3452, over 17309.00 frames. ], tot_loss[loss=0.2529, ctc_loss=0.176, cr_loss=0.3848, over 3351751.63 frames. ], batch size: 46, lr: 1.22e-02, grad_scale: 32.0 2024-09-23 03:40:51,087 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=169232.0, ans=0.0 2024-09-23 03:41:06,959 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=169278.66666666666, ans=0.125 2024-09-23 03:41:24,432 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=169325.33333333334, ans=0.2 2024-09-23 03:41:33,235 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.80 vs. limit=6.0 2024-09-23 03:41:37,103 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=169372.0, ans=0.05 2024-09-23 03:42:07,889 INFO [train.py:1198] (3/4) Epoch 10, batch 1250, loss[loss=0.3029, ctc_loss=0.2136, cr_loss=0.4461, over 17016.00 frames. ], tot_loss[loss=0.2528, ctc_loss=0.1759, cr_loss=0.3845, over 3356919.40 frames. 
], batch size: 51, lr: 1.22e-02, grad_scale: 32.0 2024-09-23 03:42:31,957 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.123e+02 1.271e+02 1.375e+02 1.546e+02 2.384e+02, threshold=2.750e+02, percent-clipped=0.0 2024-09-23 03:42:48,170 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=169558.66666666666, ans=0.0 2024-09-23 03:43:00,821 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=169605.33333333334, ans=0.0 2024-09-23 03:43:13,050 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=169652.0, ans=0.0 2024-09-23 03:43:13,404 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=14.18 vs. limit=22.5 2024-09-23 03:43:32,878 INFO [train.py:1198] (3/4) Epoch 10, batch 1300, loss[loss=0.2059, ctc_loss=0.1429, cr_loss=0.3148, over 17026.00 frames. ], tot_loss[loss=0.2537, ctc_loss=0.1767, cr_loss=0.385, over 3354188.40 frames. ], batch size: 39, lr: 1.22e-02, grad_scale: 32.0 2024-09-23 03:43:33,960 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=7.43 vs. limit=15.0 2024-09-23 03:43:34,721 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=169698.66666666666, ans=10.0 2024-09-23 03:43:52,279 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=169745.33333333334, ans=0.2 2024-09-23 03:43:53,913 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=169745.33333333334, ans=0.125 2024-09-23 03:44:00,188 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=169745.33333333334, ans=0.125 2024-09-23 03:44:19,302 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-23 03:44:33,657 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=169838.66666666666, ans=0.125 2024-09-23 03:44:38,315 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=169885.33333333334, ans=0.2 2024-09-23 03:44:52,273 INFO [train.py:1198] (3/4) Epoch 10, batch 1350, loss[loss=0.2872, ctc_loss=0.2008, cr_loss=0.4318, over 17004.00 frames. ], tot_loss[loss=0.2521, ctc_loss=0.1754, cr_loss=0.3833, over 3363219.52 frames. ], batch size: 53, lr: 1.22e-02, grad_scale: 32.0 2024-09-23 03:44:57,331 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=169932.0, ans=0.09899494936611666 2024-09-23 03:45:11,670 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=169978.66666666666, ans=0.04949747468305833 2024-09-23 03:45:13,803 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=3.99 vs. 
limit=10.0 2024-09-23 03:45:16,273 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.144e+02 1.346e+02 1.479e+02 1.695e+02 2.554e+02, threshold=2.958e+02, percent-clipped=0.0 2024-09-23 03:45:21,401 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=169978.66666666666, ans=0.2 2024-09-23 03:45:21,407 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=169978.66666666666, ans=0.125 2024-09-23 03:45:29,052 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.04 vs. limit=6.0 2024-09-23 03:45:40,124 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=170025.33333333334, ans=0.125 2024-09-23 03:45:43,659 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=10.46 vs. limit=15.0 2024-09-23 03:45:51,130 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.const_attention_rate, batch_count=170072.0, ans=0.025 2024-09-23 03:46:00,585 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-23 03:46:11,489 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=170118.66666666666, ans=0.2 2024-09-23 03:46:14,378 INFO [train.py:1198] (3/4) Epoch 10, batch 1400, loss[loss=0.2491, ctc_loss=0.1721, cr_loss=0.385, over 17096.00 frames. ], tot_loss[loss=0.2525, ctc_loss=0.1757, cr_loss=0.3841, over 3365466.64 frames. ], batch size: 43, lr: 1.22e-02, grad_scale: 16.0 2024-09-23 03:46:30,763 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=170212.0, ans=0.125 2024-09-23 03:47:06,418 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=170305.33333333334, ans=0.125 2024-09-23 03:47:22,341 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.min_abs, batch_count=170352.0, ans=0.5 2024-09-23 03:47:28,701 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=170352.0, ans=0.0 2024-09-23 03:47:36,190 INFO [train.py:1198] (3/4) Epoch 10, batch 1450, loss[loss=0.2341, ctc_loss=0.1577, cr_loss=0.3819, over 16952.00 frames. ], tot_loss[loss=0.2521, ctc_loss=0.1753, cr_loss=0.3837, over 3369332.54 frames. ], batch size: 42, lr: 1.22e-02, grad_scale: 16.0 2024-09-23 03:47:41,898 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.09 vs. limit=15.0 2024-09-23 03:47:49,083 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=170398.66666666666, ans=0.2 2024-09-23 03:48:04,179 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.133e+02 1.358e+02 1.487e+02 1.729e+02 2.482e+02, threshold=2.974e+02, percent-clipped=0.0 2024-09-23 03:48:16,628 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=7.60 vs. 
limit=15.0 2024-09-23 03:48:35,701 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=6.77 vs. limit=15.0 2024-09-23 03:48:41,138 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.01 vs. limit=6.0 2024-09-23 03:48:45,210 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=170585.33333333334, ans=0.0 2024-09-23 03:48:56,357 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=170585.33333333334, ans=0.125 2024-09-23 03:48:59,542 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=170632.0, ans=0.0 2024-09-23 03:49:00,805 INFO [train.py:1198] (3/4) Epoch 10, batch 1500, loss[loss=0.2633, ctc_loss=0.1819, cr_loss=0.4071, over 17221.00 frames. ], tot_loss[loss=0.2529, ctc_loss=0.176, cr_loss=0.3845, over 3369499.49 frames. ], batch size: 47, lr: 1.21e-02, grad_scale: 16.0 2024-09-23 03:49:10,607 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=170632.0, ans=0.1 2024-09-23 03:49:31,389 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=170725.33333333334, ans=0.2 2024-09-23 03:49:48,809 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=170772.0, ans=0.0 2024-09-23 03:49:58,711 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-09-23 03:50:22,864 INFO [train.py:1198] (3/4) Epoch 10, batch 1550, loss[loss=0.3085, ctc_loss=0.2301, cr_loss=0.3917, over 12064.00 frames. ], tot_loss[loss=0.2526, ctc_loss=0.1758, cr_loss=0.3836, over 3358794.01 frames. ], batch size: 123, lr: 1.21e-02, grad_scale: 8.0 2024-09-23 03:50:49,523 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.180e+02 1.376e+02 1.480e+02 1.669e+02 2.824e+02, threshold=2.960e+02, percent-clipped=0.0 2024-09-23 03:51:15,349 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=171005.33333333334, ans=0.0 2024-09-23 03:51:34,443 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=171052.0, ans=0.1 2024-09-23 03:51:41,966 INFO [train.py:1198] (3/4) Epoch 10, batch 1600, loss[loss=0.208, ctc_loss=0.1405, cr_loss=0.3374, over 16999.00 frames. ], tot_loss[loss=0.2518, ctc_loss=0.1753, cr_loss=0.3826, over 3361725.29 frames. ], batch size: 39, lr: 1.21e-02, grad_scale: 16.0 2024-09-23 03:51:42,338 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=171098.66666666666, ans=0.125 2024-09-23 03:51:55,734 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=171098.66666666666, ans=0.125 2024-09-23 03:51:55,903 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=9.23 vs. 
limit=15.0 2024-09-23 03:52:01,380 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=5.14 vs. limit=10.0 2024-09-23 03:52:26,837 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.22 vs. limit=22.5 2024-09-23 03:52:27,946 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=171192.0, ans=0.125 2024-09-23 03:52:45,419 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=171238.66666666666, ans=0.2 2024-09-23 03:52:48,742 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=171285.33333333334, ans=0.125 2024-09-23 03:53:02,714 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=4.45 vs. limit=12.0 2024-09-23 03:53:06,736 INFO [train.py:1198] (3/4) Epoch 10, batch 1650, loss[loss=0.2462, ctc_loss=0.1667, cr_loss=0.3974, over 17030.00 frames. ], tot_loss[loss=0.2526, ctc_loss=0.1757, cr_loss=0.3843, over 3369334.47 frames. ], batch size: 44, lr: 1.21e-02, grad_scale: 16.0 2024-09-23 03:53:22,450 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=171332.0, ans=0.1 2024-09-23 03:53:27,285 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=171378.66666666666, ans=0.125 2024-09-23 03:53:36,441 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.109e+02 1.322e+02 1.432e+02 1.629e+02 2.855e+02, threshold=2.864e+02, percent-clipped=0.0 2024-09-23 03:54:02,676 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=171472.0, ans=0.125 2024-09-23 03:54:16,940 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=171518.66666666666, ans=0.0 2024-09-23 03:54:26,361 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=171518.66666666666, ans=0.125 2024-09-23 03:54:26,435 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=171518.66666666666, ans=0.2 2024-09-23 03:54:29,207 INFO [train.py:1198] (3/4) Epoch 10, batch 1700, loss[loss=0.2738, ctc_loss=0.1914, cr_loss=0.412, over 15910.00 frames. ], tot_loss[loss=0.2522, ctc_loss=0.1755, cr_loss=0.3835, over 3364380.39 frames. ], batch size: 74, lr: 1.21e-02, grad_scale: 16.0 2024-09-23 03:54:43,905 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=171612.0, ans=0.07 2024-09-23 03:55:02,799 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=171658.66666666666, ans=0.2 2024-09-23 03:55:22,708 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=171705.33333333334, ans=0.0 2024-09-23 03:55:36,372 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=13.15 vs. 
limit=22.5 2024-09-23 03:55:49,910 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=171798.66666666666, ans=0.125 2024-09-23 03:55:51,291 INFO [train.py:1198] (3/4) Epoch 10, batch 1750, loss[loss=0.2629, ctc_loss=0.1828, cr_loss=0.4003, over 17021.00 frames. ], tot_loss[loss=0.2525, ctc_loss=0.1757, cr_loss=0.3836, over 3364356.09 frames. ], batch size: 44, lr: 1.21e-02, grad_scale: 16.0 2024-09-23 03:56:09,275 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=171845.33333333334, ans=0.125 2024-09-23 03:56:18,404 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.152e+02 1.367e+02 1.590e+02 1.877e+02 2.998e+02, threshold=3.180e+02, percent-clipped=1.0 2024-09-23 03:56:23,550 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=171892.0, ans=0.125 2024-09-23 03:56:31,486 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=171892.0, ans=0.0 2024-09-23 03:56:40,979 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-23 03:57:06,560 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=3.91 vs. limit=12.0 2024-09-23 03:57:13,329 INFO [train.py:1198] (3/4) Epoch 10, batch 1800, loss[loss=0.2222, ctc_loss=0.1533, cr_loss=0.3443, over 17049.00 frames. ], tot_loss[loss=0.252, ctc_loss=0.1752, cr_loss=0.3839, over 3373268.69 frames. ], batch size: 39, lr: 1.21e-02, grad_scale: 16.0 2024-09-23 03:57:28,164 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=172078.66666666666, ans=0.2 2024-09-23 03:57:28,177 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=172078.66666666666, ans=0.0 2024-09-23 03:57:39,357 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=172078.66666666666, ans=0.125 2024-09-23 03:57:48,622 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.86 vs. limit=12.0 2024-09-23 03:58:00,510 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=172125.33333333334, ans=0.1 2024-09-23 03:58:12,344 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=172172.0, ans=0.125 2024-09-23 03:58:25,056 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=172218.66666666666, ans=0.125 2024-09-23 03:58:30,366 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=16.26 vs. limit=22.5 2024-09-23 03:58:37,574 INFO [train.py:1198] (3/4) Epoch 10, batch 1850, loss[loss=0.2284, ctc_loss=0.152, cr_loss=0.3819, over 16949.00 frames. ], tot_loss[loss=0.2519, ctc_loss=0.1751, cr_loss=0.3836, over 3365805.52 frames. 
], batch size: 42, lr: 1.21e-02, grad_scale: 16.0 2024-09-23 03:58:42,585 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=172265.33333333334, ans=0.125 2024-09-23 03:58:45,589 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=172265.33333333334, ans=0.0 2024-09-23 03:58:45,693 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-23 03:59:03,080 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=172312.0, ans=0.2 2024-09-23 03:59:04,289 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.087e+02 1.347e+02 1.457e+02 1.655e+02 2.749e+02, threshold=2.915e+02, percent-clipped=0.0 2024-09-23 03:59:12,592 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=172358.66666666666, ans=0.0 2024-09-23 03:59:14,356 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.83 vs. limit=15.0 2024-09-23 03:59:21,335 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=3.90 vs. limit=12.0 2024-09-23 03:59:25,587 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=172405.33333333334, ans=0.2 2024-09-23 03:59:28,686 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=172405.33333333334, ans=10.0 2024-09-23 03:59:44,813 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=172452.0, ans=0.0 2024-09-23 03:59:57,223 INFO [train.py:1198] (3/4) Epoch 10, batch 1900, loss[loss=0.277, ctc_loss=0.1948, cr_loss=0.4113, over 17290.00 frames. ], tot_loss[loss=0.2516, ctc_loss=0.1749, cr_loss=0.3835, over 3366521.11 frames. ], batch size: 49, lr: 1.21e-02, grad_scale: 16.0 2024-09-23 04:00:30,518 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=172592.0, ans=0.0 2024-09-23 04:00:43,309 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=172592.0, ans=0.125 2024-09-23 04:00:43,636 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=3.95 vs. limit=10.0 2024-09-23 04:00:46,294 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=172638.66666666666, ans=0.0 2024-09-23 04:00:47,943 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.const_attention_rate, batch_count=172638.66666666666, ans=0.025 2024-09-23 04:00:55,889 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=172638.66666666666, ans=0.125 2024-09-23 04:01:19,647 INFO [train.py:1198] (3/4) Epoch 10, batch 1950, loss[loss=0.2, ctc_loss=0.1343, cr_loss=0.3282, over 16351.00 frames. ], tot_loss[loss=0.2524, ctc_loss=0.1754, cr_loss=0.3845, over 3363816.39 frames. 
], batch size: 36, lr: 1.21e-02, grad_scale: 16.0 2024-09-23 04:01:48,822 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.191e+02 1.405e+02 1.574e+02 1.755e+02 2.409e+02, threshold=3.148e+02, percent-clipped=0.0 2024-09-23 04:01:52,236 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=172825.33333333334, ans=0.0 2024-09-23 04:02:14,698 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=172872.0, ans=0.125 2024-09-23 04:02:43,968 INFO [train.py:1198] (3/4) Epoch 10, batch 2000, loss[loss=0.2625, ctc_loss=0.1816, cr_loss=0.4048, over 16933.00 frames. ], tot_loss[loss=0.2521, ctc_loss=0.1752, cr_loss=0.3842, over 3364833.25 frames. ], batch size: 58, lr: 1.21e-02, grad_scale: 32.0 2024-09-23 04:02:49,091 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=172965.33333333334, ans=0.125 2024-09-23 04:02:53,888 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=172965.33333333334, ans=0.125 2024-09-23 04:02:58,603 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=173012.0, ans=0.0 2024-09-23 04:03:15,716 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=173012.0, ans=0.0 2024-09-23 04:03:26,755 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=173058.66666666666, ans=0.2 2024-09-23 04:03:29,944 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=173058.66666666666, ans=0.2 2024-09-23 04:03:41,147 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=173105.33333333334, ans=0.0 2024-09-23 04:03:56,992 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=173152.0, ans=0.04949747468305833 2024-09-23 04:04:06,415 INFO [train.py:1198] (3/4) Epoch 10, batch 2050, loss[loss=0.203, ctc_loss=0.1363, cr_loss=0.3335, over 17097.00 frames. ], tot_loss[loss=0.2524, ctc_loss=0.1755, cr_loss=0.3846, over 3359419.09 frames. ], batch size: 43, lr: 1.21e-02, grad_scale: 32.0 2024-09-23 04:04:08,232 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=173198.66666666666, ans=0.125 2024-09-23 04:04:22,555 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=173245.33333333334, ans=0.125 2024-09-23 04:04:22,647 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=173245.33333333334, ans=0.0 2024-09-23 04:04:34,983 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.108e+02 1.316e+02 1.437e+02 1.596e+02 3.726e+02, threshold=2.874e+02, percent-clipped=1.0 2024-09-23 04:05:14,482 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=173385.33333333334, ans=0.125 2024-09-23 04:05:18,257 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=4.64 vs. 
limit=15.0 2024-09-23 04:05:28,498 INFO [train.py:1198] (3/4) Epoch 10, batch 2100, loss[loss=0.3106, ctc_loss=0.2216, cr_loss=0.4448, over 14933.00 frames. ], tot_loss[loss=0.2526, ctc_loss=0.1758, cr_loss=0.3842, over 3351043.49 frames. ], batch size: 89, lr: 1.20e-02, grad_scale: 16.0 2024-09-23 04:05:43,179 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=7.09 vs. limit=15.0 2024-09-23 04:05:44,158 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=173478.66666666666, ans=0.125 2024-09-23 04:05:55,375 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=173478.66666666666, ans=0.2 2024-09-23 04:06:07,601 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=173525.33333333334, ans=0.0 2024-09-23 04:06:12,735 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=7.98 vs. limit=12.0 2024-09-23 04:06:17,716 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=4.43 vs. limit=12.0 2024-09-23 04:06:28,626 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=173572.0, ans=0.125 2024-09-23 04:06:49,963 INFO [train.py:1198] (3/4) Epoch 10, batch 2150, loss[loss=0.2606, ctc_loss=0.1823, cr_loss=0.3915, over 17307.00 frames. ], tot_loss[loss=0.2519, ctc_loss=0.1751, cr_loss=0.3839, over 3359992.63 frames. ], batch size: 51, lr: 1.20e-02, grad_scale: 16.0 2024-09-23 04:06:59,667 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=173665.33333333334, ans=0.015 2024-09-23 04:07:18,784 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.167e+02 1.317e+02 1.458e+02 1.666e+02 3.135e+02, threshold=2.917e+02, percent-clipped=1.0 2024-09-23 04:07:22,215 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=173758.66666666666, ans=0.1 2024-09-23 04:08:02,116 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=173852.0, ans=0.125 2024-09-23 04:08:05,406 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=173852.0, ans=0.2 2024-09-23 04:08:13,429 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=173898.66666666666, ans=0.125 2024-09-23 04:08:14,680 INFO [train.py:1198] (3/4) Epoch 10, batch 2200, loss[loss=0.2311, ctc_loss=0.1616, cr_loss=0.3471, over 17019.00 frames. ], tot_loss[loss=0.2507, ctc_loss=0.1742, cr_loss=0.3828, over 3361060.84 frames. ], batch size: 51, lr: 1.20e-02, grad_scale: 16.0 2024-09-23 04:09:34,450 INFO [train.py:1198] (3/4) Epoch 10, batch 2250, loss[loss=0.1996, ctc_loss=0.1377, cr_loss=0.3093, over 17238.00 frames. ], tot_loss[loss=0.2512, ctc_loss=0.1745, cr_loss=0.3832, over 3365142.55 frames. 
2024-09-23 04:09:34,450 INFO [train.py:1198] (3/4) Epoch 10, batch 2250, loss[loss=0.1996, ctc_loss=0.1377, cr_loss=0.3093, over 17238.00 frames. ], tot_loss[loss=0.2512, ctc_loss=0.1745, cr_loss=0.3832, over 3365142.55 frames. ], batch size: 42, lr: 1.20e-02, grad_scale: 16.0
2024-09-23 04:10:05,826 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.156e+02 1.382e+02 1.475e+02 1.727e+02 2.787e+02, threshold=2.949e+02, percent-clipped=0.0
2024-09-23 04:10:13,401 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=7.54 vs. limit=15.0
2024-09-23 04:10:20,613 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=174225.33333333334, ans=0.2
2024-09-23 04:10:22,232 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=174225.33333333334, ans=0.0
2024-09-23 04:10:36,450 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=174272.0, ans=0.125
2024-09-23 04:10:37,169 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=14.69 vs. limit=22.5
2024-09-23 04:10:39,564 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=174318.66666666666, ans=0.125
2024-09-23 04:10:49,016 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=174318.66666666666, ans=0.0
2024-09-23 04:10:49,033 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=174318.66666666666, ans=0.1
2024-09-23 04:10:56,695 INFO [train.py:1198] (3/4) Epoch 10, batch 2300, loss[loss=0.3002, ctc_loss=0.2147, cr_loss=0.4274, over 17244.00 frames. ], tot_loss[loss=0.2508, ctc_loss=0.1742, cr_loss=0.3832, over 3365185.82 frames. ], batch size: 50, lr: 1.20e-02, grad_scale: 16.0
2024-09-23 04:10:57,077 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=174365.33333333334, ans=0.05
2024-09-23 04:11:06,302 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=174365.33333333334, ans=0.125
2024-09-23 04:11:28,578 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=174458.66666666666, ans=0.0
2024-09-23 04:11:42,365 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=174458.66666666666, ans=0.0
2024-09-23 04:11:50,445 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=174505.33333333334, ans=0.125
2024-09-23 04:11:55,314 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=174505.33333333334, ans=0.2
2024-09-23 04:12:19,124 INFO [train.py:1198] (3/4) Epoch 10, batch 2350, loss[loss=0.2295, ctc_loss=0.1553, cr_loss=0.3712, over 17087.00 frames. ], tot_loss[loss=0.251, ctc_loss=0.1743, cr_loss=0.3835, over 3364313.31 frames. ], batch size: 43, lr: 1.20e-02, grad_scale: 16.0
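In the optim.py warnings, the reported threshold tracks 2.0 times the median of the grad-norm quartiles (1.475e+02 -> 2.949e+02 in the warning above, up to rounding), which suggests the clipping threshold is Clipping_scale times a running median of recent gradient norms, with percent-clipped reporting how often that threshold was exceeded. A sketch under that reading; the function and its names are hypothetical reconstructions, not the actual optim.py code:

```python
# Hedged reconstruction of the clipping rule implied by the warnings:
# threshold = clipping_scale * median(recent gradient norms). This mirrors
# the logged numbers (threshold = 2.0 * the median quartile); it is not
# the actual optim.py implementation.
from collections import deque
from statistics import median

import torch

norm_history = deque(maxlen=128)  # window size is an arbitrary choice here

def clip_gradients(params, clipping_scale=2.0):
    grads = [p.grad for p in params if p.grad is not None]
    total_norm = torch.norm(torch.stack([g.norm() for g in grads])).item()
    norm_history.append(total_norm)
    threshold = clipping_scale * median(norm_history)
    if total_norm > threshold:
        for g in grads:
            g.mul_(threshold / total_norm)  # scale gradients down to the threshold
    return total_norm, threshold
```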
2024-09-23 04:12:24,563 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.52 vs. limit=6.0
2024-09-23 04:12:40,997 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=174645.33333333334, ans=0.125
2024-09-23 04:12:50,300 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.141e+02 1.308e+02 1.414e+02 1.626e+02 2.428e+02, threshold=2.828e+02, percent-clipped=0.0
2024-09-23 04:12:58,038 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=174692.0, ans=0.09899494936611666
2024-09-23 04:13:07,249 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=174692.0, ans=0.2
2024-09-23 04:13:11,053 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.85 vs. limit=15.0
2024-09-23 04:13:43,457 INFO [train.py:1198] (3/4) Epoch 10, batch 2400, loss[loss=0.2187, ctc_loss=0.153, cr_loss=0.3287, over 17116.00 frames. ], tot_loss[loss=0.2521, ctc_loss=0.1751, cr_loss=0.3849, over 3370983.25 frames. ], batch size: 40, lr: 1.20e-02, grad_scale: 32.0
2024-09-23 04:13:58,003 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.const_attention_rate, batch_count=174878.66666666666, ans=0.025
2024-09-23 04:14:04,574 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=174878.66666666666, ans=0.125
2024-09-23 04:14:11,039 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=174878.66666666666, ans=0.0
2024-09-23 04:14:22,165 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.const_attention_rate, batch_count=174925.33333333334, ans=0.025
2024-09-23 04:14:44,470 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=174972.0, ans=0.125
2024-09-23 04:15:06,055 INFO [train.py:1198] (3/4) Epoch 10, batch 2450, loss[loss=0.2239, ctc_loss=0.1536, cr_loss=0.3512, over 17049.00 frames. ], tot_loss[loss=0.2498, ctc_loss=0.1734, cr_loss=0.382, over 3369349.05 frames. ], batch size: 44, lr: 1.20e-02, grad_scale: 16.0
2024-09-23 04:15:19,335 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=175065.33333333334, ans=0.125
2024-09-23 04:15:22,375 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=175112.0, ans=0.125
2024-09-23 04:15:31,855 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=175112.0, ans=0.0
2024-09-23 04:15:33,899 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=4.11 vs. limit=10.0
2024-09-23 04:15:36,388 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.129e+02 1.295e+02 1.392e+02 1.667e+02 2.800e+02, threshold=2.783e+02, percent-clipped=0.0
2024-09-23 04:15:43,648 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=5.15 vs. limit=12.0
2024-09-23 04:15:44,657 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-09-23 04:15:44,821 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=175158.66666666666, ans=0.04949747468305833
2024-09-23 04:15:51,014 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=175158.66666666666, ans=0.0
2024-09-23 04:16:12,076 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=175252.0, ans=0.125
2024-09-23 04:16:25,985 INFO [train.py:1198] (3/4) Epoch 10, batch 2500, loss[loss=0.2384, ctc_loss=0.1646, cr_loss=0.369, over 17222.00 frames. ], tot_loss[loss=0.2511, ctc_loss=0.1744, cr_loss=0.3836, over 3369220.81 frames. ], batch size: 47, lr: 1.20e-02, grad_scale: 16.0
2024-09-23 04:16:38,319 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=175298.66666666666, ans=0.1
2024-09-23 04:17:10,239 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=175392.0, ans=0.0
2024-09-23 04:17:33,963 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=175485.33333333334, ans=0.0
2024-09-23 04:17:37,264 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.96 vs. limit=15.0
2024-09-23 04:17:50,879 INFO [train.py:1198] (3/4) Epoch 10, batch 2550, loss[loss=0.2536, ctc_loss=0.1734, cr_loss=0.401, over 17124.00 frames. ], tot_loss[loss=0.251, ctc_loss=0.1744, cr_loss=0.3826, over 3360582.41 frames. ], batch size: 49, lr: 1.20e-02, grad_scale: 16.0
2024-09-23 04:18:01,654 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=175532.0, ans=0.1
2024-09-23 04:18:23,452 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.153e+02 1.321e+02 1.441e+02 1.691e+02 2.698e+02, threshold=2.882e+02, percent-clipped=0.0
2024-09-23 04:18:25,319 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-09-23 04:18:41,179 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=175672.0, ans=0.0
2024-09-23 04:18:47,926 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=6.98 vs. limit=15.0
2024-09-23 04:18:52,187 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=175672.0, ans=0.1
2024-09-23 04:19:12,532 INFO [train.py:1198] (3/4) Epoch 10, batch 2600, loss[loss=0.2448, ctc_loss=0.1679, cr_loss=0.3846, over 17290.00 frames. ], tot_loss[loss=0.2525, ctc_loss=0.1755, cr_loss=0.385, over 3355634.71 frames. ], batch size: 51, lr: 1.20e-02, grad_scale: 16.0
2024-09-23 04:19:24,126 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.min_positive, batch_count=175765.33333333334, ans=0.05
2024-09-23 04:19:48,908 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=175858.66666666666, ans=0.0
2024-09-23 04:20:00,049 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=175858.66666666666, ans=0.125
2024-09-23 04:20:16,381 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.43 vs. limit=15.0
2024-09-23 04:20:17,072 INFO [scaling.py:1024] (3/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.42 vs. limit=5.0
2024-09-23 04:20:23,974 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=175952.0, ans=0.125
2024-09-23 04:20:28,696 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=175952.0, ans=0.125
2024-09-23 04:20:34,898 INFO [train.py:1198] (3/4) Epoch 10, batch 2650, loss[loss=0.2389, ctc_loss=0.1619, cr_loss=0.3849, over 17243.00 frames. ], tot_loss[loss=0.2525, ctc_loss=0.1755, cr_loss=0.3848, over 3352819.35 frames. ], batch size: 44, lr: 1.20e-02, grad_scale: 16.0
2024-09-23 04:20:38,947 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=6.60 vs. limit=15.0
2024-09-23 04:21:05,299 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.123e+02 1.318e+02 1.384e+02 1.550e+02 2.652e+02, threshold=2.768e+02, percent-clipped=0.0
2024-09-23 04:21:23,077 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=176138.66666666666, ans=0.125
2024-09-23 04:21:23,162 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=176138.66666666666, ans=0.025
2024-09-23 04:21:26,343 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=176138.66666666666, ans=0.1
2024-09-23 04:21:57,391 INFO [train.py:1198] (3/4) Epoch 10, batch 2700, loss[loss=0.2339, ctc_loss=0.1608, cr_loss=0.3657, over 17307.00 frames. ], tot_loss[loss=0.2509, ctc_loss=0.1743, cr_loss=0.3834, over 3363035.20 frames. ], batch size: 49, lr: 1.20e-02, grad_scale: 16.0
2024-09-23 04:21:57,651 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=176232.0, ans=0.2
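The scaling.py ScheduledFloat entries above record the current value (ans) of hyperparameters that are scheduled against batch_count: skip rates, balancer probabilities, bypass scale minima and the like. A natural implementation, and a plausible reading of these entries, is piecewise-linear interpolation between (batch_count, value) breakpoints. The breakpoints in this sketch are invented for illustration, not taken from the zipformer code:

```python
# Hedged sketch of a piecewise-linear schedule like the ScheduledFloat
# values logged above; the breakpoints here are made up for illustration.
from bisect import bisect_right

class PiecewiseLinear:
    def __init__(self, *points):
        # points: (batch_count, value) pairs, sorted by batch_count
        self.x = [p[0] for p in points]
        self.y = [p[1] for p in points]

    def __call__(self, batch_count: float) -> float:
        if batch_count <= self.x[0]:
            return self.y[0]
        if batch_count >= self.x[-1]:
            return self.y[-1]
        i = bisect_right(self.x, batch_count)
        t = (batch_count - self.x[i - 1]) / (self.x[i] - self.x[i - 1])
        return self.y[i - 1] + t * (self.y[i] - self.y[i - 1])

# e.g. a skip rate that decays to 0.0 over the first 20k batches:
conv_skip_rate = PiecewiseLinear((0.0, 0.2), (20000.0, 0.0))
print(conv_skip_rate(175858.66666666666))  # -> 0.0, as in the *_skip_rate entries above
```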
2024-09-23 04:21:58,071 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.33 vs. limit=15.0
2024-09-23 04:22:10,517 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=176232.0, ans=0.125
2024-09-23 04:22:35,163 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=176325.33333333334, ans=0.125
2024-09-23 04:22:40,217 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=176325.33333333334, ans=0.0
2024-09-23 04:22:43,248 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=176325.33333333334, ans=0.0
2024-09-23 04:23:22,734 INFO [train.py:1198] (3/4) Epoch 10, batch 2750, loss[loss=0.2428, ctc_loss=0.1723, cr_loss=0.3525, over 16220.00 frames. ], tot_loss[loss=0.2513, ctc_loss=0.1745, cr_loss=0.3841, over 3358613.91 frames. ], batch size: 36, lr: 1.19e-02, grad_scale: 16.0
2024-09-23 04:23:28,201 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=6.10 vs. limit=15.0
2024-09-23 04:23:33,549 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=7.40 vs. limit=15.0
2024-09-23 04:23:45,611 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=176512.0, ans=0.2
2024-09-23 04:23:53,133 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.143e+02 1.357e+02 1.501e+02 1.751e+02 2.551e+02, threshold=3.001e+02, percent-clipped=0.0
2024-09-23 04:24:41,340 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=10.68 vs. limit=12.0
2024-09-23 04:24:42,411 INFO [train.py:1198] (3/4) Epoch 10, batch 2800, loss[loss=0.2558, ctc_loss=0.1776, cr_loss=0.3907, over 15008.00 frames. ], tot_loss[loss=0.2507, ctc_loss=0.174, cr_loss=0.3837, over 3361096.26 frames. ], batch size: 89, lr: 1.19e-02, grad_scale: 32.0
2024-09-23 04:24:48,284 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=176698.66666666666, ans=10.0
2024-09-23 04:24:50,358 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=12.13 vs. limit=22.5
2024-09-23 04:25:10,010 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=7.28 vs. limit=15.0
2024-09-23 04:25:11,452 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=3.78 vs. limit=12.0
2024-09-23 04:25:24,110 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=7.36 vs. limit=15.0
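The grad_scale field in the train.py entries flips between 16.0 and 32.0 through this stretch (16.0 at batch 2750 above, 32.0 again at batch 2800). That pattern is characteristic of dynamic loss scaling in fp16 training: the scale is halved when a step produces inf/nan gradients and doubled after a run of overflow-free steps. A sketch of that update rule; the constants are common defaults, assumed rather than read from this setup:

```python
# Hedged sketch of dynamic loss scaling, the usual mechanism behind the
# grad_scale values above (halve on overflow, double after a stretch of
# good steps). Constants are illustrative defaults, not read from train.py.
class DynamicGradScaler:
    def __init__(self, scale=2.0, growth_factor=2.0, backoff_factor=0.5,
                 growth_interval=2000):
        self.scale = scale
        self.growth_factor = growth_factor
        self.backoff_factor = backoff_factor
        self.growth_interval = growth_interval
        self._good_steps = 0

    def update(self, found_inf: bool) -> float:
        if found_inf:
            self.scale *= self.backoff_factor     # e.g. 32.0 -> 16.0
            self._good_steps = 0
        else:
            self._good_steps += 1
            if self._good_steps >= self.growth_interval:
                self.scale *= self.growth_factor  # e.g. 16.0 -> 32.0
                self._good_steps = 0
        return self.scale
```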
2024-09-23 04:25:32,299 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.23 vs. limit=6.0
2024-09-23 04:25:36,569 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=176838.66666666666, ans=0.0
2024-09-23 04:25:41,339 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=176838.66666666666, ans=0.0
2024-09-23 04:25:57,531 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=176885.33333333334, ans=0.125
2024-09-23 04:26:05,053 INFO [train.py:1198] (3/4) Epoch 10, batch 2850, loss[loss=0.2573, ctc_loss=0.1799, cr_loss=0.387, over 17148.00 frames. ], tot_loss[loss=0.2517, ctc_loss=0.1748, cr_loss=0.3845, over 3356144.81 frames. ], batch size: 48, lr: 1.19e-02, grad_scale: 32.0
2024-09-23 04:26:05,321 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=176932.0, ans=0.0
2024-09-23 04:26:14,955 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=176932.0, ans=0.125
2024-09-23 04:26:38,103 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.110e+02 1.308e+02 1.482e+02 1.673e+02 2.195e+02, threshold=2.964e+02, percent-clipped=0.0
2024-09-23 04:27:05,710 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=177072.0, ans=0.0
2024-09-23 04:27:12,053 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.min_positive, batch_count=177118.66666666666, ans=0.025
2024-09-23 04:27:30,214 INFO [train.py:1198] (3/4) Epoch 10, batch 2900, loss[loss=0.3036, ctc_loss=0.2212, cr_loss=0.4119, over 11986.00 frames. ], tot_loss[loss=0.2503, ctc_loss=0.1737, cr_loss=0.3829, over 3352013.13 frames. ], batch size: 123, lr: 1.19e-02, grad_scale: 32.0
2024-09-23 04:27:47,282 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=177212.0, ans=0.07
2024-09-23 04:27:48,859 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.const_attention_rate, batch_count=177212.0, ans=0.025
2024-09-23 04:28:17,795 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=177258.66666666666, ans=0.1
2024-09-23 04:28:18,448 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=5.59 vs. limit=15.0
2024-09-23 04:28:25,686 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=177305.33333333334, ans=0.1
2024-09-23 04:28:30,987 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=177305.33333333334, ans=0.2
2024-09-23 04:28:40,474 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=177352.0, ans=0.1
2024-09-23 04:28:41,893 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=177352.0, ans=0.125
2024-09-23 04:28:52,969 INFO [train.py:1198] (3/4) Epoch 10, batch 2950, loss[loss=0.2797, ctc_loss=0.1958, cr_loss=0.4194, over 16658.00 frames. ], tot_loss[loss=0.2511, ctc_loss=0.1744, cr_loss=0.3834, over 3350783.12 frames. ], batch size: 61, lr: 1.19e-02, grad_scale: 32.0
2024-09-23 04:29:23,162 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.132e+02 1.333e+02 1.430e+02 1.579e+02 2.314e+02, threshold=2.860e+02, percent-clipped=0.0
2024-09-23 04:29:28,276 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-09-23 04:29:35,873 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=177492.0, ans=0.125
2024-09-23 04:29:40,754 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=177538.66666666666, ans=0.1
2024-09-23 04:29:43,852 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=177538.66666666666, ans=0.0
2024-09-23 04:30:14,458 INFO [train.py:1198] (3/4) Epoch 10, batch 3000, loss[loss=0.2573, ctc_loss=0.1792, cr_loss=0.3908, over 16953.00 frames. ], tot_loss[loss=0.2516, ctc_loss=0.1749, cr_loss=0.3837, over 3351891.10 frames. ], batch size: 42, lr: 1.19e-02, grad_scale: 32.0
2024-09-23 04:30:14,458 INFO [train.py:1221] (3/4) Computing validation loss
2024-09-23 04:30:30,498 INFO [train.py:1230] (3/4) Epoch 10, validation: loss=0.04843, ctc_loss=0.04843, cr_loss=7.942e-15, over 944034.00 frames.
2024-09-23 04:30:30,499 INFO [train.py:1231] (3/4) Maximum memory allocated so far is 21238MB
2024-09-23 04:30:38,892 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=5.46 vs. limit=12.0
2024-09-23 04:31:06,504 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=177725.33333333334, ans=0.0
2024-09-23 04:31:14,696 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=12.60 vs. limit=22.5
2024-09-23 04:31:34,558 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer_ff3.min_abs, batch_count=177818.66666666666, ans=0.2
2024-09-23 04:31:48,172 INFO [train.py:1198] (3/4) Epoch 10, batch 3050, loss[loss=0.2551, ctc_loss=0.182, cr_loss=0.3659, over 16399.00 frames. ], tot_loss[loss=0.252, ctc_loss=0.1752, cr_loss=0.384, over 3339577.90 frames. ], batch size: 66, lr: 1.19e-02, grad_scale: 32.0
2024-09-23 04:31:48,364 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=177865.33333333334, ans=0.125
2024-09-23 04:31:53,268 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=177865.33333333334, ans=0.0
2024-09-23 04:32:05,771 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=177912.0, ans=0.1
2024-09-23 04:32:06,445 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=3.52 vs. limit=15.0
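In the Epoch 10 validation entry above, cr_loss collapses to 7.942e-15 while ctc_loss carries the whole value. One plausible reading: the consistency-regularization term compares CTC posteriors from two differently-augmented views of each utterance, and with augmentation disabled at validation time the two views coincide, so any symmetric divergence between them vanishes up to floating-point noise. A toy illustration of that reading (the choice of symmetric KL is an assumption, for intuition only):

```python
# Toy illustration: a symmetric KL between two identical posterior views is
# ~0, matching the ~1e-15 validation cr_loss above (tiny nondeterministic
# differences in a real pipeline leave a residue at that scale).
import torch

def symmetric_kl(logp_a: torch.Tensor, logp_b: torch.Tensor) -> torch.Tensor:
    kl_ab = torch.nn.functional.kl_div(logp_b, logp_a, log_target=True,
                                       reduction="batchmean")
    kl_ba = torch.nn.functional.kl_div(logp_a, logp_b, log_target=True,
                                       reduction="batchmean")
    return 0.5 * (kl_ab + kl_ba)

logp = torch.randn(8, 500).log_softmax(dim=-1)
print(symmetric_kl(logp, logp))  # ~0: no augmentation -> identical views
```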
2024-09-23 04:32:10,840 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=3.03 vs. limit=15.0
2024-09-23 04:32:13,339 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=177912.0, ans=0.0
2024-09-23 04:32:17,671 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.072e+02 1.279e+02 1.408e+02 1.568e+02 2.746e+02, threshold=2.816e+02, percent-clipped=0.0
2024-09-23 04:32:19,455 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=177958.66666666666, ans=0.1
2024-09-23 04:32:43,094 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=178005.33333333334, ans=0.0
2024-09-23 04:32:49,220 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=178052.0, ans=0.125
2024-09-23 04:33:06,404 INFO [train.py:1198] (3/4) Epoch 10, batch 3100, loss[loss=0.2591, ctc_loss=0.1766, cr_loss=0.4124, over 17292.00 frames. ], tot_loss[loss=0.2507, ctc_loss=0.1741, cr_loss=0.3832, over 3346885.36 frames. ], batch size: 51, lr: 1.19e-02, grad_scale: 32.0
2024-09-23 04:33:08,188 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=178098.66666666666, ans=0.0
2024-09-23 04:33:34,305 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=178145.33333333334, ans=0.0
2024-09-23 04:33:39,407 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=4.73 vs. limit=15.0
2024-09-23 04:33:57,729 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=178238.66666666666, ans=0.125
2024-09-23 04:34:27,443 INFO [train.py:1198] (3/4) Epoch 10, batch 3150, loss[loss=0.2731, ctc_loss=0.1901, cr_loss=0.4152, over 16030.00 frames. ], tot_loss[loss=0.2493, ctc_loss=0.173, cr_loss=0.3815, over 3349697.77 frames. ], batch size: 74, lr: 1.19e-02, grad_scale: 32.0
2024-09-23 04:34:29,828 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=18.15 vs. limit=22.5
2024-09-23 04:34:33,928 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=178332.0, ans=0.09899494936611666
2024-09-23 04:34:35,383 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=178332.0, ans=0.2
2024-09-23 04:34:50,600 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=178378.66666666666, ans=0.125
2024-09-23 04:34:56,601 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.104e+02 1.289e+02 1.420e+02 1.589e+02 2.247e+02, threshold=2.840e+02, percent-clipped=0.0
2024-09-23 04:35:31,431 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=15.72 vs. limit=15.0
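The Whitening entries compare a per-module statistic against a limit; metric=15.72 vs. limit=15.0 just above is one of the few places in this section where the limit is exceeded. A metric with this flavor can be built from the eigenvalues of the feature covariance: the ratio of the mean squared eigenvalue to the squared mean eigenvalue, which is 1.0 when features are perfectly white and grows as the covariance becomes lopsided. The sketch below computes that quantity; whether scaling.py uses exactly this definition is an assumption:

```python
# Hedged sketch of a whitening metric: mean(eig^2) / mean(eig)^2 of the
# per-group feature covariance. Equals 1.0 for perfectly white features.
# This mirrors the "metric=... vs. limit=..." entries, but the exact
# definition used in scaling.py is assumed, not verified here.
import torch

def whitening_metric(x: torch.Tensor, num_groups: int) -> torch.Tensor:
    # x: (num_frames, num_channels); channels split into num_groups groups
    n, c = x.shape
    x = x.reshape(n, num_groups, c // num_groups).transpose(0, 1)  # (g, n, d)
    x = x - x.mean(dim=1, keepdim=True)
    cov = torch.matmul(x.transpose(1, 2), x) / n                   # (g, d, d)
    d = cov.shape[-1]
    tr = cov.diagonal(dim1=-2, dim2=-1).sum(-1)        # sum of eigenvalues
    tr2 = (cov * cov).sum(dim=(-2, -1))                # trace(cov @ cov), cov symmetric
    return ((tr2 / d) / (tr / d).pow(2)).mean()

x = torch.randn(10000, 64)
print(whitening_metric(x, num_groups=1))  # close to 1.0 for i.i.d. noise
```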
2024-09-23 04:35:49,623 INFO [train.py:1198] (3/4) Epoch 10, batch 3200, loss[loss=0.2613, ctc_loss=0.1812, cr_loss=0.4002, over 17264.00 frames. ], tot_loss[loss=0.2489, ctc_loss=0.1728, cr_loss=0.3808, over 3348280.40 frames. ], batch size: 44, lr: 1.19e-02, grad_scale: 32.0
2024-09-23 04:36:10,072 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=178612.0, ans=0.2
2024-09-23 04:36:28,890 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-09-23 04:36:55,375 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=178752.0, ans=0.125
2024-09-23 04:37:01,718 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=15.17 vs. limit=22.5
2024-09-23 04:37:07,424 INFO [train.py:1198] (3/4) Epoch 10, batch 3250, loss[loss=0.2871, ctc_loss=0.2008, cr_loss=0.4317, over 16532.00 frames. ], tot_loss[loss=0.2489, ctc_loss=0.1727, cr_loss=0.3809, over 3358059.09 frames. ], batch size: 66, lr: 1.19e-02, grad_scale: 32.0
2024-09-23 04:37:07,692 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=178798.66666666666, ans=0.125
2024-09-23 04:37:12,280 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=178798.66666666666, ans=0.025
2024-09-23 04:37:31,007 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=178845.33333333334, ans=0.125
2024-09-23 04:37:35,879 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=178845.33333333334, ans=0.125
2024-09-23 04:37:37,059 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.091e+02 1.295e+02 1.395e+02 1.531e+02 3.942e+02, threshold=2.791e+02, percent-clipped=1.0
2024-09-23 04:37:55,888 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=178938.66666666666, ans=0.1
2024-09-23 04:38:00,757 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=178938.66666666666, ans=0.1
2024-09-23 04:38:08,707 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=178985.33333333334, ans=0.2
2024-09-23 04:38:25,404 INFO [train.py:1198] (3/4) Epoch 10, batch 3300, loss[loss=0.2399, ctc_loss=0.164, cr_loss=0.3795, over 16951.00 frames. ], tot_loss[loss=0.2499, ctc_loss=0.1736, cr_loss=0.3815, over 3355288.20 frames. ], batch size: 42, lr: 1.19e-02, grad_scale: 32.0
2024-09-23 04:38:28,833 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=179032.0, ans=0.1
2024-09-23 04:39:02,886 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=179125.33333333334, ans=0.2
2024-09-23 04:39:12,177 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=179172.0, ans=0.125
2024-09-23 04:39:19,807 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=179172.0, ans=0.1
2024-09-23 04:39:23,206 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=179172.0, ans=0.125
2024-09-23 04:39:41,992 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=179265.33333333334, ans=0.0
2024-09-23 04:39:43,188 INFO [train.py:1198] (3/4) Epoch 10, batch 3350, loss[loss=0.2985, ctc_loss=0.2145, cr_loss=0.4198, over 15172.00 frames. ], tot_loss[loss=0.2502, ctc_loss=0.1739, cr_loss=0.3818, over 3347331.38 frames. ], batch size: 89, lr: 1.19e-02, grad_scale: 32.0
2024-09-23 04:40:10,668 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=179312.0, ans=0.125
2024-09-23 04:40:15,061 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.120e+02 1.385e+02 1.553e+02 1.795e+02 2.936e+02, threshold=3.106e+02, percent-clipped=1.0
2024-09-23 04:40:17,124 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=179358.66666666666, ans=0.0
2024-09-23 04:40:27,829 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=179358.66666666666, ans=0.2
2024-09-23 04:40:32,930 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.56 vs. limit=6.0
2024-09-23 04:40:47,857 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=179452.0, ans=0.0
2024-09-23 04:41:03,097 INFO [train.py:1198] (3/4) Epoch 10, batch 3400, loss[loss=0.2558, ctc_loss=0.1758, cr_loss=0.4001, over 17330.00 frames. ], tot_loss[loss=0.2516, ctc_loss=0.1749, cr_loss=0.3837, over 3337471.66 frames. ], batch size: 51, lr: 1.19e-02, grad_scale: 32.0
2024-09-23 04:41:11,116 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=179498.66666666666, ans=0.0
2024-09-23 04:41:33,343 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=5.46 vs. limit=12.0
2024-09-23 04:42:12,282 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=179685.33333333334, ans=0.0
2024-09-23 04:42:21,371 INFO [train.py:1198] (3/4) Epoch 10, batch 3450, loss[loss=0.2337, ctc_loss=0.1618, cr_loss=0.3599, over 17026.00 frames. ], tot_loss[loss=0.2512, ctc_loss=0.1745, cr_loss=0.3834, over 3337532.89 frames. ], batch size: 39, lr: 1.18e-02, grad_scale: 32.0
2024-09-23 04:42:25,039 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.36 vs. limit=15.0
2024-09-23 04:42:50,604 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.109e+02 1.316e+02 1.451e+02 1.743e+02 2.784e+02, threshold=2.902e+02, percent-clipped=0.0
2024-09-23 04:43:08,029 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=179872.0, ans=0.0
2024-09-23 04:43:08,455 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=7.04 vs. limit=15.0
2024-09-23 04:43:29,765 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=179918.66666666666, ans=0.125
2024-09-23 04:43:40,113 INFO [train.py:1198] (3/4) Epoch 10, batch 3500, loss[loss=0.1912, ctc_loss=0.1309, cr_loss=0.3015, over 17109.00 frames. ], tot_loss[loss=0.2508, ctc_loss=0.1741, cr_loss=0.3833, over 3336754.85 frames. ], batch size: 40, lr: 1.18e-02, grad_scale: 32.0
2024-09-23 04:43:48,172 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=179965.33333333334, ans=0.125
2024-09-23 04:43:59,182 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=180012.0, ans=0.035
2024-09-23 04:44:02,292 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=180012.0, ans=0.1
2024-09-23 04:44:14,823 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=180058.66666666666, ans=0.0
2024-09-23 04:44:19,606 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=180058.66666666666, ans=0.125
2024-09-23 04:44:32,074 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=180105.33333333334, ans=0.04949747468305833
2024-09-23 04:44:37,525 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=6.27 vs. limit=15.0
2024-09-23 04:44:55,731 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=180152.0, ans=0.1
2024-09-23 04:45:02,138 INFO [train.py:1198] (3/4) Epoch 10, batch 3550, loss[loss=0.2287, ctc_loss=0.1607, cr_loss=0.3398, over 16942.00 frames. ], tot_loss[loss=0.2501, ctc_loss=0.1735, cr_loss=0.3826, over 3351660.51 frames. ], batch size: 42, lr: 1.18e-02, grad_scale: 32.0
2024-09-23 04:45:32,194 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.126e+02 1.310e+02 1.457e+02 1.783e+02 3.691e+02, threshold=2.913e+02, percent-clipped=1.0
2024-09-23 04:46:13,347 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=180385.33333333334, ans=0.0
2024-09-23 04:46:16,479 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=180385.33333333334, ans=0.125
2024-09-23 04:46:20,833 INFO [train.py:1198] (3/4) Epoch 10, batch 3600, loss[loss=0.2359, ctc_loss=0.1599, cr_loss=0.3801, over 17057.00 frames. ], tot_loss[loss=0.2484, ctc_loss=0.1722, cr_loss=0.3811, over 3355280.45 frames. ], batch size: 39, lr: 1.18e-02, grad_scale: 32.0
2024-09-23 04:46:42,871 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=180478.66666666666, ans=0.1
2024-09-23 04:46:49,151 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=180478.66666666666, ans=0.2
2024-09-23 04:46:52,329 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=180525.33333333334, ans=0.0
2024-09-23 04:46:52,405 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=180525.33333333334, ans=0.0
2024-09-23 04:47:23,935 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=11.31 vs. limit=22.5
2024-09-23 04:47:26,820 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=180618.66666666666, ans=0.0
2024-09-23 04:47:27,636 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=11.08 vs. limit=22.5
2024-09-23 04:47:38,887 INFO [train.py:1198] (3/4) Epoch 10, batch 3650, loss[loss=0.2525, ctc_loss=0.1731, cr_loss=0.3968, over 17146.00 frames. ], tot_loss[loss=0.2487, ctc_loss=0.1725, cr_loss=0.3807, over 3342677.98 frames. ], batch size: 48, lr: 1.18e-02, grad_scale: 32.0
2024-09-23 04:47:49,922 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=180665.33333333334, ans=0.1
2024-09-23 04:48:03,946 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=180712.0, ans=0.125
2024-09-23 04:48:08,242 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.122e+02 1.307e+02 1.399e+02 1.525e+02 2.249e+02, threshold=2.799e+02, percent-clipped=0.0
2024-09-23 04:48:14,438 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.12 vs. limit=15.0
2024-09-23 04:48:20,375 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-09-23 04:48:21,933 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=180758.66666666666, ans=0.1
2024-09-23 04:48:57,315 INFO [train.py:1198] (3/4) Epoch 10, batch 3700, loss[loss=0.2473, ctc_loss=0.1699, cr_loss=0.387, over 17037.00 frames. ], tot_loss[loss=0.2473, ctc_loss=0.1714, cr_loss=0.3795, over 3346252.33 frames. ], batch size: 44, lr: 1.18e-02, grad_scale: 32.0
2024-09-23 04:49:07,118 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=180898.66666666666, ans=0.125
2024-09-23 04:49:10,343 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=180898.66666666666, ans=0.0
2024-09-23 04:49:58,101 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=181038.66666666666, ans=0.2
2024-09-23 04:49:59,757 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=181085.33333333334, ans=0.125
2024-09-23 04:50:16,593 INFO [train.py:1198] (3/4) Epoch 10, batch 3750, loss[loss=0.2085, ctc_loss=0.1438, cr_loss=0.3238, over 17120.00 frames. ], tot_loss[loss=0.2474, ctc_loss=0.1715, cr_loss=0.3795, over 3347010.28 frames. ], batch size: 40, lr: 1.18e-02, grad_scale: 32.0
2024-09-23 04:50:46,085 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.196e+02 1.346e+02 1.480e+02 1.709e+02 3.123e+02, threshold=2.960e+02, percent-clipped=1.0
2024-09-23 04:50:46,514 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=181225.33333333334, ans=0.2
2024-09-23 04:51:05,186 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=181272.0, ans=0.04949747468305833
2024-09-23 04:51:34,560 INFO [train.py:1198] (3/4) Epoch 10, batch 3800, loss[loss=0.2172, ctc_loss=0.1487, cr_loss=0.3426, over 16962.00 frames. ], tot_loss[loss=0.248, ctc_loss=0.1722, cr_loss=0.379, over 3303075.54 frames. ], batch size: 42, lr: 1.18e-02, grad_scale: 32.0
2024-09-23 04:52:32,377 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=181505.33333333334, ans=0.1
2024-09-23 04:52:40,610 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.13 vs. limit=15.0
2024-09-23 04:52:45,558 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=8.18 vs. limit=15.0
2024-09-23 04:52:51,990 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=181598.66666666666, ans=0.0
2024-09-23 04:52:53,256 INFO [train.py:1198] (3/4) Epoch 10, batch 3850, loss[loss=0.2436, ctc_loss=0.1689, cr_loss=0.3736, over 17196.00 frames. ], tot_loss[loss=0.2486, ctc_loss=0.1727, cr_loss=0.3795, over 3291816.72 frames. ], batch size: 41, lr: 1.18e-02, grad_scale: 32.0
2024-09-23 04:53:09,666 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.41 vs. limit=15.0
2024-09-23 04:53:19,969 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=15.70 vs. limit=15.0
2024-09-23 04:53:22,408 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.129e+02 1.346e+02 1.514e+02 1.745e+02 2.888e+02, threshold=3.027e+02, percent-clipped=0.0
2024-09-23 04:53:22,732 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=181692.0, ans=0.2
2024-09-23 04:53:29,373 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=9.86 vs. limit=15.0
2024-09-23 04:53:47,560 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=181738.66666666666, ans=0.125
2024-09-23 04:54:55,686 INFO [train.py:1198] (3/4) Epoch 11, batch 0, loss[loss=0.266, ctc_loss=0.1825, cr_loss=0.4174, over 17146.00 frames. ], tot_loss[loss=0.266, ctc_loss=0.1825, cr_loss=0.4174, over 17146.00 frames. ], batch size: 48, lr: 1.12e-02, grad_scale: 32.0
2024-09-23 04:54:55,686 INFO [train.py:1221] (3/4) Computing validation loss
2024-09-23 04:55:11,307 INFO [train.py:1230] (3/4) Epoch 11, validation: loss=0.04963, ctc_loss=0.04963, cr_loss=7.372e-15, over 944034.00 frames.
2024-09-23 04:55:11,308 INFO [train.py:1231] (3/4) Maximum memory allocated so far is 21238MB
2024-09-23 04:55:39,968 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=181860.0, ans=0.125
2024-09-23 04:55:41,482 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=181860.0, ans=0.1
2024-09-23 04:55:43,129 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=181860.0, ans=0.125
2024-09-23 04:55:59,261 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=181906.66666666666, ans=0.1
2024-09-23 04:56:07,212 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=181953.33333333334, ans=0.125
2024-09-23 04:56:26,662 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=182000.0, ans=0.125
2024-09-23 04:56:34,330 INFO [train.py:1198] (3/4) Epoch 11, batch 50, loss[loss=0.2385, ctc_loss=0.1613, cr_loss=0.3859, over 17307.00 frames. ], tot_loss[loss=0.2504, ctc_loss=0.1738, cr_loss=0.3827, over 751875.84 frames. ], batch size: 49, lr: 1.12e-02, grad_scale: 32.0
2024-09-23 04:56:47,660 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=182046.66666666666, ans=0.04949747468305833
2024-09-23 04:56:47,932 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.38 vs. limit=15.0
2024-09-23 04:57:12,614 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.085e+02 1.435e+02 1.595e+02 1.830e+02 2.692e+02, threshold=3.191e+02, percent-clipped=0.0
2024-09-23 04:57:49,530 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.min_positive, batch_count=182233.33333333334, ans=0.05
2024-09-23 04:57:52,609 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=182280.0, ans=0.2
2024-09-23 04:57:53,905 INFO [train.py:1198] (3/4) Epoch 11, batch 100, loss[loss=0.2025, ctc_loss=0.1376, cr_loss=0.3244, over 17283.00 frames. ], tot_loss[loss=0.246, ctc_loss=0.1702, cr_loss=0.3793, over 1335866.57 frames. ], batch size: 42, lr: 1.12e-02, grad_scale: 16.0
2024-09-23 04:58:16,548 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=182326.66666666666, ans=0.0
2024-09-23 04:58:35,981 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=182373.33333333334, ans=0.125
2024-09-23 04:59:11,839 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=182466.66666666666, ans=0.2
2024-09-23 04:59:16,235 INFO [train.py:1198] (3/4) Epoch 11, batch 150, loss[loss=0.2345, ctc_loss=0.1628, cr_loss=0.3587, over 17127.00 frames. ], tot_loss[loss=0.2453, ctc_loss=0.1698, cr_loss=0.3775, over 1780600.63 frames. ], batch size: 40, lr: 1.12e-02, grad_scale: 16.0
2024-09-23 04:59:35,762 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=182560.0, ans=0.125
2024-09-23 04:59:40,528 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=182560.0, ans=0.1
2024-09-23 04:59:57,641 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.077e+02 1.353e+02 1.511e+02 1.758e+02 2.701e+02, threshold=3.021e+02, percent-clipped=0.0
2024-09-23 05:00:41,719 INFO [train.py:1198] (3/4) Epoch 11, batch 200, loss[loss=0.2727, ctc_loss=0.194, cr_loss=0.3937, over 15308.00 frames. ], tot_loss[loss=0.2476, ctc_loss=0.1717, cr_loss=0.3792, over 2119322.89 frames. ], batch size: 90, lr: 1.12e-02, grad_scale: 16.0
2024-09-23 05:01:05,779 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=182793.33333333334, ans=0.0
2024-09-23 05:01:21,924 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.const_attention_rate, batch_count=182840.0, ans=0.025
2024-09-23 05:01:24,843 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=182840.0, ans=0.0
2024-09-23 05:01:28,276 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=182886.66666666666, ans=0.125
2024-09-23 05:01:33,006 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=182886.66666666666, ans=0.125
2024-09-23 05:01:37,797 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=182886.66666666666, ans=0.125
2024-09-23 05:02:01,468 INFO [train.py:1198] (3/4) Epoch 11, batch 250, loss[loss=0.2672, ctc_loss=0.1874, cr_loss=0.399, over 17310.00 frames. ], tot_loss[loss=0.2487, ctc_loss=0.1724, cr_loss=0.3815, over 2392007.96 frames. ], batch size: 51, lr: 1.12e-02, grad_scale: 16.0
2024-09-23 05:02:03,289 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=182980.0, ans=0.0
2024-09-23 05:02:03,371 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=182980.0, ans=0.09899494936611666
2024-09-23 05:02:06,714 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=182980.0, ans=0.1
2024-09-23 05:02:39,688 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.107e+02 1.357e+02 1.521e+02 1.743e+02 2.579e+02, threshold=3.043e+02, percent-clipped=0.0
2024-09-23 05:03:20,848 INFO [train.py:1198] (3/4) Epoch 11, batch 300, loss[loss=0.1972, ctc_loss=0.1346, cr_loss=0.3132, over 17091.00 frames. ], tot_loss[loss=0.2466, ctc_loss=0.1708, cr_loss=0.3793, over 2605870.07 frames. ], batch size: 40, lr: 1.12e-02, grad_scale: 16.0
2024-09-23 05:03:24,816 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.14 vs. limit=15.0
2024-09-23 05:03:29,028 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=183213.33333333334, ans=0.0
2024-09-23 05:03:40,565 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=9.87 vs. limit=15.0
2024-09-23 05:03:48,086 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.const_attention_rate, batch_count=183260.0, ans=0.025
2024-09-23 05:04:04,034 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=183306.66666666666, ans=0.125
2024-09-23 05:04:12,155 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=183353.33333333334, ans=0.125
2024-09-23 05:04:23,137 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=183353.33333333334, ans=0.125
2024-09-23 05:04:24,683 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=183353.33333333334, ans=0.0
2024-09-23 05:04:33,836 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=183400.0, ans=0.0
2024-09-23 05:04:35,626 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=183400.0, ans=0.125
2024-09-23 05:04:47,830 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=183446.66666666666, ans=0.0
2024-09-23 05:04:49,185 INFO [train.py:1198] (3/4) Epoch 11, batch 350, loss[loss=0.2487, ctc_loss=0.1743, cr_loss=0.3723, over 17010.00 frames. ], tot_loss[loss=0.247, ctc_loss=0.171, cr_loss=0.38, over 2770932.93 frames. ], batch size: 44, lr: 1.12e-02, grad_scale: 16.0
2024-09-23 05:05:01,013 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=6.41 vs. limit=15.0
2024-09-23 05:05:04,610 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.88 vs. limit=6.0
2024-09-23 05:05:17,385 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=183493.33333333334, ans=0.0
2024-09-23 05:05:30,066 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.069e+02 1.290e+02 1.386e+02 1.605e+02 2.385e+02, threshold=2.772e+02, percent-clipped=0.0
2024-09-23 05:06:00,747 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.03 vs. limit=15.0
2024-09-23 05:06:03,486 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.const_attention_rate, batch_count=183633.33333333334, ans=0.025
2024-09-23 05:06:11,273 INFO [train.py:1198] (3/4) Epoch 11, batch 400, loss[loss=0.2602, ctc_loss=0.1821, cr_loss=0.3907, over 16937.00 frames. ], tot_loss[loss=0.2474, ctc_loss=0.1713, cr_loss=0.3805, over 2894384.33 frames. ], batch size: 58, lr: 1.12e-02, grad_scale: 32.0
2024-09-23 05:07:02,732 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=183820.0, ans=0.1
2024-09-23 05:07:31,445 INFO [train.py:1198] (3/4) Epoch 11, batch 450, loss[loss=0.2393, ctc_loss=0.1648, cr_loss=0.3724, over 17242.00 frames. ], tot_loss[loss=0.2482, ctc_loss=0.172, cr_loss=0.3812, over 2991051.31 frames. ], batch size: 44, lr: 1.12e-02, grad_scale: 32.0
2024-09-23 05:07:48,346 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.37 vs. limit=22.5
2024-09-23 05:07:55,607 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=183960.0, ans=0.125
2024-09-23 05:08:09,519 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.132e+02 1.312e+02 1.426e+02 1.601e+02 2.161e+02, threshold=2.852e+02, percent-clipped=0.0
2024-09-23 05:08:24,188 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2024-09-23 05:08:44,128 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=184100.0, ans=0.125
2024-09-23 05:08:53,366 INFO [train.py:1198] (3/4) Epoch 11, batch 500, loss[loss=0.2873, ctc_loss=0.2012, cr_loss=0.4303, over 16902.00 frames. ], tot_loss[loss=0.2494, ctc_loss=0.173, cr_loss=0.3821, over 3077433.90 frames. ], batch size: 58, lr: 1.12e-02, grad_scale: 32.0
2024-09-23 05:08:58,489 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=184146.66666666666, ans=0.1
2024-09-23 05:09:19,988 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=5.84 vs. limit=15.0
2024-09-23 05:10:11,210 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=184333.33333333334, ans=0.125
2024-09-23 05:10:22,053 INFO [train.py:1198] (3/4) Epoch 11, batch 550, loss[loss=0.2919, ctc_loss=0.2018, cr_loss=0.4508, over 16499.00 frames. ], tot_loss[loss=0.2499, ctc_loss=0.1734, cr_loss=0.3825, over 3131733.01 frames. ], batch size: 66, lr: 1.12e-02, grad_scale: 32.0
2024-09-23 05:10:35,516 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=4.54 vs. limit=12.0
2024-09-23 05:10:36,665 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=184426.66666666666, ans=0.2
2024-09-23 05:11:00,283 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.095e+02 1.305e+02 1.444e+02 1.636e+02 3.823e+02, threshold=2.888e+02, percent-clipped=1.0
2024-09-23 05:11:03,907 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=184473.33333333334, ans=0.07
2024-09-23 05:11:19,815 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=184520.0, ans=0.125
], batch size: 48, lr: 1.12e-02, grad_scale: 32.0 2024-09-23 05:11:45,363 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=184613.33333333334, ans=0.0 2024-09-23 05:11:57,997 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=184660.0, ans=0.125 2024-09-23 05:12:42,958 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=11.17 vs. limit=15.0 2024-09-23 05:12:51,557 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=11.35 vs. limit=15.0 2024-09-23 05:13:01,770 INFO [train.py:1198] (3/4) Epoch 11, batch 650, loss[loss=0.2413, ctc_loss=0.1667, cr_loss=0.3726, over 17030.00 frames. ], tot_loss[loss=0.2467, ctc_loss=0.1709, cr_loss=0.3791, over 3226097.00 frames. ], batch size: 52, lr: 1.12e-02, grad_scale: 32.0 2024-09-23 05:13:06,982 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=184846.66666666666, ans=0.0 2024-09-23 05:13:10,046 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=184846.66666666666, ans=0.0 2024-09-23 05:13:19,760 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=184893.33333333334, ans=0.125 2024-09-23 05:13:43,068 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.116e+02 1.331e+02 1.445e+02 1.618e+02 2.572e+02, threshold=2.890e+02, percent-clipped=0.0 2024-09-23 05:13:43,371 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=184940.0, ans=0.1 2024-09-23 05:13:45,018 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=184940.0, ans=0.125 2024-09-23 05:14:27,583 INFO [train.py:1198] (3/4) Epoch 11, batch 700, loss[loss=0.2044, ctc_loss=0.139, cr_loss=0.3271, over 17187.00 frames. ], tot_loss[loss=0.2473, ctc_loss=0.1715, cr_loss=0.3795, over 3253404.07 frames. ], batch size: 41, lr: 1.11e-02, grad_scale: 32.0 2024-09-23 05:14:30,119 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.85 vs. limit=22.5 2024-09-23 05:14:32,772 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=185080.0, ans=0.125 2024-09-23 05:15:38,314 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.const_attention_rate, batch_count=185266.66666666666, ans=0.025 2024-09-23 05:15:39,840 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=185266.66666666666, ans=0.0 2024-09-23 05:15:44,615 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=185266.66666666666, ans=0.125 2024-09-23 05:15:52,063 INFO [train.py:1198] (3/4) Epoch 11, batch 750, loss[loss=0.2551, ctc_loss=0.1765, cr_loss=0.3931, over 17150.00 frames. ], tot_loss[loss=0.2464, ctc_loss=0.1707, cr_loss=0.3782, over 3283488.18 frames. ], batch size: 48, lr: 1.11e-02, grad_scale: 32.0
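The recurring optim.py WARNING entries summarize a window of recent gradient norms as five quantiles (min, 25%, median, 75%, max). Throughout this log the printed threshold equals Clipping_scale times the median (for the 05:13:43 entry above, 2.0 * 1.445e+02 = 2.890e+02), and percent-clipped reports how often recent norms exceeded that threshold. A sketch of that bookkeeping follows; the window length is an assumption, and the real optim.py may track these statistics differently.

from collections import deque

import torch

class GradNormMonitor:
    """Tracks recent gradient norms; threshold = clipping_scale * running
    median. The window length here is a guess for illustration."""

    def __init__(self, clipping_scale: float = 2.0, window: int = 128):
        self.clipping_scale = clipping_scale
        self.norms = deque(maxlen=window)      # recent total grad norms
        self.exceeded = deque(maxlen=window)   # whether each norm was clipped

    def update(self, grad_norm: float) -> float:
        """Record one batch's gradient norm; return the clip threshold."""
        self.norms.append(grad_norm)
        q = torch.quantile(torch.tensor(list(self.norms)),
                           torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0]))
        threshold = self.clipping_scale * q[2].item()  # 2.0 * median
        self.exceeded.append(grad_norm > threshold)
        pct = 100.0 * sum(self.exceeded) / len(self.exceeded)
        print(f"Clipping_scale={self.clipping_scale}, grad-norm quartiles "
              + " ".join(f"{v:.3e}" for v in q.tolist())
              + f", threshold={threshold:.3e}, percent-clipped={pct:.1f}")
        return threshold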
2024-09-23 05:15:53,104 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=5.45 vs. limit=15.0 2024-09-23 05:16:06,715 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=185360.0, ans=0.125 2024-09-23 05:16:16,604 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=4.72 vs. limit=12.0 2024-09-23 05:16:17,774 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=185360.0, ans=0.0 2024-09-23 05:16:30,316 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.101e+02 1.321e+02 1.443e+02 1.699e+02 2.904e+02, threshold=2.886e+02, percent-clipped=1.0 2024-09-23 05:16:57,708 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=185500.0, ans=0.125 2024-09-23 05:16:59,340 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=185500.0, ans=0.07 2024-09-23 05:17:05,737 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=185500.0, ans=0.0 2024-09-23 05:17:10,271 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=185546.66666666666, ans=0.09899494936611666 2024-09-23 05:17:11,522 INFO [train.py:1198] (3/4) Epoch 11, batch 800, loss[loss=0.2438, ctc_loss=0.1679, cr_loss=0.38, over 17224.00 frames. ], tot_loss[loss=0.2436, ctc_loss=0.1684, cr_loss=0.3756, over 3304581.53 frames. ], batch size: 50, lr: 1.11e-02, grad_scale: 32.0 2024-09-23 05:17:32,707 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=6.26 vs. limit=15.0 2024-09-23 05:17:43,440 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=185640.0, ans=0.0 2024-09-23 05:17:43,572 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=185640.0, ans=0.2 2024-09-23 05:18:08,003 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.50 vs. limit=22.5 2024-09-23 05:18:19,122 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=13.41 vs. limit=22.5 2024-09-23 05:18:31,205 INFO [train.py:1198] (3/4) Epoch 11, batch 850, loss[loss=0.2415, ctc_loss=0.165, cr_loss=0.3823, over 17065.00 frames. ], tot_loss[loss=0.2444, ctc_loss=0.169, cr_loss=0.3769, over 3313899.13 frames.
], batch size: 46, lr: 1.11e-02, grad_scale: 32.0 2024-09-23 05:19:04,513 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=185873.33333333334, ans=0.0 2024-09-23 05:19:12,273 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.088e+02 1.355e+02 1.504e+02 1.794e+02 2.469e+02, threshold=3.008e+02, percent-clipped=0.0 2024-09-23 05:19:38,295 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.68 vs. limit=15.0 2024-09-23 05:19:53,495 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=185966.66666666666, ans=0.2 2024-09-23 05:19:59,416 INFO [train.py:1198] (3/4) Epoch 11, batch 900, loss[loss=0.279, ctc_loss=0.1915, cr_loss=0.4373, over 17140.00 frames. ], tot_loss[loss=0.2458, ctc_loss=0.17, cr_loss=0.3788, over 3320031.05 frames. ], batch size: 48, lr: 1.11e-02, grad_scale: 32.0 2024-09-23 05:20:05,708 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=186013.33333333334, ans=0.0 2024-09-23 05:20:13,685 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=186013.33333333334, ans=0.125 2024-09-23 05:20:38,361 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.13 vs. limit=15.0 2024-09-23 05:20:42,602 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=186106.66666666666, ans=0.125 2024-09-23 05:20:53,678 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=186153.33333333334, ans=0.125 2024-09-23 05:21:04,702 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.const_attention_rate, batch_count=186200.0, ans=0.025 2024-09-23 05:21:22,045 INFO [train.py:1198] (3/4) Epoch 11, batch 950, loss[loss=0.2721, ctc_loss=0.1914, cr_loss=0.4036, over 17013.00 frames. ], tot_loss[loss=0.247, ctc_loss=0.1709, cr_loss=0.3807, over 3335890.55 frames. ], batch size: 56, lr: 1.11e-02, grad_scale: 32.0 2024-09-23 05:21:23,924 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=186246.66666666666, ans=0.04949747468305833 2024-09-23 05:21:24,065 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=186246.66666666666, ans=0.125 2024-09-23 05:21:34,534 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.76 vs. 
limit=15.0 2024-09-23 05:21:48,296 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=186293.33333333334, ans=0.1 2024-09-23 05:21:48,325 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=186293.33333333334, ans=0.1 2024-09-23 05:21:48,442 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=186293.33333333334, ans=0.125 2024-09-23 05:22:00,860 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.078e+02 1.261e+02 1.404e+02 1.572e+02 2.138e+02, threshold=2.809e+02, percent-clipped=0.0 2024-09-23 05:22:42,322 INFO [train.py:1198] (3/4) Epoch 11, batch 1000, loss[loss=0.2189, ctc_loss=0.1469, cr_loss=0.3596, over 17279.00 frames. ], tot_loss[loss=0.2473, ctc_loss=0.1711, cr_loss=0.3813, over 3342164.33 frames. ], batch size: 42, lr: 1.11e-02, grad_scale: 32.0 2024-09-23 05:22:42,954 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=12.72 vs. limit=22.5 2024-09-23 05:22:54,296 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=4.05 vs. limit=10.0 2024-09-23 05:23:12,997 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=186573.33333333334, ans=0.125 2024-09-23 05:23:32,195 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=186620.0, ans=0.0 2024-09-23 05:23:59,139 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=186666.66666666666, ans=0.125 2024-09-23 05:24:06,722 INFO [train.py:1198] (3/4) Epoch 11, batch 1050, loss[loss=0.219, ctc_loss=0.1466, cr_loss=0.3617, over 16947.00 frames. ], tot_loss[loss=0.2471, ctc_loss=0.171, cr_loss=0.3804, over 3347073.09 frames. ], batch size: 42, lr: 1.11e-02, grad_scale: 32.0 2024-09-23 05:24:15,471 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=9.24 vs. limit=15.0 2024-09-23 05:24:30,899 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten.whitening_limit, batch_count=186760.0, ans=22.5 2024-09-23 05:24:50,587 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.157e+02 1.363e+02 1.550e+02 2.011e+02 3.304e+02, threshold=3.099e+02, percent-clipped=2.0 2024-09-23 05:25:04,361 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=186853.33333333334, ans=0.1 2024-09-23 05:25:15,361 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=186853.33333333334, ans=0.0 2024-09-23 05:25:28,102 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=186900.0, ans=0.1 2024-09-23 05:25:34,181 INFO [train.py:1198] (3/4) Epoch 11, batch 1100, loss[loss=0.2328, ctc_loss=0.1581, cr_loss=0.3733, over 17214.00 frames. ], tot_loss[loss=0.2468, ctc_loss=0.1707, cr_loss=0.3805, over 3351019.99 frames. 
], batch size: 47, lr: 1.11e-02, grad_scale: 32.0 2024-09-23 05:25:42,117 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=186946.66666666666, ans=0.125 2024-09-23 05:25:46,935 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=186946.66666666666, ans=0.2 2024-09-23 05:26:22,301 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=187086.66666666666, ans=0.125 2024-09-23 05:26:25,495 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=187086.66666666666, ans=0.0 2024-09-23 05:26:26,983 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=187086.66666666666, ans=0.125 2024-09-23 05:26:30,115 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=187086.66666666666, ans=0.0 2024-09-23 05:26:47,961 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=187133.33333333334, ans=0.125 2024-09-23 05:26:53,985 INFO [train.py:1198] (3/4) Epoch 11, batch 1150, loss[loss=0.2533, ctc_loss=0.1743, cr_loss=0.395, over 16907.00 frames. ], tot_loss[loss=0.2472, ctc_loss=0.171, cr_loss=0.3811, over 3358168.14 frames. ], batch size: 58, lr: 1.11e-02, grad_scale: 32.0 2024-09-23 05:26:58,891 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=187180.0, ans=0.125 2024-09-23 05:27:26,049 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-23 05:27:32,067 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.114e+02 1.352e+02 1.474e+02 1.715e+02 2.168e+02, threshold=2.948e+02, percent-clipped=0.0 2024-09-23 05:27:35,599 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.max_abs, batch_count=187273.33333333334, ans=10.0 2024-09-23 05:27:59,472 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=187366.66666666666, ans=0.0 2024-09-23 05:28:04,175 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=187366.66666666666, ans=0.1 2024-09-23 05:28:13,188 INFO [train.py:1198] (3/4) Epoch 11, batch 1200, loss[loss=0.2363, ctc_loss=0.1609, cr_loss=0.3769, over 17020.00 frames. ], tot_loss[loss=0.2465, ctc_loss=0.1705, cr_loss=0.38, over 3356450.46 frames. ], batch size: 51, lr: 1.11e-02, grad_scale: 32.0 2024-09-23 05:28:18,281 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=187413.33333333334, ans=0.0 2024-09-23 05:28:32,664 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=187460.0, ans=0.125 2024-09-23 05:28:54,393 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=187506.66666666666, ans=0.1 2024-09-23 05:28:59,724 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.74 vs. 
limit=15.0 2024-09-23 05:29:28,212 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=187600.0, ans=0.2 2024-09-23 05:29:38,948 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.25 vs. limit=15.0 2024-09-23 05:29:41,428 INFO [train.py:1198] (3/4) Epoch 11, batch 1250, loss[loss=0.2558, ctc_loss=0.1748, cr_loss=0.405, over 17033.00 frames. ], tot_loss[loss=0.2472, ctc_loss=0.1709, cr_loss=0.3812, over 3352295.19 frames. ], batch size: 56, lr: 1.11e-02, grad_scale: 32.0 2024-09-23 05:30:13,359 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=187693.33333333334, ans=0.0 2024-09-23 05:30:22,692 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.147e+02 1.347e+02 1.488e+02 1.647e+02 3.078e+02, threshold=2.976e+02, percent-clipped=1.0 2024-09-23 05:30:56,253 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=187833.33333333334, ans=0.125 2024-09-23 05:31:01,229 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=187833.33333333334, ans=0.2 2024-09-23 05:31:04,207 INFO [train.py:1198] (3/4) Epoch 11, batch 1300, loss[loss=0.2586, ctc_loss=0.1788, cr_loss=0.3992, over 17062.00 frames. ], tot_loss[loss=0.248, ctc_loss=0.1716, cr_loss=0.3818, over 3347544.78 frames. ], batch size: 56, lr: 1.11e-02, grad_scale: 32.0 2024-09-23 05:31:44,205 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=187973.33333333334, ans=0.125 2024-09-23 05:31:49,492 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.42 vs. limit=6.0 2024-09-23 05:31:55,353 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=188020.0, ans=0.0 2024-09-23 05:32:05,178 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=11.66 vs. limit=15.0 2024-09-23 05:32:23,728 INFO [train.py:1198] (3/4) Epoch 11, batch 1350, loss[loss=0.2553, ctc_loss=0.1743, cr_loss=0.4054, over 17033.00 frames. ], tot_loss[loss=0.247, ctc_loss=0.1708, cr_loss=0.3809, over 3360489.07 frames. ], batch size: 52, lr: 1.11e-02, grad_scale: 32.0 2024-09-23 05:32:52,623 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=188160.0, ans=0.125 2024-09-23 05:33:02,037 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.065e+02 1.248e+02 1.402e+02 1.590e+02 2.779e+02, threshold=2.805e+02, percent-clipped=0.0 2024-09-23 05:33:08,743 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=188206.66666666666, ans=0.95 2024-09-23 05:33:23,945 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.30 vs. limit=10.0 2024-09-23 05:33:24,118 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=8.91 vs. 
limit=10.0 2024-09-23 05:33:45,864 INFO [train.py:1198] (3/4) Epoch 11, batch 1400, loss[loss=0.2545, ctc_loss=0.1782, cr_loss=0.3815, over 16684.00 frames. ], tot_loss[loss=0.2454, ctc_loss=0.1695, cr_loss=0.3794, over 3370498.87 frames. ], batch size: 61, lr: 1.11e-02, grad_scale: 32.0 2024-09-23 05:34:44,288 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.const_attention_rate, batch_count=188486.66666666666, ans=0.025 2024-09-23 05:35:13,558 INFO [train.py:1198] (3/4) Epoch 11, batch 1450, loss[loss=0.2565, ctc_loss=0.1782, cr_loss=0.3913, over 16938.00 frames. ], tot_loss[loss=0.2449, ctc_loss=0.1693, cr_loss=0.378, over 3359891.72 frames. ], batch size: 58, lr: 1.10e-02, grad_scale: 32.0 2024-09-23 05:35:31,273 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=188626.66666666666, ans=0.125 2024-09-23 05:35:34,355 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=188626.66666666666, ans=0.2 2024-09-23 05:35:42,404 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=188626.66666666666, ans=0.0 2024-09-23 05:35:51,363 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.125e+02 1.355e+02 1.506e+02 1.702e+02 2.506e+02, threshold=3.011e+02, percent-clipped=0.0 2024-09-23 05:35:57,983 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=188673.33333333334, ans=0.5 2024-09-23 05:36:28,381 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=188766.66666666666, ans=0.0 2024-09-23 05:36:32,873 INFO [train.py:1198] (3/4) Epoch 11, batch 1500, loss[loss=0.3012, ctc_loss=0.2223, cr_loss=0.3945, over 11516.00 frames. ], tot_loss[loss=0.2463, ctc_loss=0.1704, cr_loss=0.3798, over 3356420.85 frames. ], batch size: 123, lr: 1.10e-02, grad_scale: 32.0 2024-09-23 05:36:41,511 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=188813.33333333334, ans=0.1 2024-09-23 05:36:49,293 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=188860.0, ans=0.0 2024-09-23 05:37:20,618 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=8.15 vs. limit=15.0 2024-09-23 05:37:24,542 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=188953.33333333334, ans=0.0 2024-09-23 05:37:52,945 INFO [train.py:1198] (3/4) Epoch 11, batch 1550, loss[loss=0.2569, ctc_loss=0.1786, cr_loss=0.3916, over 17033.00 frames. ], tot_loss[loss=0.2457, ctc_loss=0.1698, cr_loss=0.3795, over 3366269.25 frames. ], batch size: 52, lr: 1.10e-02, grad_scale: 32.0 2024-09-23 05:37:56,574 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=189046.66666666666, ans=0.0 2024-09-23 05:38:06,246 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=6.02 vs. 
limit=15.0 2024-09-23 05:38:12,504 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=189093.33333333334, ans=0.125 2024-09-23 05:38:14,136 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=189093.33333333334, ans=0.125 2024-09-23 05:38:33,047 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.070e+02 1.268e+02 1.378e+02 1.550e+02 2.066e+02, threshold=2.755e+02, percent-clipped=0.0 2024-09-23 05:38:38,155 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=189140.0, ans=0.125 2024-09-23 05:38:45,465 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=189186.66666666666, ans=0.125 2024-09-23 05:39:10,655 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=189233.33333333334, ans=0.0 2024-09-23 05:39:15,065 INFO [train.py:1198] (3/4) Epoch 11, batch 1600, loss[loss=0.2169, ctc_loss=0.1484, cr_loss=0.3426, over 17287.00 frames. ], tot_loss[loss=0.2464, ctc_loss=0.1704, cr_loss=0.3799, over 3349669.40 frames. ], batch size: 42, lr: 1.10e-02, grad_scale: 32.0 2024-09-23 05:39:31,573 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=189280.0, ans=0.2 2024-09-23 05:40:09,154 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=189420.0, ans=0.0 2024-09-23 05:40:35,170 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.max_abs, batch_count=189466.66666666666, ans=10.0 2024-09-23 05:40:35,223 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=189466.66666666666, ans=0.2 2024-09-23 05:40:42,831 INFO [train.py:1198] (3/4) Epoch 11, batch 1650, loss[loss=0.2676, ctc_loss=0.186, cr_loss=0.4082, over 16992.00 frames. ], tot_loss[loss=0.246, ctc_loss=0.1701, cr_loss=0.3796, over 3349242.85 frames. ], batch size: 53, lr: 1.10e-02, grad_scale: 32.0 2024-09-23 05:40:51,686 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.10 vs. limit=15.0 2024-09-23 05:41:13,848 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-23 05:41:22,851 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.160e+02 1.323e+02 1.468e+02 1.701e+02 2.632e+02, threshold=2.937e+02, percent-clipped=0.0 2024-09-23 05:41:24,766 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=189606.66666666666, ans=0.125 2024-09-23 05:41:26,778 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.29 vs. limit=15.0 2024-09-23 05:42:02,597 INFO [train.py:1198] (3/4) Epoch 11, batch 1700, loss[loss=0.2494, ctc_loss=0.1736, cr_loss=0.379, over 17337.00 frames. ], tot_loss[loss=0.2468, ctc_loss=0.1707, cr_loss=0.3804, over 3348604.23 frames. 
], batch size: 48, lr: 1.10e-02, grad_scale: 32.0 2024-09-23 05:42:15,724 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=189746.66666666666, ans=0.125 2024-09-23 05:42:18,805 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=189793.33333333334, ans=0.0 2024-09-23 05:42:45,989 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=189840.0, ans=0.2 2024-09-23 05:43:00,090 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.const_attention_rate, batch_count=189886.66666666666, ans=0.025 2024-09-23 05:43:12,455 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=9.43 vs. limit=10.0 2024-09-23 05:43:19,412 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=189933.33333333334, ans=0.125 2024-09-23 05:43:22,428 INFO [train.py:1198] (3/4) Epoch 11, batch 1750, loss[loss=0.2178, ctc_loss=0.1482, cr_loss=0.3482, over 17117.00 frames. ], tot_loss[loss=0.2466, ctc_loss=0.1706, cr_loss=0.3802, over 3357857.37 frames. ], batch size: 40, lr: 1.10e-02, grad_scale: 32.0 2024-09-23 05:43:22,778 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=189980.0, ans=0.2 2024-09-23 05:43:38,924 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=10.18 vs. limit=15.0 2024-09-23 05:43:51,071 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=190026.66666666666, ans=0.2 2024-09-23 05:43:52,622 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=190026.66666666666, ans=10.0 2024-09-23 05:44:04,907 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.149e+02 1.343e+02 1.475e+02 1.671e+02 2.216e+02, threshold=2.950e+02, percent-clipped=0.0 2024-09-23 05:44:06,775 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=190073.33333333334, ans=0.125 2024-09-23 05:44:08,640 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=12.29 vs. limit=22.5 2024-09-23 05:44:32,297 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=190166.66666666666, ans=0.0 2024-09-23 05:44:32,325 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=190166.66666666666, ans=0.125 2024-09-23 05:44:52,208 INFO [train.py:1198] (3/4) Epoch 11, batch 1800, loss[loss=0.249, ctc_loss=0.1747, cr_loss=0.3716, over 17049.00 frames. ], tot_loss[loss=0.2473, ctc_loss=0.1711, cr_loss=0.3811, over 3361415.97 frames. 
], batch size: 56, lr: 1.10e-02, grad_scale: 32.0 2024-09-23 05:45:02,159 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=190213.33333333334, ans=0.125 2024-09-23 05:45:02,754 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=9.46 vs. limit=15.0 2024-09-23 05:45:06,803 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=190260.0, ans=0.0 2024-09-23 05:45:13,913 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.02 vs. limit=15.0 2024-09-23 05:45:14,882 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=190260.0, ans=0.1 2024-09-23 05:45:27,573 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=190306.66666666666, ans=0.0 2024-09-23 05:45:38,830 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=190353.33333333334, ans=0.125 2024-09-23 05:45:43,918 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=190353.33333333334, ans=0.125 2024-09-23 05:45:48,648 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=190353.33333333334, ans=0.2 2024-09-23 05:46:08,759 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.46 vs. limit=15.0 2024-09-23 05:46:12,737 INFO [train.py:1198] (3/4) Epoch 11, batch 1850, loss[loss=0.2723, ctc_loss=0.1902, cr_loss=0.4107, over 17218.00 frames. ], tot_loss[loss=0.2457, ctc_loss=0.1699, cr_loss=0.3794, over 3370939.99 frames. ], batch size: 55, lr: 1.10e-02, grad_scale: 32.0 2024-09-23 05:46:46,456 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=190540.0, ans=0.0 2024-09-23 05:46:50,613 INFO [scaling.py:1024] (3/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.12 vs. limit=5.0 2024-09-23 05:46:52,383 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.102e+02 1.336e+02 1.526e+02 1.737e+02 2.806e+02, threshold=3.051e+02, percent-clipped=0.0 2024-09-23 05:47:32,660 INFO [train.py:1198] (3/4) Epoch 11, batch 1900, loss[loss=0.2873, ctc_loss=0.2003, cr_loss=0.4348, over 17053.00 frames. ], tot_loss[loss=0.2465, ctc_loss=0.1704, cr_loss=0.3802, over 3370528.87 frames. ], batch size: 52, lr: 1.10e-02, grad_scale: 16.0 2024-09-23 05:47:40,894 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=190680.0, ans=0.07 2024-09-23 05:48:45,700 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=190866.66666666666, ans=0.2 2024-09-23 05:48:53,683 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-23 05:48:55,012 INFO [train.py:1198] (3/4) Epoch 11, batch 1950, loss[loss=0.2281, ctc_loss=0.1563, cr_loss=0.359, over 16961.00 frames. ], tot_loss[loss=0.2472, ctc_loss=0.1709, cr_loss=0.3812, over 3369436.66 frames. 
], batch size: 42, lr: 1.10e-02, grad_scale: 16.0 2024-09-23 05:49:03,386 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=190913.33333333334, ans=0.125 2024-09-23 05:49:22,528 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.75 vs. limit=15.0 2024-09-23 05:49:29,397 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=190960.0, ans=0.1 2024-09-23 05:49:41,890 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.099e+02 1.367e+02 1.554e+02 1.740e+02 3.825e+02, threshold=3.108e+02, percent-clipped=1.0 2024-09-23 05:50:22,740 INFO [train.py:1198] (3/4) Epoch 11, batch 2000, loss[loss=0.2239, ctc_loss=0.1493, cr_loss=0.3727, over 17145.00 frames. ], tot_loss[loss=0.2469, ctc_loss=0.1708, cr_loss=0.3804, over 3368253.57 frames. ], batch size: 45, lr: 1.10e-02, grad_scale: 32.0 2024-09-23 05:50:37,786 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=191193.33333333334, ans=0.125 2024-09-23 05:50:47,665 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=3.32 vs. limit=12.0 2024-09-23 05:50:58,486 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=191240.0, ans=0.125 2024-09-23 05:50:58,579 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.const_attention_rate, batch_count=191240.0, ans=0.025 2024-09-23 05:51:01,562 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=191240.0, ans=0.015 2024-09-23 05:51:03,638 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.80 vs. limit=15.0 2024-09-23 05:51:30,904 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=191333.33333333334, ans=0.0 2024-09-23 05:51:43,329 INFO [train.py:1198] (3/4) Epoch 11, batch 2050, loss[loss=0.2498, ctc_loss=0.1714, cr_loss=0.3924, over 17311.00 frames. ], tot_loss[loss=0.2457, ctc_loss=0.1698, cr_loss=0.3794, over 3366917.34 frames. ], batch size: 51, lr: 1.10e-02, grad_scale: 32.0 2024-09-23 05:51:57,896 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=191426.66666666666, ans=0.125 2024-09-23 05:52:04,281 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=191426.66666666666, ans=0.125 2024-09-23 05:52:05,967 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=191426.66666666666, ans=0.125 2024-09-23 05:52:19,195 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=4.52 vs. 
limit=12.0 2024-09-23 05:52:20,129 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=191473.33333333334, ans=0.0 2024-09-23 05:52:24,543 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.119e+02 1.258e+02 1.361e+02 1.544e+02 2.924e+02, threshold=2.721e+02, percent-clipped=0.0 2024-09-23 05:52:25,594 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.56 vs. limit=10.0 2024-09-23 05:53:00,193 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=191566.66666666666, ans=0.0 2024-09-23 05:53:03,035 INFO [train.py:1198] (3/4) Epoch 11, batch 2100, loss[loss=0.2449, ctc_loss=0.1696, cr_loss=0.3765, over 17194.00 frames. ], tot_loss[loss=0.246, ctc_loss=0.17, cr_loss=0.3801, over 3369267.41 frames. ], batch size: 55, lr: 1.10e-02, grad_scale: 32.0 2024-09-23 05:53:29,050 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=4.56 vs. limit=12.0 2024-09-23 05:53:30,322 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=191660.0, ans=0.125 2024-09-23 05:54:30,862 INFO [train.py:1198] (3/4) Epoch 11, batch 2150, loss[loss=0.2484, ctc_loss=0.1722, cr_loss=0.3808, over 17155.00 frames. ], tot_loss[loss=0.2462, ctc_loss=0.1701, cr_loss=0.3807, over 3365413.61 frames. ], batch size: 48, lr: 1.10e-02, grad_scale: 32.0 2024-09-23 05:54:33,358 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=5.09 vs. limit=15.0 2024-09-23 05:54:54,983 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=4.97 vs. limit=12.0 2024-09-23 05:54:59,267 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=191893.33333333334, ans=0.0 2024-09-23 05:55:14,826 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.163e+02 1.341e+02 1.514e+02 1.705e+02 2.483e+02, threshold=3.028e+02, percent-clipped=0.0 2024-09-23 05:55:42,160 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.min_abs, batch_count=192033.33333333334, ans=0.5 2024-09-23 05:55:48,758 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.const_attention_rate, batch_count=192033.33333333334, ans=0.025 2024-09-23 05:55:53,282 INFO [train.py:1198] (3/4) Epoch 11, batch 2200, loss[loss=0.2598, ctc_loss=0.1822, cr_loss=0.388, over 17008.00 frames. ], tot_loss[loss=0.2467, ctc_loss=0.1705, cr_loss=0.3815, over 3372231.38 frames. 
], batch size: 51, lr: 1.10e-02, grad_scale: 32.0 2024-09-23 05:56:09,679 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=192126.66666666666, ans=0.0 2024-09-23 05:56:14,496 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=192126.66666666666, ans=0.0 2024-09-23 05:56:16,041 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-23 05:56:37,045 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=192173.33333333334, ans=0.1 2024-09-23 05:56:59,594 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.const_attention_rate, batch_count=192266.66666666666, ans=0.025 2024-09-23 05:57:01,309 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=192266.66666666666, ans=0.0 2024-09-23 05:57:10,852 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.const_attention_rate, batch_count=192266.66666666666, ans=0.025 2024-09-23 05:57:13,775 INFO [train.py:1198] (3/4) Epoch 11, batch 2250, loss[loss=0.3032, ctc_loss=0.2201, cr_loss=0.4152, over 11588.00 frames. ], tot_loss[loss=0.2462, ctc_loss=0.17, cr_loss=0.3806, over 3358655.39 frames. ], batch size: 123, lr: 1.09e-02, grad_scale: 32.0 2024-09-23 05:57:23,821 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=192313.33333333334, ans=0.0 2024-09-23 05:57:49,549 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=192406.66666666666, ans=0.1 2024-09-23 05:57:55,417 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.104e+02 1.304e+02 1.378e+02 1.493e+02 3.334e+02, threshold=2.757e+02, percent-clipped=1.0 2024-09-23 05:58:19,770 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=192500.0, ans=0.0 2024-09-23 05:58:27,631 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=192500.0, ans=0.125 2024-09-23 05:58:36,220 INFO [train.py:1198] (3/4) Epoch 11, batch 2300, loss[loss=0.2423, ctc_loss=0.169, cr_loss=0.3665, over 17300.00 frames. ], tot_loss[loss=0.2467, ctc_loss=0.1705, cr_loss=0.3813, over 3357091.26 frames. ], batch size: 49, lr: 1.09e-02, grad_scale: 32.0 2024-09-23 05:59:15,578 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=192640.0, ans=0.125 2024-09-23 05:59:37,188 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=192686.66666666666, ans=0.0 2024-09-23 05:59:44,537 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=192686.66666666666, ans=0.125 2024-09-23 05:59:53,712 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=7.33 vs. 
limit=15.0 2024-09-23 06:00:00,767 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=192733.33333333334, ans=0.125 2024-09-23 06:00:03,695 INFO [train.py:1198] (3/4) Epoch 11, batch 2350, loss[loss=0.2577, ctc_loss=0.1797, cr_loss=0.3902, over 16638.00 frames. ], tot_loss[loss=0.246, ctc_loss=0.1699, cr_loss=0.3804, over 3353368.76 frames. ], batch size: 66, lr: 1.09e-02, grad_scale: 32.0 2024-09-23 06:00:35,509 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=192873.33333333334, ans=0.0 2024-09-23 06:00:38,754 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=192873.33333333334, ans=0.0 2024-09-23 06:00:44,565 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.133e+02 1.266e+02 1.351e+02 1.522e+02 2.296e+02, threshold=2.702e+02, percent-clipped=0.0 2024-09-23 06:00:50,177 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=7.25 vs. limit=15.0 2024-09-23 06:00:53,009 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=192920.0, ans=0.0 2024-09-23 06:01:02,574 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-23 06:01:04,135 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=192920.0, ans=0.2 2024-09-23 06:01:07,354 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=192966.66666666666, ans=0.125 2024-09-23 06:01:23,298 INFO [train.py:1198] (3/4) Epoch 11, batch 2400, loss[loss=0.3012, ctc_loss=0.2131, cr_loss=0.4407, over 14776.00 frames. ], tot_loss[loss=0.2474, ctc_loss=0.171, cr_loss=0.382, over 3356505.82 frames. ], batch size: 89, lr: 1.09e-02, grad_scale: 32.0 2024-09-23 06:01:41,335 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=193060.0, ans=0.125 2024-09-23 06:01:51,102 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=193060.0, ans=0.125 2024-09-23 06:01:52,535 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=193060.0, ans=0.0 2024-09-23 06:02:03,776 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=193106.66666666666, ans=0.125 2024-09-23 06:02:03,781 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=193106.66666666666, ans=0.125 2024-09-23 06:02:04,439 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=6.85 vs. limit=15.0 2024-09-23 06:02:43,185 INFO [train.py:1198] (3/4) Epoch 11, batch 2450, loss[loss=0.2075, ctc_loss=0.1417, cr_loss=0.3293, over 17044.00 frames. ], tot_loss[loss=0.2466, ctc_loss=0.1703, cr_loss=0.3813, over 3362670.68 frames. 
], batch size: 39, lr: 1.09e-02, grad_scale: 32.0 2024-09-23 06:03:07,959 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=16.64 vs. limit=22.5 2024-09-23 06:03:18,702 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=193340.0, ans=0.0 2024-09-23 06:03:20,275 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=193340.0, ans=0.125 2024-09-23 06:03:24,779 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.115e+02 1.301e+02 1.390e+02 1.544e+02 1.973e+02, threshold=2.781e+02, percent-clipped=0.0 2024-09-23 06:03:26,754 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-09-23 06:04:07,698 INFO [train.py:1198] (3/4) Epoch 11, batch 2500, loss[loss=0.2885, ctc_loss=0.1977, cr_loss=0.4538, over 17221.00 frames. ], tot_loss[loss=0.2466, ctc_loss=0.1702, cr_loss=0.3822, over 3369686.65 frames. ], batch size: 47, lr: 1.09e-02, grad_scale: 32.0 2024-09-23 06:04:38,089 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=193526.66666666666, ans=0.1 2024-09-23 06:04:57,184 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=10.74 vs. limit=15.0 2024-09-23 06:05:02,993 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=193620.0, ans=0.1 2024-09-23 06:05:03,524 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=4.21 vs. limit=12.0 2024-09-23 06:05:20,294 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=193666.66666666666, ans=0.125 2024-09-23 06:05:24,927 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=193666.66666666666, ans=0.2 2024-09-23 06:05:26,507 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=193666.66666666666, ans=0.125 2024-09-23 06:05:32,499 INFO [train.py:1198] (3/4) Epoch 11, batch 2550, loss[loss=0.2561, ctc_loss=0.1773, cr_loss=0.3943, over 17302.00 frames. ], tot_loss[loss=0.2459, ctc_loss=0.1695, cr_loss=0.3821, over 3377875.02 frames. ], batch size: 49, lr: 1.09e-02, grad_scale: 32.0 2024-09-23 06:06:13,861 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.119e+02 1.308e+02 1.472e+02 1.682e+02 2.409e+02, threshold=2.944e+02, percent-clipped=0.0 2024-09-23 06:06:14,532 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.62 vs. limit=15.0 2024-09-23 06:06:36,832 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=4.31 vs. 
limit=15.0 2024-09-23 06:06:42,605 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=193900.0, ans=0.125 2024-09-23 06:06:51,768 INFO [train.py:1198] (3/4) Epoch 11, batch 2600, loss[loss=0.2358, ctc_loss=0.1583, cr_loss=0.3877, over 17229.00 frames. ], tot_loss[loss=0.2458, ctc_loss=0.1694, cr_loss=0.3816, over 3386139.37 frames. ], batch size: 47, lr: 1.09e-02, grad_scale: 32.0 2024-09-23 06:06:53,663 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=193946.66666666666, ans=0.0 2024-09-23 06:06:58,905 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.65 vs. limit=22.5 2024-09-23 06:07:07,795 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=193993.33333333334, ans=0.125 2024-09-23 06:07:12,581 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=193993.33333333334, ans=0.0 2024-09-23 06:07:15,876 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=193993.33333333334, ans=0.0 2024-09-23 06:07:32,440 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=3.31 vs. limit=15.0 2024-09-23 06:08:11,672 INFO [train.py:1198] (3/4) Epoch 11, batch 2650, loss[loss=0.26, ctc_loss=0.1826, cr_loss=0.3873, over 17314.00 frames. ], tot_loss[loss=0.2467, ctc_loss=0.1703, cr_loss=0.382, over 3367702.38 frames. ], batch size: 51, lr: 1.09e-02, grad_scale: 32.0 2024-09-23 06:08:12,010 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=194180.0, ans=0.125 2024-09-23 06:08:49,350 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=194273.33333333334, ans=0.0 2024-09-23 06:08:57,084 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.068e+02 1.341e+02 1.529e+02 1.824e+02 3.038e+02, threshold=3.058e+02, percent-clipped=1.0 2024-09-23 06:08:59,012 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=194273.33333333334, ans=0.0 2024-09-23 06:09:11,420 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=194320.0, ans=0.125 2024-09-23 06:09:13,015 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=194320.0, ans=0.125 2024-09-23 06:09:23,930 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=6.19 vs. limit=10.0 2024-09-23 06:09:29,817 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=194366.66666666666, ans=0.1 2024-09-23 06:09:40,415 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=194413.33333333334, ans=0.125 2024-09-23 06:09:41,684 INFO [train.py:1198] (3/4) Epoch 11, batch 2700, loss[loss=0.2511, ctc_loss=0.1741, cr_loss=0.3848, over 16927.00 frames. 
], tot_loss[loss=0.2459, ctc_loss=0.1698, cr_loss=0.3804, over 3367773.26 frames. ], batch size: 58, lr: 1.09e-02, grad_scale: 16.0 2024-09-23 06:10:01,189 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=194460.0, ans=0.125 2024-09-23 06:10:01,292 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=194460.0, ans=0.125 2024-09-23 06:10:14,547 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.58 vs. limit=6.0 2024-09-23 06:10:37,805 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=194553.33333333334, ans=0.0 2024-09-23 06:10:49,102 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=194600.0, ans=0.1 2024-09-23 06:10:58,615 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=194600.0, ans=0.1 2024-09-23 06:11:01,391 INFO [train.py:1198] (3/4) Epoch 11, batch 2750, loss[loss=0.2314, ctc_loss=0.1571, cr_loss=0.3713, over 16960.00 frames. ], tot_loss[loss=0.2472, ctc_loss=0.1708, cr_loss=0.3821, over 3367058.86 frames. ], batch size: 42, lr: 1.09e-02, grad_scale: 16.0 2024-09-23 06:11:12,775 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=194646.66666666666, ans=0.1 2024-09-23 06:11:35,900 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=7.04 vs. limit=15.0 2024-09-23 06:11:44,297 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.099e+02 1.275e+02 1.375e+02 1.599e+02 2.337e+02, threshold=2.750e+02, percent-clipped=0.0 2024-09-23 06:11:55,909 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=194786.66666666666, ans=0.125 2024-09-23 06:12:21,172 INFO [train.py:1198] (3/4) Epoch 11, batch 2800, loss[loss=0.1993, ctc_loss=0.1363, cr_loss=0.3147, over 16322.00 frames. ], tot_loss[loss=0.2463, ctc_loss=0.1703, cr_loss=0.3804, over 3354613.76 frames. ], batch size: 36, lr: 1.09e-02, grad_scale: 32.0 2024-09-23 06:12:42,759 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=194926.66666666666, ans=0.125 2024-09-23 06:12:54,140 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=194973.33333333334, ans=0.125 2024-09-23 06:13:32,301 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=195066.66666666666, ans=0.0 2024-09-23 06:13:38,982 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-23 06:13:44,869 INFO [train.py:1198] (3/4) Epoch 11, batch 2850, loss[loss=0.2422, ctc_loss=0.165, cr_loss=0.386, over 17144.00 frames. ], tot_loss[loss=0.2455, ctc_loss=0.1697, cr_loss=0.3792, over 3352461.70 frames. 
], batch size: 48, lr: 1.09e-02, grad_scale: 32.0 2024-09-23 06:14:29,496 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=195206.66666666666, ans=0.0 2024-09-23 06:14:35,500 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.164e+02 1.307e+02 1.410e+02 1.605e+02 2.111e+02, threshold=2.819e+02, percent-clipped=0.0 2024-09-23 06:15:05,865 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=195300.0, ans=0.2 2024-09-23 06:15:08,503 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=7.99 vs. limit=15.0 2024-09-23 06:15:11,919 INFO [train.py:1198] (3/4) Epoch 11, batch 2900, loss[loss=0.2564, ctc_loss=0.1814, cr_loss=0.3752, over 17034.00 frames. ], tot_loss[loss=0.247, ctc_loss=0.1707, cr_loss=0.3812, over 3355863.18 frames. ], batch size: 56, lr: 1.09e-02, grad_scale: 32.0 2024-09-23 06:15:20,625 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=3.93 vs. limit=10.0 2024-09-23 06:15:42,575 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=195440.0, ans=0.125 2024-09-23 06:16:14,567 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=195533.33333333334, ans=0.2 2024-09-23 06:16:21,025 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=195533.33333333334, ans=0.125 2024-09-23 06:16:31,839 INFO [train.py:1198] (3/4) Epoch 11, batch 2950, loss[loss=0.2365, ctc_loss=0.1601, cr_loss=0.382, over 17293.00 frames. ], tot_loss[loss=0.2454, ctc_loss=0.1695, cr_loss=0.3796, over 3357868.46 frames. ], batch size: 46, lr: 1.09e-02, grad_scale: 32.0 2024-09-23 06:16:57,603 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=195626.66666666666, ans=0.125 2024-09-23 06:17:07,288 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=195673.33333333334, ans=0.2 2024-09-23 06:17:14,842 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.134e+02 1.272e+02 1.364e+02 1.479e+02 2.031e+02, threshold=2.728e+02, percent-clipped=0.0 2024-09-23 06:17:32,745 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=195720.0, ans=0.0 2024-09-23 06:17:50,965 INFO [train.py:1198] (3/4) Epoch 11, batch 3000, loss[loss=0.2583, ctc_loss=0.1788, cr_loss=0.3973, over 16999.00 frames. ], tot_loss[loss=0.2449, ctc_loss=0.1691, cr_loss=0.3789, over 3349222.02 frames. ], batch size: 53, lr: 1.08e-02, grad_scale: 32.0 2024-09-23 06:17:50,966 INFO [train.py:1221] (3/4) Computing validation loss 2024-09-23 06:18:00,731 INFO [zipformer.py:1858] (3/4) name=encoder.encoders.3.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([3.6869, 4.1520, 3.8378, 4.4148, 3.9512, 3.6866, 3.8736, 3.7972], device='cuda:3') 2024-09-23 06:18:06,133 INFO [train.py:1230] (3/4) Epoch 11, validation: loss=0.04835, ctc_loss=0.04835, cr_loss=7.412e-15, over 944034.00 frames. 
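Two bookkeeping details are visible in the loss entries here. First, the per-batch figures satisfy loss = ctc_loss + 0.2 * cr_loss (for batch 3000 above: 0.1788 + 0.2 * 0.3973 = 0.2583 at the printed precision), so a consistency-regularization weight of 0.2 can be inferred from the numbers alone rather than read from the training code; the near-zero validation cr_loss (7.412e-15) likewise fits the two augmented views coinciding when no masking is applied at eval time. Second, tot_loss is a frame-weighted running figure whose "over N frames" count grows batch by batch. A minimal sketch of both, noting that the actual averaging in train.py may additionally decay older batches:

def combined_loss(ctc_loss: float, cr_loss: float,
                  cr_loss_scale: float = 0.2) -> float:
    # cr_loss_scale=0.2 is inferred from the logged numbers (assumption).
    return ctc_loss + cr_loss_scale * cr_loss

class RunningLoss:
    """Frame-weighted running average, mirroring 'tot_loss[... over N frames]'."""

    def __init__(self):
        self.weighted_sum = 0.0
        self.num_frames = 0.0

    def update(self, batch_loss: float, batch_frames: float) -> float:
        self.weighted_sum += batch_loss * batch_frames
        self.num_frames += batch_frames
        return self.weighted_sum / self.num_frames  # current tot_loss

# Reproduces the batch-3000 per-batch figure above:
assert abs(combined_loss(0.1788, 0.3973) - 0.2583) < 1e-3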
2024-09-23 06:18:06,134 INFO [train.py:1231] (3/4) Maximum memory allocated so far is 21238MB 2024-09-23 06:18:11,136 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=195813.33333333334, ans=0.125 2024-09-23 06:18:23,575 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=195860.0, ans=0.125 2024-09-23 06:18:25,143 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=195860.0, ans=0.0 2024-09-23 06:18:51,462 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=195906.66666666666, ans=0.0 2024-09-23 06:18:51,496 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=195906.66666666666, ans=0.0 2024-09-23 06:19:21,092 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=196000.0, ans=0.2 2024-09-23 06:19:27,166 INFO [train.py:1198] (3/4) Epoch 11, batch 3050, loss[loss=0.2361, ctc_loss=0.1608, cr_loss=0.3763, over 17179.00 frames. ], tot_loss[loss=0.2446, ctc_loss=0.169, cr_loss=0.3781, over 3351965.50 frames. ], batch size: 41, lr: 1.08e-02, grad_scale: 32.0 2024-09-23 06:19:44,101 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=196093.33333333334, ans=0.125 2024-09-23 06:19:45,712 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-23 06:20:14,520 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.148e+02 1.336e+02 1.544e+02 1.822e+02 2.726e+02, threshold=3.088e+02, percent-clipped=0.0 2024-09-23 06:20:18,038 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=196186.66666666666, ans=0.0 2024-09-23 06:20:23,902 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=196186.66666666666, ans=0.0 2024-09-23 06:20:27,136 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=196186.66666666666, ans=0.125 2024-09-23 06:20:52,702 INFO [train.py:1198] (3/4) Epoch 11, batch 3100, loss[loss=0.3127, ctc_loss=0.2271, cr_loss=0.4279, over 14987.00 frames. ], tot_loss[loss=0.2455, ctc_loss=0.1696, cr_loss=0.3792, over 3342962.03 frames. ], batch size: 89, lr: 1.08e-02, grad_scale: 32.0 2024-09-23 06:20:55,071 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=5.25 vs. 
limit=12.0 2024-09-23 06:21:24,232 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=196373.33333333334, ans=0.125 2024-09-23 06:21:27,386 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=196373.33333333334, ans=0.1 2024-09-23 06:21:30,628 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=196373.33333333334, ans=0.0 2024-09-23 06:21:40,617 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.98 vs. limit=15.0 2024-09-23 06:21:43,110 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=196420.0, ans=0.2 2024-09-23 06:21:54,359 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.43 vs. limit=15.0 2024-09-23 06:22:11,264 INFO [train.py:1198] (3/4) Epoch 11, batch 3150, loss[loss=0.2109, ctc_loss=0.1476, cr_loss=0.3166, over 17061.00 frames. ], tot_loss[loss=0.2452, ctc_loss=0.1694, cr_loss=0.379, over 3344906.67 frames. ], batch size: 39, lr: 1.08e-02, grad_scale: 32.0 2024-09-23 06:22:19,163 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=196513.33333333334, ans=0.0 2024-09-23 06:22:21,808 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=7.37 vs. limit=15.0 2024-09-23 06:22:25,545 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=196560.0, ans=0.125 2024-09-23 06:22:27,575 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=6.82 vs. limit=15.0 2024-09-23 06:22:33,195 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=196560.0, ans=0.0 2024-09-23 06:22:35,350 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.48 vs. limit=15.0 2024-09-23 06:22:37,875 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=196560.0, ans=0.5 2024-09-23 06:22:37,977 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=196560.0, ans=0.125 2024-09-23 06:22:52,267 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=196606.66666666666, ans=0.0 2024-09-23 06:22:53,534 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.158e+02 1.336e+02 1.469e+02 1.670e+02 2.912e+02, threshold=2.938e+02, percent-clipped=0.0 2024-09-23 06:23:03,418 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=196653.33333333334, ans=0.125 2024-09-23 06:23:08,766 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.21 vs. 
limit=15.0 2024-09-23 06:23:09,614 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=196653.33333333334, ans=0.2 2024-09-23 06:23:29,552 INFO [train.py:1198] (3/4) Epoch 11, batch 3200, loss[loss=0.2676, ctc_loss=0.1884, cr_loss=0.3961, over 16699.00 frames. ], tot_loss[loss=0.2451, ctc_loss=0.1693, cr_loss=0.3786, over 3347718.61 frames. ], batch size: 61, lr: 1.08e-02, grad_scale: 32.0 2024-09-23 06:24:08,591 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=196840.0, ans=0.0 2024-09-23 06:24:15,295 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.51 vs. limit=15.0 2024-09-23 06:24:39,996 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=196933.33333333334, ans=0.1 2024-09-23 06:24:41,393 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=196933.33333333334, ans=0.0 2024-09-23 06:24:47,354 INFO [train.py:1198] (3/4) Epoch 11, batch 3250, loss[loss=0.193, ctc_loss=0.1295, cr_loss=0.3177, over 17276.00 frames. ], tot_loss[loss=0.2456, ctc_loss=0.1697, cr_loss=0.3795, over 3360756.13 frames. ], batch size: 42, lr: 1.08e-02, grad_scale: 16.0 2024-09-23 06:24:58,450 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=196980.0, ans=0.1 2024-09-23 06:25:04,493 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=197026.66666666666, ans=0.0 2024-09-23 06:25:21,670 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=197073.33333333334, ans=0.125 2024-09-23 06:25:30,819 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.055e+02 1.294e+02 1.384e+02 1.561e+02 5.237e+02, threshold=2.769e+02, percent-clipped=1.0 2024-09-23 06:25:32,757 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=197120.0, ans=0.0 2024-09-23 06:25:37,285 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=197120.0, ans=0.0 2024-09-23 06:25:43,073 INFO [scaling.py:1024] (3/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.26 vs. limit=5.0 2024-09-23 06:25:49,897 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=197166.66666666666, ans=0.125 2024-09-23 06:25:51,423 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=197166.66666666666, ans=0.125 2024-09-23 06:26:05,133 INFO [train.py:1198] (3/4) Epoch 11, batch 3300, loss[loss=0.3194, ctc_loss=0.2403, cr_loss=0.3957, over 12027.00 frames. ], tot_loss[loss=0.2454, ctc_loss=0.1696, cr_loss=0.379, over 3352237.62 frames. 
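The optim.py:487 warnings in this stretch are diagnostics from adaptive gradient clipping. The five numbers labelled "grad-norm quartiles" are the min / 25% / 50% / 75% / max of recently observed gradient norms, and the logged threshold is consistently Clipping_scale (2.0) times the logged median: in the warning just above, 2.0 * 1.384e+02 = 2.768e+02 against a logged threshold of 2.769e+02 (rounding), and the outlier norm of 5.237e+02 exceeding that threshold lines up with percent-clipped ticking up to 1.0 there. A sketch of that mechanism; the rolling-window size is an assumption:

    import collections
    import numpy as np
    import torch

    class AdaptiveClipper:
        # Sketch: clip the global grad norm at clipping_scale * median of
        # recent norms. The window size (400 here) is an assumption.
        def __init__(self, clipping_scale: float = 2.0, window: int = 400):
            self.scale = clipping_scale
            self.norms = collections.deque(maxlen=window)

        def clip_(self, parameters) -> None:
            grads = [p.grad for p in parameters if p.grad is not None]
            norm = torch.norm(torch.stack([g.norm() for g in grads]))
            self.norms.append(float(norm))
            threshold = self.scale * float(np.median(self.norms))
            # the logged "quartiles" correspond to:
            # np.percentile(self.norms, [0, 25, 50, 75, 100])
            if norm > threshold:
                for g in grads:
                    g.mul_(threshold / norm)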
], batch size: 123, lr: 1.08e-02, grad_scale: 16.0 2024-09-23 06:26:11,434 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=197213.33333333334, ans=0.0 2024-09-23 06:26:44,250 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=197306.66666666666, ans=0.125 2024-09-23 06:26:44,802 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=6.84 vs. limit=12.0 2024-09-23 06:27:20,186 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=13.36 vs. limit=22.5 2024-09-23 06:27:22,861 INFO [train.py:1198] (3/4) Epoch 11, batch 3350, loss[loss=0.2771, ctc_loss=0.1932, cr_loss=0.4196, over 17000.00 frames. ], tot_loss[loss=0.2466, ctc_loss=0.1705, cr_loss=0.3804, over 3350708.02 frames. ], batch size: 53, lr: 1.08e-02, grad_scale: 16.0 2024-09-23 06:27:30,878 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=197446.66666666666, ans=0.1 2024-09-23 06:27:37,126 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=197493.33333333334, ans=0.5 2024-09-23 06:27:42,557 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=9.25 vs. limit=15.0 2024-09-23 06:27:44,938 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=197493.33333333334, ans=0.1 2024-09-23 06:28:02,267 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=197540.0, ans=0.0 2024-09-23 06:28:06,586 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.135e+02 1.310e+02 1.428e+02 1.594e+02 2.854e+02, threshold=2.856e+02, percent-clipped=1.0 2024-09-23 06:28:33,517 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=197633.33333333334, ans=0.0 2024-09-23 06:28:41,095 INFO [train.py:1198] (3/4) Epoch 11, batch 3400, loss[loss=0.2229, ctc_loss=0.1512, cr_loss=0.3581, over 17182.00 frames. ], tot_loss[loss=0.2472, ctc_loss=0.171, cr_loss=0.3812, over 3342301.34 frames. ], batch size: 41, lr: 1.08e-02, grad_scale: 16.0 2024-09-23 06:28:52,087 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=197680.0, ans=0.2 2024-09-23 06:28:55,318 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=197726.66666666666, ans=0.1 2024-09-23 06:29:00,356 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.85 vs. limit=15.0 2024-09-23 06:29:10,229 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=5.05 vs. limit=15.0 2024-09-23 06:29:20,552 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=7.82 vs. 
limit=15.0 2024-09-23 06:30:04,327 INFO [train.py:1198] (3/4) Epoch 11, batch 3450, loss[loss=0.2005, ctc_loss=0.1356, cr_loss=0.3247, over 17087.00 frames. ], tot_loss[loss=0.2452, ctc_loss=0.1694, cr_loss=0.379, over 3347570.96 frames. ], batch size: 43, lr: 1.08e-02, grad_scale: 16.0 2024-09-23 06:30:09,419 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=197913.33333333334, ans=0.05 2024-09-23 06:30:23,803 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=197960.0, ans=0.0 2024-09-23 06:30:29,925 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=197960.0, ans=0.125 2024-09-23 06:30:36,088 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=198006.66666666666, ans=0.125 2024-09-23 06:30:37,648 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=198006.66666666666, ans=0.125 2024-09-23 06:30:49,695 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.122e+02 1.318e+02 1.398e+02 1.528e+02 2.343e+02, threshold=2.795e+02, percent-clipped=0.0 2024-09-23 06:30:49,931 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=198006.66666666666, ans=0.0 2024-09-23 06:31:22,001 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.const_attention_rate, batch_count=198100.0, ans=0.025 2024-09-23 06:31:26,438 INFO [train.py:1198] (3/4) Epoch 11, batch 3500, loss[loss=0.2423, ctc_loss=0.1658, cr_loss=0.3824, over 17288.00 frames. ], tot_loss[loss=0.246, ctc_loss=0.1699, cr_loss=0.3803, over 3353491.03 frames. ], batch size: 46, lr: 1.08e-02, grad_scale: 16.0 2024-09-23 06:31:36,380 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=14.88 vs. limit=22.5 2024-09-23 06:32:00,280 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=6.45 vs. limit=15.0 2024-09-23 06:32:04,113 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=198240.0, ans=0.1 2024-09-23 06:32:07,207 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-23 06:32:34,047 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=4.12 vs. limit=15.0 2024-09-23 06:32:44,007 INFO [train.py:1198] (3/4) Epoch 11, batch 3550, loss[loss=0.2216, ctc_loss=0.1495, cr_loss=0.3605, over 17025.00 frames. ], tot_loss[loss=0.2454, ctc_loss=0.1694, cr_loss=0.38, over 3361489.49 frames. 
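On the scaling.py:1024 "Whitening" lines (several appear just above): each reports a dispersion statistic over the eigenvalues of a covariance estimate of some module's activations, computed per channel group. A value of 1.0 means the activations are already white (all eigenvalues equal), larger values mean a few directions dominate, and the module only pushes back when the metric exceeds its limit, hence the "metric=X vs. limit=Y" phrasing. A sketch of one plausible such metric, the normalized second moment of the eigenvalues; the exact statistic used in the code is an assumption here:

    import torch

    def whitening_metric(x: torch.Tensor) -> torch.Tensor:
        # x: (num_frames, num_channels) activations for one group.
        # Returns E[eig^2] / E[eig]^2 over the covariance eigenvalues:
        # always >= 1.0, with equality exactly when the covariance is
        # isotropic, i.e. the activations are fully white.
        x = x - x.mean(dim=0, keepdim=True)
        cov = (x.t() @ x) / x.shape[0]
        eigs = torch.linalg.eigvalsh(cov)
        return (eigs ** 2).mean() / (eigs.mean() ** 2 + 1e-20)

    x = torch.randn(2000, 256)            # near-white activations
    print(float(whitening_metric(x)))     # slightly above 1.0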
], batch size: 44, lr: 1.08e-02, grad_scale: 16.0 2024-09-23 06:32:49,163 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=198380.0, ans=0.0 2024-09-23 06:32:58,453 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=198426.66666666666, ans=0.125 2024-09-23 06:33:01,522 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=198426.66666666666, ans=0.125 2024-09-23 06:33:21,737 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=198473.33333333334, ans=0.0 2024-09-23 06:33:23,272 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=198473.33333333334, ans=0.125 2024-09-23 06:33:27,727 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.143e+02 1.313e+02 1.389e+02 1.597e+02 2.419e+02, threshold=2.778e+02, percent-clipped=0.0 2024-09-23 06:33:36,094 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=198520.0, ans=0.1 2024-09-23 06:33:42,078 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=198520.0, ans=0.125 2024-09-23 06:33:42,099 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=198520.0, ans=0.125 2024-09-23 06:33:46,968 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=198566.66666666666, ans=10.0 2024-09-23 06:33:54,852 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-23 06:34:02,341 INFO [train.py:1198] (3/4) Epoch 11, batch 3600, loss[loss=0.2431, ctc_loss=0.1686, cr_loss=0.3726, over 17015.00 frames. ], tot_loss[loss=0.2454, ctc_loss=0.1694, cr_loss=0.38, over 3366329.73 frames. ], batch size: 44, lr: 1.08e-02, grad_scale: 32.0 2024-09-23 06:34:16,840 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=198660.0, ans=0.125 2024-09-23 06:35:04,304 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=198800.0, ans=0.0 2024-09-23 06:35:06,041 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=198800.0, ans=0.0 2024-09-23 06:35:21,047 INFO [train.py:1198] (3/4) Epoch 11, batch 3650, loss[loss=0.2084, ctc_loss=0.1403, cr_loss=0.3404, over 17240.00 frames. ], tot_loss[loss=0.2443, ctc_loss=0.1686, cr_loss=0.3787, over 3357598.64 frames. 
], batch size: 42, lr: 1.08e-02, grad_scale: 32.0 2024-09-23 06:35:28,953 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=198846.66666666666, ans=0.2 2024-09-23 06:36:04,373 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.052e+02 1.307e+02 1.406e+02 1.514e+02 2.450e+02, threshold=2.812e+02, percent-clipped=0.0 2024-09-23 06:36:18,426 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=198986.66666666666, ans=0.125 2024-09-23 06:36:20,193 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=198986.66666666666, ans=0.1 2024-09-23 06:36:39,291 INFO [train.py:1198] (3/4) Epoch 11, batch 3700, loss[loss=0.2758, ctc_loss=0.1902, cr_loss=0.4276, over 14882.00 frames. ], tot_loss[loss=0.2442, ctc_loss=0.1685, cr_loss=0.3781, over 3342149.65 frames. ], batch size: 89, lr: 1.08e-02, grad_scale: 32.0 2024-09-23 06:36:55,338 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=199126.66666666666, ans=0.0 2024-09-23 06:36:57,575 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=7.19 vs. limit=15.0 2024-09-23 06:37:04,864 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=15.26 vs. limit=15.0 2024-09-23 06:37:14,051 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.15 vs. limit=15.0 2024-09-23 06:37:37,428 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=199220.0, ans=0.0 2024-09-23 06:37:57,660 INFO [train.py:1198] (3/4) Epoch 11, batch 3750, loss[loss=0.3079, ctc_loss=0.2227, cr_loss=0.426, over 11581.00 frames. ], tot_loss[loss=0.2435, ctc_loss=0.1681, cr_loss=0.3769, over 3338130.09 frames. ], batch size: 123, lr: 1.08e-02, grad_scale: 32.0 2024-09-23 06:38:41,140 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.147e+02 1.417e+02 1.573e+02 1.870e+02 3.069e+02, threshold=3.146e+02, percent-clipped=3.0 2024-09-23 06:38:41,532 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=199406.66666666666, ans=0.0 2024-09-23 06:38:44,600 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=199453.33333333334, ans=0.125 2024-09-23 06:39:16,470 INFO [train.py:1198] (3/4) Epoch 11, batch 3800, loss[loss=0.2286, ctc_loss=0.1574, cr_loss=0.3557, over 17148.00 frames. ], tot_loss[loss=0.2447, ctc_loss=0.1691, cr_loss=0.378, over 3330525.73 frames. ], batch size: 48, lr: 1.08e-02, grad_scale: 32.0 2024-09-23 06:40:34,883 INFO [train.py:1198] (3/4) Epoch 11, batch 3850, loss[loss=0.317, ctc_loss=0.231, cr_loss=0.4302, over 11958.00 frames. ], tot_loss[loss=0.2498, ctc_loss=0.1733, cr_loss=0.3825, over 3276954.39 frames. 
], batch size: 124, lr: 1.07e-02, grad_scale: 16.0 2024-09-23 06:40:48,808 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=199826.66666666666, ans=0.0 2024-09-23 06:41:02,808 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=15.04 vs. limit=15.0 2024-09-23 06:41:06,873 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=199873.33333333334, ans=0.0 2024-09-23 06:41:08,322 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=199873.33333333334, ans=0.125 2024-09-23 06:41:18,606 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.202e+02 1.500e+02 1.672e+02 1.857e+02 2.511e+02, threshold=3.343e+02, percent-clipped=0.0 2024-09-23 06:42:36,536 INFO [train.py:1198] (3/4) Epoch 12, batch 0, loss[loss=0.2145, ctc_loss=0.1478, cr_loss=0.3338, over 17103.00 frames. ], tot_loss[loss=0.2145, ctc_loss=0.1478, cr_loss=0.3338, over 17103.00 frames. ], batch size: 40, lr: 1.03e-02, grad_scale: 32.0 2024-09-23 06:42:36,537 INFO [train.py:1221] (3/4) Computing validation loss 2024-09-23 06:42:43,747 INFO [zipformer.py:1858] (3/4) name=encoder.encoders.4.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([2.7759, 3.7055, 3.5131, 3.4553], device='cuda:3') 2024-09-23 06:42:52,077 INFO [train.py:1230] (3/4) Epoch 12, validation: loss=0.0478, ctc_loss=0.0478, cr_loss=7.52e-15, over 944034.00 frames. 2024-09-23 06:42:52,078 INFO [train.py:1231] (3/4) Maximum memory allocated so far is 21238MB 2024-09-23 06:43:08,346 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=200041.33333333334, ans=0.2 2024-09-23 06:43:22,754 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=200088.0, ans=0.1 2024-09-23 06:43:47,774 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=200134.66666666666, ans=0.0 2024-09-23 06:43:52,914 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=6.33 vs. limit=15.0 2024-09-23 06:44:04,172 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=10.26 vs. limit=15.0 2024-09-23 06:44:11,656 INFO [train.py:1198] (3/4) Epoch 12, batch 50, loss[loss=0.1947, ctc_loss=0.1315, cr_loss=0.3158, over 17190.00 frames. ], tot_loss[loss=0.2391, ctc_loss=0.1643, cr_loss=0.3738, over 746718.47 frames. ], batch size: 41, lr: 1.03e-02, grad_scale: 32.0 2024-09-23 06:44:13,572 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=200228.0, ans=0.0 2024-09-23 06:44:23,121 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.05 vs. 
limit=10.0 2024-09-23 06:44:46,175 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=200274.66666666666, ans=0.125 2024-09-23 06:45:12,534 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.127e+02 1.318e+02 1.438e+02 1.597e+02 2.419e+02, threshold=2.876e+02, percent-clipped=0.0 2024-09-23 06:45:24,043 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=200414.66666666666, ans=0.125 2024-09-23 06:45:31,924 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=200414.66666666666, ans=0.015 2024-09-23 06:45:40,751 INFO [train.py:1198] (3/4) Epoch 12, batch 100, loss[loss=0.3039, ctc_loss=0.2113, cr_loss=0.463, over 16986.00 frames. ], tot_loss[loss=0.2419, ctc_loss=0.1663, cr_loss=0.3781, over 1326934.18 frames. ], batch size: 53, lr: 1.03e-02, grad_scale: 16.0 2024-09-23 06:45:59,245 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=15.77 vs. limit=22.5 2024-09-23 06:46:09,188 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=8.58 vs. limit=22.5 2024-09-23 06:46:10,069 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=200508.0, ans=0.0 2024-09-23 06:46:10,116 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=200508.0, ans=0.125 2024-09-23 06:46:54,001 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.87 vs. limit=6.0 2024-09-23 06:47:00,676 INFO [train.py:1198] (3/4) Epoch 12, batch 150, loss[loss=0.216, ctc_loss=0.1475, cr_loss=0.3427, over 16276.00 frames. ], tot_loss[loss=0.2429, ctc_loss=0.1672, cr_loss=0.3784, over 1772336.52 frames. ], batch size: 36, lr: 1.03e-02, grad_scale: 16.0 2024-09-23 06:47:08,843 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.const_attention_rate, batch_count=200694.66666666666, ans=0.025 2024-09-23 06:47:11,858 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=200694.66666666666, ans=0.015 2024-09-23 06:47:23,312 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.const_attention_rate, batch_count=200741.33333333334, ans=0.025 2024-09-23 06:47:39,082 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=200788.0, ans=0.125 2024-09-23 06:47:45,558 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=200788.0, ans=0.0 2024-09-23 06:47:54,508 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.081e+02 1.306e+02 1.461e+02 1.660e+02 2.321e+02, threshold=2.923e+02, percent-clipped=0.0 2024-09-23 06:48:19,936 INFO [train.py:1198] (3/4) Epoch 12, batch 200, loss[loss=0.2914, ctc_loss=0.2026, cr_loss=0.4442, over 17028.00 frames. ], tot_loss[loss=0.2439, ctc_loss=0.1678, cr_loss=0.3801, over 2127500.89 frames. 
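The zipformer.py:1858 lines emitted at each validation pass (attn_weights_entropy = tensor([...]) above) report, apparently one value per attention head, the average entropy of that head's attention distribution in nats; exp(entropy) is roughly the number of key positions a head spreads its attention over, so the values of ~2.8 to ~4.4 seen above correspond to roughly 16 to 80 effective positions. A sketch of the computation under that interpretation:

    import torch

    def attn_weights_entropy(attn: torch.Tensor) -> torch.Tensor:
        # attn: (num_heads, num_queries, num_keys); each row is a softmax
        # output. Returns the mean entropy per head, in nats.
        h = -(attn * (attn + 1e-20).log()).sum(dim=-1)   # (heads, queries)
        return h.mean(dim=-1)

    attn = torch.softmax(torch.randn(4, 50, 200), dim=-1)
    print(attn_weights_entropy(attn))   # a bit under ln(200) ~ 5.3 nats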
], batch size: 56, lr: 1.03e-02, grad_scale: 16.0 2024-09-23 06:48:37,469 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=200974.66666666666, ans=0.125 2024-09-23 06:49:17,238 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=201068.0, ans=0.05 2024-09-23 06:49:17,333 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=201068.0, ans=0.125 2024-09-23 06:49:18,868 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-23 06:49:26,568 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=201114.66666666666, ans=0.2 2024-09-23 06:49:28,121 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=201114.66666666666, ans=10.0 2024-09-23 06:49:47,799 INFO [train.py:1198] (3/4) Epoch 12, batch 250, loss[loss=0.2881, ctc_loss=0.2026, cr_loss=0.4274, over 15107.00 frames. ], tot_loss[loss=0.2438, ctc_loss=0.1679, cr_loss=0.3794, over 2391137.35 frames. ], batch size: 89, lr: 1.03e-02, grad_scale: 16.0 2024-09-23 06:49:48,042 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=201161.33333333334, ans=0.125 2024-09-23 06:49:48,118 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.const_attention_rate, batch_count=201161.33333333334, ans=0.025 2024-09-23 06:50:04,007 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=201208.0, ans=0.2 2024-09-23 06:50:39,059 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.max_positive, batch_count=201301.33333333334, ans=0.95 2024-09-23 06:50:45,029 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.112e+02 1.293e+02 1.367e+02 1.506e+02 2.268e+02, threshold=2.734e+02, percent-clipped=0.0 2024-09-23 06:50:49,253 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.27 vs. limit=15.0 2024-09-23 06:50:51,735 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=201301.33333333334, ans=0.125 2024-09-23 06:50:55,073 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=201348.0, ans=0.07 2024-09-23 06:51:04,977 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.40 vs. limit=15.0 2024-09-23 06:51:10,810 INFO [train.py:1198] (3/4) Epoch 12, batch 300, loss[loss=0.2343, ctc_loss=0.1599, cr_loss=0.3723, over 17176.00 frames. ], tot_loss[loss=0.2434, ctc_loss=0.1676, cr_loss=0.3788, over 2613553.20 frames. 
], batch size: 41, lr: 1.03e-02, grad_scale: 16.0 2024-09-23 06:51:18,941 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.const_attention_rate, batch_count=201394.66666666666, ans=0.025 2024-09-23 06:52:02,088 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=201534.66666666666, ans=0.125 2024-09-23 06:52:19,935 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=5.45 vs. limit=15.0 2024-09-23 06:52:30,565 INFO [train.py:1198] (3/4) Epoch 12, batch 350, loss[loss=0.2228, ctc_loss=0.1515, cr_loss=0.3565, over 16242.00 frames. ], tot_loss[loss=0.2414, ctc_loss=0.166, cr_loss=0.3774, over 2779959.78 frames. ], batch size: 36, lr: 1.02e-02, grad_scale: 16.0 2024-09-23 06:53:11,634 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=201721.33333333334, ans=0.1 2024-09-23 06:53:24,268 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=201768.0, ans=0.125 2024-09-23 06:53:25,562 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.097e+02 1.271e+02 1.388e+02 1.541e+02 2.258e+02, threshold=2.777e+02, percent-clipped=0.0 2024-09-23 06:53:28,099 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=5.51 vs. limit=12.0 2024-09-23 06:53:41,905 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=201814.66666666666, ans=0.125 2024-09-23 06:53:51,107 INFO [train.py:1198] (3/4) Epoch 12, batch 400, loss[loss=0.2347, ctc_loss=0.1643, cr_loss=0.352, over 17264.00 frames. ], tot_loss[loss=0.2404, ctc_loss=0.1653, cr_loss=0.3755, over 2909162.54 frames. ], batch size: 42, lr: 1.02e-02, grad_scale: 32.0 2024-09-23 06:54:29,020 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=201954.66666666666, ans=0.0 2024-09-23 06:54:49,121 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=202001.33333333334, ans=0.2 2024-09-23 06:54:52,998 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.59 vs. limit=15.0 2024-09-23 06:55:17,427 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=202094.66666666666, ans=0.035 2024-09-23 06:55:18,828 INFO [train.py:1198] (3/4) Epoch 12, batch 450, loss[loss=0.2178, ctc_loss=0.1481, cr_loss=0.3487, over 17320.00 frames. ], tot_loss[loss=0.2406, ctc_loss=0.1654, cr_loss=0.3757, over 3017601.60 frames. 
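The grad_scale reported at the end of every loss record (16.0 and 32.0 in this stretch) is the loss-scaling factor of mixed-precision training: the loss is multiplied by this factor before backward so that float16 gradients do not underflow, the factor is cut when an overflow is detected and grown again after a run of clean steps, which is why it wanders between powers of two over the course of the log. A minimal sketch with the standard torch.cuda.amp machinery; the model, optimizer and inputs here are placeholders, not the recipe's:

    import torch

    model = torch.nn.Linear(80, 500).cuda()
    optimizer = torch.optim.AdamW(model.parameters())
    scaler = torch.cuda.amp.GradScaler()   # owns the grad_scale value

    def train_step(features, targets):
        optimizer.zero_grad()
        with torch.cuda.amp.autocast(dtype=torch.float16):
            loss = torch.nn.functional.cross_entropy(model(features), targets)
        scaler.scale(loss).backward()  # backward on the scaled loss
        scaler.step(optimizer)         # unscales; skips the step on inf/nan
        scaler.update()                # halves or grows the scale as needed
        return scaler.get_scale()      # the "grad_scale" printed in the log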
], batch size: 46, lr: 1.02e-02, grad_scale: 32.0 2024-09-23 06:56:10,951 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=202234.66666666666, ans=0.125 2024-09-23 06:56:11,166 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-23 06:56:15,456 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.139e+02 1.330e+02 1.519e+02 1.735e+02 3.092e+02, threshold=3.038e+02, percent-clipped=2.0 2024-09-23 06:56:25,235 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=202281.33333333334, ans=0.125 2024-09-23 06:56:28,245 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=202281.33333333334, ans=0.125 2024-09-23 06:56:40,846 INFO [train.py:1198] (3/4) Epoch 12, batch 500, loss[loss=0.2266, ctc_loss=0.1525, cr_loss=0.3703, over 17311.00 frames. ], tot_loss[loss=0.241, ctc_loss=0.1657, cr_loss=0.3766, over 3102225.83 frames. ], batch size: 49, lr: 1.02e-02, grad_scale: 32.0 2024-09-23 06:56:55,372 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=202374.66666666666, ans=0.0 2024-09-23 06:57:00,263 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=202374.66666666666, ans=0.2 2024-09-23 06:57:00,392 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=202374.66666666666, ans=0.05 2024-09-23 06:57:05,005 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=202374.66666666666, ans=0.125 2024-09-23 06:57:11,404 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-09-23 06:57:25,870 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=202421.33333333334, ans=0.125 2024-09-23 06:57:32,157 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.const_attention_rate, batch_count=202468.0, ans=0.025 2024-09-23 06:58:00,477 INFO [train.py:1198] (3/4) Epoch 12, batch 550, loss[loss=0.1948, ctc_loss=0.1325, cr_loss=0.3115, over 17025.00 frames. ], tot_loss[loss=0.2419, ctc_loss=0.1664, cr_loss=0.3775, over 3162118.25 frames. ], batch size: 39, lr: 1.02e-02, grad_scale: 32.0 2024-09-23 06:58:00,720 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=202561.33333333334, ans=0.0 2024-09-23 06:58:05,695 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=202561.33333333334, ans=0.125 2024-09-23 06:58:11,781 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=202561.33333333334, ans=0.125 2024-09-23 06:58:15,816 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=9.76 vs. 
limit=22.5 2024-09-23 06:58:53,110 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=202701.33333333334, ans=0.125 2024-09-23 06:58:54,272 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.042e+02 1.321e+02 1.421e+02 1.522e+02 2.085e+02, threshold=2.842e+02, percent-clipped=0.0 2024-09-23 06:59:01,176 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=202701.33333333334, ans=0.125 2024-09-23 06:59:02,736 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=202748.0, ans=0.0 2024-09-23 06:59:10,502 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=202748.0, ans=0.2 2024-09-23 06:59:16,635 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2.whitening_limit, batch_count=202748.0, ans=15.0 2024-09-23 06:59:22,369 INFO [train.py:1198] (3/4) Epoch 12, batch 600, loss[loss=0.2078, ctc_loss=0.143, cr_loss=0.3244, over 17082.00 frames. ], tot_loss[loss=0.2413, ctc_loss=0.166, cr_loss=0.3764, over 3209434.05 frames. ], batch size: 43, lr: 1.02e-02, grad_scale: 32.0 2024-09-23 06:59:40,897 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=202794.66666666666, ans=0.0 2024-09-23 07:00:03,111 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=202888.0, ans=0.125 2024-09-23 07:00:03,125 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=202888.0, ans=0.125 2024-09-23 07:00:09,643 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=13.77 vs. limit=15.0 2024-09-23 07:00:28,442 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=202934.66666666666, ans=0.1 2024-09-23 07:00:29,121 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.01 vs. limit=15.0 2024-09-23 07:00:34,368 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=202981.33333333334, ans=0.0 2024-09-23 07:00:50,044 INFO [train.py:1198] (3/4) Epoch 12, batch 650, loss[loss=0.2437, ctc_loss=0.1657, cr_loss=0.3898, over 17177.00 frames. ], tot_loss[loss=0.2423, ctc_loss=0.1667, cr_loss=0.3779, over 3241579.99 frames. ], batch size: 45, lr: 1.02e-02, grad_scale: 32.0 2024-09-23 07:00:50,644 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=5.97 vs. limit=15.0 2024-09-23 07:00:55,942 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=3.76 vs. 
limit=12.0 2024-09-23 07:00:56,960 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-23 07:01:09,526 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=203074.66666666666, ans=0.0 2024-09-23 07:01:15,847 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=203074.66666666666, ans=0.125 2024-09-23 07:01:44,226 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.114e+02 1.271e+02 1.387e+02 1.567e+02 2.285e+02, threshold=2.773e+02, percent-clipped=0.0 2024-09-23 07:01:45,056 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=15.53 vs. limit=22.5 2024-09-23 07:01:58,891 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=203214.66666666666, ans=0.1 2024-09-23 07:02:09,753 INFO [train.py:1198] (3/4) Epoch 12, batch 700, loss[loss=0.2073, ctc_loss=0.1394, cr_loss=0.3393, over 17289.00 frames. ], tot_loss[loss=0.2424, ctc_loss=0.1666, cr_loss=0.3786, over 3278051.47 frames. ], batch size: 42, lr: 1.02e-02, grad_scale: 32.0 2024-09-23 07:02:13,290 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=203261.33333333334, ans=0.1 2024-09-23 07:02:46,781 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=203354.66666666666, ans=0.125 2024-09-23 07:02:50,132 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-23 07:02:58,050 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=203401.33333333334, ans=0.125 2024-09-23 07:03:17,151 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-09-23 07:03:23,860 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=203448.0, ans=0.125 2024-09-23 07:03:29,846 INFO [train.py:1198] (3/4) Epoch 12, batch 750, loss[loss=0.2038, ctc_loss=0.1373, cr_loss=0.3324, over 17102.00 frames. ], tot_loss[loss=0.2425, ctc_loss=0.1668, cr_loss=0.3785, over 3287732.89 frames. ], batch size: 43, lr: 1.02e-02, grad_scale: 32.0 2024-09-23 07:04:29,347 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.059e+02 1.298e+02 1.398e+02 1.549e+02 2.740e+02, threshold=2.796e+02, percent-clipped=0.0 2024-09-23 07:04:41,847 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=203681.33333333334, ans=0.0 2024-09-23 07:04:57,397 INFO [train.py:1198] (3/4) Epoch 12, batch 800, loss[loss=0.2204, ctc_loss=0.1515, cr_loss=0.3447, over 17174.00 frames. ], tot_loss[loss=0.2409, ctc_loss=0.1656, cr_loss=0.3763, over 3309565.44 frames. ], batch size: 41, lr: 1.02e-02, grad_scale: 32.0 2024-09-23 07:05:15,831 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.95 vs. 
limit=6.0 2024-09-23 07:05:22,916 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=203774.66666666666, ans=0.1 2024-09-23 07:05:27,052 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=203774.66666666666, ans=0.05 2024-09-23 07:05:29,220 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.96 vs. limit=22.5 2024-09-23 07:05:46,489 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=9.90 vs. limit=15.0 2024-09-23 07:06:03,495 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=203914.66666666666, ans=0.125 2024-09-23 07:06:18,740 INFO [train.py:1198] (3/4) Epoch 12, batch 850, loss[loss=0.234, ctc_loss=0.165, cr_loss=0.3447, over 17163.00 frames. ], tot_loss[loss=0.2414, ctc_loss=0.1661, cr_loss=0.3765, over 3323308.15 frames. ], batch size: 45, lr: 1.02e-02, grad_scale: 32.0 2024-09-23 07:06:29,353 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=7.36 vs. limit=15.0 2024-09-23 07:06:33,374 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-09-23 07:06:49,083 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=204054.66666666666, ans=0.0 2024-09-23 07:07:12,568 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.085e+02 1.288e+02 1.402e+02 1.556e+02 2.231e+02, threshold=2.804e+02, percent-clipped=0.0 2024-09-23 07:07:29,588 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=6.14 vs. limit=15.0 2024-09-23 07:07:33,702 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=204148.0, ans=0.2 2024-09-23 07:07:38,076 INFO [train.py:1198] (3/4) Epoch 12, batch 900, loss[loss=0.2374, ctc_loss=0.1628, cr_loss=0.3728, over 17143.00 frames. ], tot_loss[loss=0.2412, ctc_loss=0.1659, cr_loss=0.3766, over 3337306.02 frames. ], batch size: 48, lr: 1.02e-02, grad_scale: 32.0 2024-09-23 07:08:06,900 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=204241.33333333334, ans=0.2 2024-09-23 07:08:21,971 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=4.17 vs. limit=15.0 2024-09-23 07:08:23,279 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.09 vs. limit=15.0 2024-09-23 07:08:29,131 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=204334.66666666666, ans=0.2 2024-09-23 07:08:34,196 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=12.41 vs. 
limit=22.5 2024-09-23 07:08:35,376 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=204334.66666666666, ans=0.0 2024-09-23 07:08:53,255 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=204381.33333333334, ans=0.125 2024-09-23 07:08:57,661 INFO [train.py:1198] (3/4) Epoch 12, batch 950, loss[loss=0.2688, ctc_loss=0.1865, cr_loss=0.4115, over 17232.00 frames. ], tot_loss[loss=0.2413, ctc_loss=0.1658, cr_loss=0.3772, over 3348803.30 frames. ], batch size: 50, lr: 1.02e-02, grad_scale: 32.0 2024-09-23 07:09:20,402 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=204474.66666666666, ans=0.1 2024-09-23 07:09:25,165 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=204474.66666666666, ans=0.0 2024-09-23 07:09:38,984 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=204521.33333333334, ans=0.2 2024-09-23 07:09:40,999 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.43 vs. limit=6.0 2024-09-23 07:09:47,082 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=204521.33333333334, ans=0.125 2024-09-23 07:09:51,840 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=204568.0, ans=0.2 2024-09-23 07:09:55,239 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=204568.0, ans=0.0 2024-09-23 07:09:59,535 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.109e+02 1.295e+02 1.392e+02 1.548e+02 2.202e+02, threshold=2.785e+02, percent-clipped=0.0 2024-09-23 07:10:19,529 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=204614.66666666666, ans=0.125 2024-09-23 07:10:28,006 INFO [train.py:1198] (3/4) Epoch 12, batch 1000, loss[loss=0.2319, ctc_loss=0.1598, cr_loss=0.3608, over 15926.00 frames. ], tot_loss[loss=0.2419, ctc_loss=0.1663, cr_loss=0.3777, over 3344226.41 frames. ], batch size: 35, lr: 1.02e-02, grad_scale: 32.0 2024-09-23 07:10:29,955 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=204661.33333333334, ans=0.1 2024-09-23 07:11:01,388 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=204754.66666666666, ans=0.125 2024-09-23 07:11:33,538 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.71 vs. limit=15.0 2024-09-23 07:11:39,687 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=204848.0, ans=0.0 2024-09-23 07:11:47,282 INFO [train.py:1198] (3/4) Epoch 12, batch 1050, loss[loss=0.3235, ctc_loss=0.241, cr_loss=0.4126, over 11318.00 frames. ], tot_loss[loss=0.2429, ctc_loss=0.1671, cr_loss=0.3788, over 3337164.37 frames. 
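The frame counts in the tot_loss records follow a telling pattern: they reset at each epoch boundary (746718.47 frames at epoch 12, batch 50, further up), climb for a few hundred batches, and then plateau around 3.3M to 3.4M. With individual batches contributing roughly 17k frames each, that is consistent with tot_loss being a geometrically decayed, frame-weighted running average with an effective window of roughly 200 batches, rather than a whole-epoch mean. A sketch under that assumption; the decay constant is illustrative, not taken from the code:

    class RunningLossSketch:
        # decay = 1 - 1/200 gives an effective window of ~200 batches,
        # i.e. ~3.4M frames at ~17k frames/batch, matching the plateau
        # in the log. The constant is an assumption.
        def __init__(self, decay: float = 1.0 - 1.0 / 200):
            self.decay = decay
            self.weighted_loss = 0.0
            self.frames = 0.0

        def update(self, batch_loss: float, batch_frames: float) -> float:
            self.weighted_loss = (self.decay * self.weighted_loss
                                  + batch_loss * batch_frames)
            self.frames = self.decay * self.frames + batch_frames
            return self.weighted_loss / self.frames   # the logged tot_loss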
], batch size: 123, lr: 1.02e-02, grad_scale: 32.0 2024-09-23 07:11:50,922 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=204894.66666666666, ans=0.2 2024-09-23 07:12:22,621 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=204988.0, ans=0.0 2024-09-23 07:12:41,478 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.114e+02 1.299e+02 1.476e+02 1.734e+02 2.912e+02, threshold=2.951e+02, percent-clipped=1.0 2024-09-23 07:13:02,702 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=205081.33333333334, ans=0.125 2024-09-23 07:13:07,027 INFO [train.py:1198] (3/4) Epoch 12, batch 1100, loss[loss=0.2663, ctc_loss=0.1798, cr_loss=0.4324, over 16465.00 frames. ], tot_loss[loss=0.2435, ctc_loss=0.1676, cr_loss=0.3796, over 3336580.44 frames. ], batch size: 66, lr: 1.02e-02, grad_scale: 32.0 2024-09-23 07:14:13,013 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=10.03 vs. limit=10.0 2024-09-23 07:14:16,659 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=7.13 vs. limit=15.0 2024-09-23 07:14:30,570 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=205314.66666666666, ans=0.07 2024-09-23 07:14:32,840 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=7.47 vs. limit=15.0 2024-09-23 07:14:36,519 INFO [train.py:1198] (3/4) Epoch 12, batch 1150, loss[loss=0.2366, ctc_loss=0.1648, cr_loss=0.3592, over 17360.00 frames. ], tot_loss[loss=0.243, ctc_loss=0.1672, cr_loss=0.3787, over 3337563.49 frames. ], batch size: 48, lr: 1.02e-02, grad_scale: 32.0 2024-09-23 07:15:00,306 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.min_positive, batch_count=205408.0, ans=0.05 2024-09-23 07:15:09,751 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=205454.66666666666, ans=0.0 2024-09-23 07:15:32,629 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.115e+02 1.321e+02 1.590e+02 1.805e+02 2.687e+02, threshold=3.179e+02, percent-clipped=0.0 2024-09-23 07:15:34,932 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=5.28 vs. limit=12.0 2024-09-23 07:15:42,448 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=205548.0, ans=0.0 2024-09-23 07:15:50,333 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=205548.0, ans=0.0 2024-09-23 07:15:58,029 INFO [train.py:1198] (3/4) Epoch 12, batch 1200, loss[loss=0.207, ctc_loss=0.1427, cr_loss=0.3217, over 17271.00 frames. ], tot_loss[loss=0.2418, ctc_loss=0.1664, cr_loss=0.3769, over 3343264.78 frames. ], batch size: 42, lr: 1.02e-02, grad_scale: 32.0 2024-09-23 07:16:58,873 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=14.16 vs. 
limit=15.0 2024-09-23 07:17:17,303 INFO [train.py:1198] (3/4) Epoch 12, batch 1250, loss[loss=0.2735, ctc_loss=0.1886, cr_loss=0.4241, over 17042.00 frames. ], tot_loss[loss=0.2409, ctc_loss=0.1658, cr_loss=0.3758, over 3337757.78 frames. ], batch size: 52, lr: 1.01e-02, grad_scale: 32.0 2024-09-23 07:17:47,759 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=205921.33333333334, ans=0.0 2024-09-23 07:17:50,962 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=205921.33333333334, ans=0.125 2024-09-23 07:18:02,564 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=10.70 vs. limit=15.0 2024-09-23 07:18:12,962 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.117e+02 1.281e+02 1.384e+02 1.564e+02 2.899e+02, threshold=2.767e+02, percent-clipped=0.0 2024-09-23 07:18:26,312 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=206014.66666666666, ans=0.1 2024-09-23 07:18:29,354 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=206014.66666666666, ans=0.0 2024-09-23 07:18:29,379 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=206014.66666666666, ans=0.125 2024-09-23 07:18:36,852 INFO [train.py:1198] (3/4) Epoch 12, batch 1300, loss[loss=0.2696, ctc_loss=0.1854, cr_loss=0.4212, over 16498.00 frames. ], tot_loss[loss=0.2403, ctc_loss=0.1652, cr_loss=0.3757, over 3349373.56 frames. ], batch size: 66, lr: 1.01e-02, grad_scale: 16.0 2024-09-23 07:18:45,719 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=5.91 vs. limit=15.0 2024-09-23 07:18:53,479 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=13.13 vs. limit=22.5 2024-09-23 07:19:07,292 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=4.52 vs. limit=15.0 2024-09-23 07:19:18,426 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=206154.66666666666, ans=0.125 2024-09-23 07:19:49,233 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=206248.0, ans=0.125 2024-09-23 07:19:56,228 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=206248.0, ans=0.125 2024-09-23 07:19:57,661 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=206248.0, ans=0.125 2024-09-23 07:20:03,898 INFO [train.py:1198] (3/4) Epoch 12, batch 1350, loss[loss=0.2462, ctc_loss=0.174, cr_loss=0.3609, over 15973.00 frames. ], tot_loss[loss=0.2397, ctc_loss=0.1649, cr_loss=0.3743, over 3354050.69 frames. 
2024-09-23 07:20:35,400 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=206341.33333333334, ans=0.1
2024-09-23 07:21:02,411 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.140e+02 1.350e+02 1.483e+02 1.734e+02 2.832e+02, threshold=2.966e+02, percent-clipped=2.0
2024-09-23 07:21:04,300 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=206434.66666666666, ans=0.125
2024-09-23 07:21:26,079 INFO [train.py:1198] (3/4) Epoch 12, batch 1400, loss[loss=0.2385, ctc_loss=0.1642, cr_loss=0.371, over 17215.00 frames. ], tot_loss[loss=0.2408, ctc_loss=0.1656, cr_loss=0.3759, over 3355956.92 frames. ], batch size: 50, lr: 1.01e-02, grad_scale: 16.0
2024-09-23 07:21:42,332 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=206574.66666666666, ans=0.125
2024-09-23 07:22:04,728 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=206621.33333333334, ans=0.1
2024-09-23 07:22:25,694 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=206668.0, ans=0.0
2024-09-23 07:22:46,291 INFO [train.py:1198] (3/4) Epoch 12, batch 1450, loss[loss=0.2459, ctc_loss=0.1698, cr_loss=0.3805, over 16508.00 frames. ], tot_loss[loss=0.2411, ctc_loss=0.1659, cr_loss=0.3763, over 3358577.09 frames. ], batch size: 66, lr: 1.01e-02, grad_scale: 16.0
2024-09-23 07:23:13,631 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=206808.0, ans=0.0
2024-09-23 07:23:17,645 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=7.96 vs. limit=15.0
2024-09-23 07:23:34,464 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.const_attention_rate, batch_count=206901.33333333334, ans=0.025
2024-09-23 07:23:38,566 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=7.12 vs. limit=15.0
2024-09-23 07:23:42,179 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.071e+02 1.321e+02 1.435e+02 1.538e+02 2.142e+02, threshold=2.870e+02, percent-clipped=0.0
2024-09-23 07:23:54,531 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=206948.0, ans=0.125
2024-09-23 07:24:10,785 INFO [train.py:1198] (3/4) Epoch 12, batch 1500, loss[loss=0.2212, ctc_loss=0.1501, cr_loss=0.3558, over 17297.00 frames. ], tot_loss[loss=0.2419, ctc_loss=0.1665, cr_loss=0.3773, over 3359489.09 frames. ], batch size: 46, lr: 1.01e-02, grad_scale: 16.0
2024-09-23 07:24:13,248 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=6.13 vs. limit=15.0
2024-09-23 07:24:54,724 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=207088.0, ans=0.2
2024-09-23 07:25:01,083 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=207134.66666666666, ans=0.125
2024-09-23 07:25:16,568 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=6.15 vs. limit=15.0
2024-09-23 07:25:35,348 INFO [train.py:1198] (3/4) Epoch 12, batch 1550, loss[loss=0.237, ctc_loss=0.1627, cr_loss=0.3715, over 17296.00 frames. ], tot_loss[loss=0.2419, ctc_loss=0.1665, cr_loss=0.3771, over 3359500.43 frames. ], batch size: 51, lr: 1.01e-02, grad_scale: 16.0
2024-09-23 07:25:35,715 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=207228.0, ans=0.1
2024-09-23 07:25:39,252 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=7.05 vs. limit=15.0
2024-09-23 07:25:58,078 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=207274.66666666666, ans=0.125
2024-09-23 07:25:59,699 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=207274.66666666666, ans=0.125
2024-09-23 07:26:05,360 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.71 vs. limit=15.0
2024-09-23 07:26:13,385 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.03 vs. limit=10.0
2024-09-23 07:26:16,010 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=207321.33333333334, ans=0.025
2024-09-23 07:26:31,712 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.088e+02 1.334e+02 1.473e+02 1.633e+02 2.267e+02, threshold=2.947e+02, percent-clipped=0.0
2024-09-23 07:26:55,794 INFO [train.py:1198] (3/4) Epoch 12, batch 1600, loss[loss=0.1875, ctc_loss=0.1313, cr_loss=0.2805, over 17207.00 frames. ], tot_loss[loss=0.2408, ctc_loss=0.1658, cr_loss=0.3752, over 3362424.78 frames. ], batch size: 41, lr: 1.01e-02, grad_scale: 32.0
2024-09-23 07:27:09,385 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=5.35 vs. limit=12.0
2024-09-23 07:27:28,107 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=207554.66666666666, ans=0.125
2024-09-23 07:27:42,460 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=207601.33333333334, ans=0.0
2024-09-23 07:28:01,430 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=207648.0, ans=0.1
2024-09-23 07:28:07,839 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=207648.0, ans=0.0
2024-09-23 07:28:15,592 INFO [train.py:1198] (3/4) Epoch 12, batch 1650, loss[loss=0.2451, ctc_loss=0.1675, cr_loss=0.3879, over 17217.00 frames. ], tot_loss[loss=0.2407, ctc_loss=0.1656, cr_loss=0.3756, over 3365988.92 frames. ], batch size: 50, lr: 1.01e-02, grad_scale: 32.0
2024-09-23 07:28:42,989 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.min_abs, batch_count=207741.33333333334, ans=0.5
2024-09-23 07:28:44,546 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.const_attention_rate, batch_count=207741.33333333334, ans=0.025
2024-09-23 07:28:56,654 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=207788.0, ans=0.125
2024-09-23 07:28:58,746 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.71 vs. limit=6.0
2024-09-23 07:29:12,018 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=207834.66666666666, ans=0.125
2024-09-23 07:29:15,160 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=207834.66666666666, ans=0.0
2024-09-23 07:29:16,236 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.072e+02 1.260e+02 1.353e+02 1.462e+02 2.073e+02, threshold=2.705e+02, percent-clipped=0.0
2024-09-23 07:29:42,768 INFO [train.py:1198] (3/4) Epoch 12, batch 1700, loss[loss=0.2873, ctc_loss=0.2015, cr_loss=0.4288, over 16497.00 frames. ], tot_loss[loss=0.2402, ctc_loss=0.1652, cr_loss=0.3754, over 3366859.78 frames. ], batch size: 66, lr: 1.01e-02, grad_scale: 32.0
2024-09-23 07:31:06,224 INFO [train.py:1198] (3/4) Epoch 12, batch 1750, loss[loss=0.2609, ctc_loss=0.1839, cr_loss=0.3847, over 17213.00 frames. ], tot_loss[loss=0.2398, ctc_loss=0.1648, cr_loss=0.375, over 3366873.05 frames. ], batch size: 55, lr: 1.01e-02, grad_scale: 32.0
2024-09-23 07:31:22,586 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=208208.0, ans=0.125
2024-09-23 07:31:40,018 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=208254.66666666666, ans=0.2
2024-09-23 07:32:02,198 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.123e+02 1.329e+02 1.442e+02 1.637e+02 2.396e+02, threshold=2.884e+02, percent-clipped=0.0
2024-09-23 07:32:12,088 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=208348.0, ans=0.125
2024-09-23 07:32:16,869 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=208348.0, ans=0.125
2024-09-23 07:32:26,010 INFO [train.py:1198] (3/4) Epoch 12, batch 1800, loss[loss=0.2537, ctc_loss=0.1753, cr_loss=0.3921, over 16596.00 frames. ], tot_loss[loss=0.2393, ctc_loss=0.1644, cr_loss=0.3746, over 3371249.45 frames. ], batch size: 66, lr: 1.01e-02, grad_scale: 32.0
2024-09-23 07:32:29,803 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=66.63 vs. limit=15.0
2024-09-23 07:33:07,708 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=208488.0, ans=0.125
2024-09-23 07:33:19,822 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=9.33 vs. limit=10.0
2024-09-23 07:33:27,502 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.12 vs. limit=15.0
2024-09-23 07:33:34,716 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=208581.33333333334, ans=0.2
2024-09-23 07:33:45,652 INFO [train.py:1198] (3/4) Epoch 12, batch 1850, loss[loss=0.2558, ctc_loss=0.179, cr_loss=0.3838, over 17019.00 frames. ], tot_loss[loss=0.2413, ctc_loss=0.166, cr_loss=0.3764, over 3349847.71 frames. ], batch size: 56, lr: 1.01e-02, grad_scale: 32.0
2024-09-23 07:33:46,496 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=5.66 vs. limit=15.0
2024-09-23 07:34:04,456 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.15 vs. limit=10.0
2024-09-23 07:34:20,567 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=208674.66666666666, ans=0.0
2024-09-23 07:34:25,270 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=208721.33333333334, ans=0.2
2024-09-23 07:34:39,492 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=208768.0, ans=0.0
2024-09-23 07:34:48,756 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.171e+02 1.290e+02 1.399e+02 1.526e+02 2.337e+02, threshold=2.797e+02, percent-clipped=0.0
2024-09-23 07:34:50,693 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=208768.0, ans=0.125
2024-09-23 07:34:52,350 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=208768.0, ans=0.125
2024-09-23 07:34:52,876 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.47 vs. limit=15.0
2024-09-23 07:34:55,471 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=208814.66666666666, ans=0.0
2024-09-23 07:35:00,470 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=208814.66666666666, ans=0.125
2024-09-23 07:35:09,717 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=10.46 vs. limit=15.0
2024-09-23 07:35:15,127 INFO [train.py:1198] (3/4) Epoch 12, batch 1900, loss[loss=0.2639, ctc_loss=0.1825, cr_loss=0.4068, over 17213.00 frames. ], tot_loss[loss=0.2407, ctc_loss=0.1655, cr_loss=0.3759, over 3359582.52 frames. ], batch size: 50, lr: 1.01e-02, grad_scale: 32.0
2024-09-23 07:35:18,512 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=208861.33333333334, ans=0.0
2024-09-23 07:35:20,054 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=208861.33333333334, ans=0.2
2024-09-23 07:35:25,180 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=16.45 vs. limit=22.5
2024-09-23 07:35:31,552 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=13.31 vs. limit=22.5
2024-09-23 07:35:32,721 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=208908.0, ans=0.125
2024-09-23 07:35:39,567 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.39 vs. limit=15.0
2024-09-23 07:35:53,205 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=208954.66666666666, ans=0.0
2024-09-23 07:36:28,757 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=209048.0, ans=0.0
2024-09-23 07:36:34,862 INFO [train.py:1198] (3/4) Epoch 12, batch 1950, loss[loss=0.2501, ctc_loss=0.1716, cr_loss=0.3923, over 17137.00 frames. ], tot_loss[loss=0.2408, ctc_loss=0.1656, cr_loss=0.3759, over 3360358.47 frames. ], batch size: 48, lr: 1.01e-02, grad_scale: 16.0
], tot_loss[loss=0.2408, ctc_loss=0.1656, cr_loss=0.3759, over 3360358.47 frames. ], batch size: 48, lr: 1.01e-02, grad_scale: 16.0 2024-09-23 07:36:47,857 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=209094.66666666666, ans=0.125 2024-09-23 07:36:48,575 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=5.14 vs. limit=15.0 2024-09-23 07:37:02,063 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-23 07:37:06,725 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=209188.0, ans=0.125 2024-09-23 07:37:30,447 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=209234.66666666666, ans=0.1 2024-09-23 07:37:31,805 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.126e+02 1.280e+02 1.393e+02 1.533e+02 3.563e+02, threshold=2.786e+02, percent-clipped=1.0 2024-09-23 07:37:32,202 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=209234.66666666666, ans=0.125 2024-09-23 07:37:33,702 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=209234.66666666666, ans=0.0 2024-09-23 07:37:34,353 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.69 vs. limit=22.5 2024-09-23 07:37:48,291 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=9.95 vs. limit=15.0 2024-09-23 07:37:54,117 INFO [train.py:1198] (3/4) Epoch 12, batch 2000, loss[loss=0.2433, ctc_loss=0.1655, cr_loss=0.3892, over 17204.00 frames. ], tot_loss[loss=0.2398, ctc_loss=0.1648, cr_loss=0.3749, over 3359447.62 frames. ], batch size: 55, lr: 1.01e-02, grad_scale: 32.0 2024-09-23 07:38:15,285 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=209374.66666666666, ans=0.125 2024-09-23 07:38:29,895 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=209421.33333333334, ans=0.0 2024-09-23 07:39:01,718 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=209514.66666666666, ans=0.2 2024-09-23 07:39:18,414 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=209514.66666666666, ans=0.0 2024-09-23 07:39:21,411 INFO [train.py:1198] (3/4) Epoch 12, batch 2050, loss[loss=0.2198, ctc_loss=0.1464, cr_loss=0.3672, over 17131.00 frames. ], tot_loss[loss=0.2403, ctc_loss=0.1651, cr_loss=0.3757, over 3368090.16 frames. 
2024-09-23 07:39:54,847 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=209654.66666666666, ans=0.0
2024-09-23 07:40:03,991 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=209654.66666666666, ans=0.5
2024-09-23 07:40:04,759 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=9.52 vs. limit=22.5
2024-09-23 07:40:21,041 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.107e+02 1.313e+02 1.485e+02 1.661e+02 2.450e+02, threshold=2.969e+02, percent-clipped=0.0
2024-09-23 07:40:26,264 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-23 07:40:29,355 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=209748.0, ans=0.0
2024-09-23 07:40:43,462 INFO [train.py:1198] (3/4) Epoch 12, batch 2100, loss[loss=0.2565, ctc_loss=0.1765, cr_loss=0.3998, over 17225.00 frames. ], tot_loss[loss=0.2407, ctc_loss=0.1654, cr_loss=0.3766, over 3366300.78 frames. ], batch size: 50, lr: 1.01e-02, grad_scale: 32.0
2024-09-23 07:40:50,332 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.45 vs. limit=15.0
2024-09-23 07:40:53,248 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=209794.66666666666, ans=0.2
2024-09-23 07:41:11,259 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=8.75 vs. limit=15.0
2024-09-23 07:41:33,113 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=209934.66666666666, ans=0.1
2024-09-23 07:41:37,854 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=209934.66666666666, ans=0.05
2024-09-23 07:41:55,927 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=209981.33333333334, ans=0.0
2024-09-23 07:42:03,575 INFO [train.py:1198] (3/4) Epoch 12, batch 2150, loss[loss=0.2752, ctc_loss=0.1951, cr_loss=0.4004, over 11751.00 frames. ], tot_loss[loss=0.2412, ctc_loss=0.1658, cr_loss=0.3771, over 3369384.84 frames. ], batch size: 123, lr: 1.00e-02, grad_scale: 32.0
2024-09-23 07:42:12,245 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=3.60 vs. limit=15.0
2024-09-23 07:42:33,922 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=210121.33333333334, ans=0.125
2024-09-23 07:42:49,905 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=210168.0, ans=0.125
2024-09-23 07:42:51,889 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=9.84 vs. limit=15.0
2024-09-23 07:43:00,522 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.051e+02 1.318e+02 1.400e+02 1.574e+02 2.180e+02, threshold=2.800e+02, percent-clipped=0.0
2024-09-23 07:43:22,706 INFO [train.py:1198] (3/4) Epoch 12, batch 2200, loss[loss=0.2344, ctc_loss=0.1617, cr_loss=0.3635, over 17049.00 frames. ], tot_loss[loss=0.2413, ctc_loss=0.1658, cr_loss=0.3774, over 3363951.94 frames. ], batch size: 39, lr: 1.00e-02, grad_scale: 32.0
2024-09-23 07:43:32,600 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=210261.33333333334, ans=0.125
2024-09-23 07:43:56,561 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=210308.0, ans=0.125
2024-09-23 07:44:01,419 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=210354.66666666666, ans=0.0
2024-09-23 07:44:04,679 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-23 07:44:04,729 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=210354.66666666666, ans=0.0
2024-09-23 07:44:08,320 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.01 vs. limit=22.5
2024-09-23 07:44:16,807 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer_na.min_abs, batch_count=210401.33333333334, ans=0.02
2024-09-23 07:44:32,752 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.const_attention_rate, batch_count=210448.0, ans=0.025
2024-09-23 07:44:33,183 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2.whitening_limit, batch_count=210448.0, ans=15.0
2024-09-23 07:44:34,319 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=210448.0, ans=0.1
2024-09-23 07:44:40,738 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=210448.0, ans=0.125
2024-09-23 07:44:50,075 INFO [train.py:1198] (3/4) Epoch 12, batch 2250, loss[loss=0.2715, ctc_loss=0.1898, cr_loss=0.4083, over 14879.00 frames. ], tot_loss[loss=0.2424, ctc_loss=0.1668, cr_loss=0.3783, over 3358741.68 frames. ], batch size: 89, lr: 1.00e-02, grad_scale: 32.0
2024-09-23 07:45:01,776 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=8.77 vs. limit=10.0
2024-09-23 07:45:37,180 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=210588.0, ans=0.0
2024-09-23 07:45:43,582 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=210634.66666666666, ans=0.0
2024-09-23 07:45:46,680 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=210634.66666666666, ans=0.2
2024-09-23 07:45:49,562 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.113e+02 1.306e+02 1.486e+02 1.708e+02 2.318e+02, threshold=2.971e+02, percent-clipped=0.0
2024-09-23 07:46:00,294 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.93 vs. limit=15.0
2024-09-23 07:46:04,232 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=210681.33333333334, ans=0.1
2024-09-23 07:46:07,418 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=210681.33333333334, ans=0.0
2024-09-23 07:46:11,908 INFO [train.py:1198] (3/4) Epoch 12, batch 2300, loss[loss=0.2546, ctc_loss=0.1788, cr_loss=0.3795, over 16998.00 frames. ], tot_loss[loss=0.2426, ctc_loss=0.1669, cr_loss=0.3786, over 3356169.83 frames. ], batch size: 53, lr: 1.00e-02, grad_scale: 32.0
2024-09-23 07:46:14,470 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=5.96 vs. limit=15.0
2024-09-23 07:46:23,238 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.min_positive, batch_count=210728.0, ans=0.05
2024-09-23 07:46:53,661 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=210821.33333333334, ans=0.125
2024-09-23 07:46:57,477 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.87 vs. limit=22.5
2024-09-23 07:47:17,968 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=210914.66666666666, ans=0.125
2024-09-23 07:47:27,746 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=210914.66666666666, ans=0.0
2024-09-23 07:47:32,371 INFO [train.py:1198] (3/4) Epoch 12, batch 2350, loss[loss=0.2124, ctc_loss=0.1437, cr_loss=0.3431, over 17034.00 frames. ], tot_loss[loss=0.2403, ctc_loss=0.165, cr_loss=0.3766, over 3367405.94 frames. ], batch size: 44, lr: 1.00e-02, grad_scale: 32.0
2024-09-23 07:48:12,281 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=211054.66666666666, ans=0.125
2024-09-23 07:48:15,375 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=211054.66666666666, ans=0.1
2024-09-23 07:48:17,073 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.const_attention_rate, batch_count=211054.66666666666, ans=0.025
2024-09-23 07:48:24,790 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=211101.33333333334, ans=0.125
2024-09-23 07:48:29,121 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.133e+02 1.329e+02 1.457e+02 1.570e+02 2.040e+02, threshold=2.915e+02, percent-clipped=0.0
2024-09-23 07:48:56,507 INFO [train.py:1198] (3/4) Epoch 12, batch 2400, loss[loss=0.2787, ctc_loss=0.2008, cr_loss=0.3892, over 12067.00 frames. ], tot_loss[loss=0.2408, ctc_loss=0.1655, cr_loss=0.3766, over 3354768.47 frames. ], batch size: 124, lr: 1.00e-02, grad_scale: 32.0
2024-09-23 07:49:01,535 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=211194.66666666666, ans=0.125
2024-09-23 07:49:09,348 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=12.01 vs. limit=22.5
2024-09-23 07:49:23,912 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.11 vs. limit=15.0
2024-09-23 07:49:29,103 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=5.93 vs. limit=15.0
2024-09-23 07:49:56,404 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=211334.66666666666, ans=0.125
2024-09-23 07:50:04,641 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=211381.33333333334, ans=0.2
2024-09-23 07:50:21,842 INFO [train.py:1198] (3/4) Epoch 12, batch 2450, loss[loss=0.2726, ctc_loss=0.1934, cr_loss=0.3958, over 17235.00 frames. ], tot_loss[loss=0.2396, ctc_loss=0.1645, cr_loss=0.3755, over 3357447.78 frames. ], batch size: 50, lr: 1.00e-02, grad_scale: 32.0
2024-09-23 07:50:35,435 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=13.03 vs. limit=22.5
2024-09-23 07:51:19,220 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.108e+02 1.359e+02 1.540e+02 1.815e+02 2.529e+02, threshold=3.080e+02, percent-clipped=0.0
2024-09-23 07:51:21,602 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.62 vs. limit=6.0
2024-09-23 07:51:24,325 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2024-09-23 07:51:41,365 INFO [train.py:1198] (3/4) Epoch 12, batch 2500, loss[loss=0.2104, ctc_loss=0.1391, cr_loss=0.3563, over 17244.00 frames. ], tot_loss[loss=0.2395, ctc_loss=0.1646, cr_loss=0.3745, over 3348282.66 frames. ], batch size: 50, lr: 1.00e-02, grad_scale: 32.0
2024-09-23 07:51:42,014 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=6.99 vs. limit=15.0
2024-09-23 07:52:02,228 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=211708.0, ans=0.1
2024-09-23 07:52:16,317 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=211754.66666666666, ans=0.1
2024-09-23 07:52:32,186 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=211801.33333333334, ans=0.125
2024-09-23 07:53:00,937 INFO [train.py:1198] (3/4) Epoch 12, batch 2550, loss[loss=0.2343, ctc_loss=0.1586, cr_loss=0.3784, over 17355.00 frames. ], tot_loss[loss=0.2398, ctc_loss=0.1648, cr_loss=0.3751, over 3358505.76 frames. ], batch size: 48, lr: 1.00e-02, grad_scale: 32.0
2024-09-23 07:53:13,117 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=4.39 vs. limit=12.0
2024-09-23 07:53:13,965 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=211894.66666666666, ans=0.0
2024-09-23 07:53:40,536 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=211988.0, ans=0.125
2024-09-23 07:53:53,068 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=212034.66666666666, ans=0.0
2024-09-23 07:53:55,014 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=212034.66666666666, ans=0.0
2024-09-23 07:53:58,081 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=212034.66666666666, ans=0.125
2024-09-23 07:54:00,930 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.077e+02 1.326e+02 1.484e+02 1.777e+02 2.748e+02, threshold=2.968e+02, percent-clipped=0.0
2024-09-23 07:54:06,917 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=212034.66666666666, ans=0.2
2024-09-23 07:54:25,809 INFO [train.py:1198] (3/4) Epoch 12, batch 2600, loss[loss=0.2639, ctc_loss=0.1846, cr_loss=0.3966, over 17008.00 frames. ], tot_loss[loss=0.2399, ctc_loss=0.1649, cr_loss=0.375, over 3353158.95 frames. ], batch size: 53, lr: 1.00e-02, grad_scale: 32.0
2024-09-23 07:54:32,529 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=212128.0, ans=0.2
2024-09-23 07:54:50,937 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=212174.66666666666, ans=0.1
2024-09-23 07:55:38,703 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=212314.66666666666, ans=0.125
2024-09-23 07:55:41,791 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=212314.66666666666, ans=0.2
2024-09-23 07:55:47,929 INFO [train.py:1198] (3/4) Epoch 12, batch 2650, loss[loss=0.2371, ctc_loss=0.1634, cr_loss=0.3681, over 17014.00 frames. ], tot_loss[loss=0.2392, ctc_loss=0.1642, cr_loss=0.375, over 3359903.03 frames. ], batch size: 52, lr: 9.99e-03, grad_scale: 32.0
], tot_loss[loss=0.2392, ctc_loss=0.1642, cr_loss=0.375, over 3359903.03 frames. ], batch size: 52, lr: 9.99e-03, grad_scale: 32.0 2024-09-23 07:55:56,232 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=212361.33333333334, ans=0.2 2024-09-23 07:56:15,612 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=212408.0, ans=0.125 2024-09-23 07:56:33,051 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=212454.66666666666, ans=0.125 2024-09-23 07:56:45,680 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.093e+02 1.240e+02 1.338e+02 1.470e+02 2.352e+02, threshold=2.676e+02, percent-clipped=0.0 2024-09-23 07:57:08,183 INFO [train.py:1198] (3/4) Epoch 12, batch 2700, loss[loss=0.2553, ctc_loss=0.1766, cr_loss=0.3936, over 16894.00 frames. ], tot_loss[loss=0.24, ctc_loss=0.1648, cr_loss=0.3757, over 3353780.46 frames. ], batch size: 58, lr: 9.99e-03, grad_scale: 32.0 2024-09-23 07:57:14,964 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.const_attention_rate, batch_count=212594.66666666666, ans=0.025 2024-09-23 07:57:21,300 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.min_positive, batch_count=212594.66666666666, ans=0.05 2024-09-23 07:57:26,604 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=13.02 vs. limit=22.5 2024-09-23 07:57:37,390 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=212641.33333333334, ans=0.0 2024-09-23 07:57:50,009 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=212688.0, ans=0.09899494936611666 2024-09-23 07:57:50,076 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=212688.0, ans=0.2 2024-09-23 07:58:01,063 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=212734.66666666666, ans=0.0 2024-09-23 07:58:18,905 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=212781.33333333334, ans=0.125 2024-09-23 07:58:21,120 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=5.54 vs. limit=15.0 2024-09-23 07:58:25,690 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten.whitening_limit, batch_count=212781.33333333334, ans=22.5 2024-09-23 07:58:28,276 INFO [train.py:1198] (3/4) Epoch 12, batch 2750, loss[loss=0.2611, ctc_loss=0.1773, cr_loss=0.419, over 17055.00 frames. ], tot_loss[loss=0.2397, ctc_loss=0.1646, cr_loss=0.3753, over 3351866.04 frames. ], batch size: 52, lr: 9.98e-03, grad_scale: 32.0 2024-09-23 07:58:40,399 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=212828.0, ans=0.2 2024-09-23 07:58:42,599 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=12.46 vs. 
2024-09-23 07:58:50,782 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=9.37 vs. limit=10.0
2024-09-23 07:59:04,254 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=212921.33333333334, ans=0.125
2024-09-23 07:59:13,016 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=212921.33333333334, ans=0.125
2024-09-23 07:59:25,876 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.const_attention_rate, batch_count=212968.0, ans=0.025
2024-09-23 07:59:33,713 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.136e+02 1.344e+02 1.475e+02 1.726e+02 2.611e+02, threshold=2.950e+02, percent-clipped=0.0
2024-09-23 07:59:43,086 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.const_attention_rate, batch_count=213014.66666666666, ans=0.025
2024-09-23 07:59:53,301 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=5.61 vs. limit=15.0
2024-09-23 07:59:58,920 INFO [train.py:1198] (3/4) Epoch 12, batch 2800, loss[loss=0.2563, ctc_loss=0.1774, cr_loss=0.3945, over 17015.00 frames. ], tot_loss[loss=0.2404, ctc_loss=0.1652, cr_loss=0.3761, over 3346892.06 frames. ], batch size: 51, lr: 9.98e-03, grad_scale: 32.0
2024-09-23 08:00:13,516 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.const_attention_rate, batch_count=213108.0, ans=0.025
2024-09-23 08:00:21,616 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=213108.0, ans=0.125
2024-09-23 08:00:56,913 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=11.67 vs. limit=15.0
2024-09-23 08:01:09,330 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=4.46 vs. limit=15.0
2024-09-23 08:01:18,215 INFO [train.py:1198] (3/4) Epoch 12, batch 2850, loss[loss=0.2595, ctc_loss=0.1802, cr_loss=0.3963, over 17216.00 frames. ], tot_loss[loss=0.2416, ctc_loss=0.1661, cr_loss=0.3773, over 3344347.89 frames. ], batch size: 47, lr: 9.97e-03, grad_scale: 32.0
2024-09-23 08:01:33,016 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=213341.33333333334, ans=0.125
2024-09-23 08:02:01,761 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=213388.0, ans=0.125
2024-09-23 08:02:15,759 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.151e+02 1.286e+02 1.451e+02 1.716e+02 2.634e+02, threshold=2.902e+02, percent-clipped=0.0
2024-09-23 08:02:20,981 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=213481.33333333334, ans=0.125
2024-09-23 08:02:23,130 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=5.75 vs. limit=15.0
2024-09-23 08:02:28,772 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=213481.33333333334, ans=0.0
2024-09-23 08:02:38,158 INFO [train.py:1198] (3/4) Epoch 12, batch 2900, loss[loss=0.2029, ctc_loss=0.1366, cr_loss=0.3313, over 17127.00 frames. ], tot_loss[loss=0.2408, ctc_loss=0.1656, cr_loss=0.3762, over 3344466.08 frames. ], batch size: 40, lr: 9.97e-03, grad_scale: 32.0
2024-09-23 08:02:54,381 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=213574.66666666666, ans=0.125
2024-09-23 08:03:00,800 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=213574.66666666666, ans=0.1
2024-09-23 08:03:26,823 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=213668.0, ans=0.0
2024-09-23 08:04:05,000 INFO [train.py:1198] (3/4) Epoch 12, batch 2950, loss[loss=0.2349, ctc_loss=0.1629, cr_loss=0.36, over 16803.00 frames. ], tot_loss[loss=0.242, ctc_loss=0.1665, cr_loss=0.3776, over 3345967.00 frames. ], batch size: 61, lr: 9.96e-03, grad_scale: 32.0
2024-09-23 08:04:15,801 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=4.99 vs. limit=15.0
2024-09-23 08:04:33,645 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=213808.0, ans=0.0
2024-09-23 08:05:04,866 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.112e+02 1.287e+02 1.399e+02 1.568e+02 2.905e+02, threshold=2.798e+02, percent-clipped=1.0
2024-09-23 08:05:12,854 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=213948.0, ans=0.1
2024-09-23 08:05:26,776 INFO [train.py:1198] (3/4) Epoch 12, batch 3000, loss[loss=0.253, ctc_loss=0.1748, cr_loss=0.3913, over 17029.00 frames. ], tot_loss[loss=0.2418, ctc_loss=0.1662, cr_loss=0.3782, over 3350758.91 frames. ], batch size: 44, lr: 9.96e-03, grad_scale: 32.0
2024-09-23 08:05:26,776 INFO [train.py:1221] (3/4) Computing validation loss
2024-09-23 08:05:38,246 INFO [zipformer.py:1858] (3/4) name=encoder.encoders.4.encoder.layers.2.self_attn_weights, attn_weights_entropy = tensor([3.9230, 3.2275, 3.6067, 3.6248], device='cuda:3')
2024-09-23 08:05:42,571 INFO [train.py:1230] (3/4) Epoch 12, validation: loss=0.04588, ctc_loss=0.04588, cr_loss=7.526e-15, over 944034.00 frames.
2024-09-23 08:05:42,571 INFO [train.py:1231] (3/4) Maximum memory allocated so far is 21238MB
2024-09-23 08:06:26,659 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=214088.0, ans=0.1
2024-09-23 08:06:28,202 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=214134.66666666666, ans=0.07
2024-09-23 08:06:54,672 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=214181.33333333334, ans=0.0
2024-09-23 08:07:00,681 INFO [train.py:1198] (3/4) Epoch 12, batch 3050, loss[loss=0.2961, ctc_loss=0.2075, cr_loss=0.4429, over 16530.00 frames. ], tot_loss[loss=0.2416, ctc_loss=0.166, cr_loss=0.378, over 3347699.04 frames. ], batch size: 66, lr: 9.95e-03, grad_scale: 32.0
2024-09-23 08:07:08,577 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=214228.0, ans=0.0
2024-09-23 08:07:45,500 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=214368.0, ans=0.125
2024-09-23 08:07:56,198 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.086e+02 1.305e+02 1.426e+02 1.616e+02 2.340e+02, threshold=2.852e+02, percent-clipped=0.0
2024-09-23 08:08:01,197 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=214414.66666666666, ans=0.0
2024-09-23 08:08:17,700 INFO [train.py:1198] (3/4) Epoch 12, batch 3100, loss[loss=0.2201, ctc_loss=0.1505, cr_loss=0.3477, over 17065.00 frames. ], tot_loss[loss=0.2419, ctc_loss=0.1661, cr_loss=0.379, over 3355423.34 frames. ], batch size: 46, lr: 9.94e-03, grad_scale: 32.0
2024-09-23 08:08:46,059 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=214508.0, ans=0.125
2024-09-23 08:09:10,826 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=214601.33333333334, ans=0.125
2024-09-23 08:09:17,148 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=214601.33333333334, ans=0.125
2024-09-23 08:09:23,713 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=214648.0, ans=0.125
2024-09-23 08:09:25,108 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.const_attention_rate, batch_count=214648.0, ans=0.025
2024-09-23 08:09:33,486 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=214648.0, ans=0.1
2024-09-23 08:09:36,170 INFO [train.py:1198] (3/4) Epoch 12, batch 3150, loss[loss=0.1973, ctc_loss=0.1339, cr_loss=0.3166, over 17121.00 frames. ], tot_loss[loss=0.2406, ctc_loss=0.1652, cr_loss=0.3769, over 3352957.18 frames. ], batch size: 40, lr: 9.94e-03, grad_scale: 32.0
2024-09-23 08:09:47,795 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten.whitening_limit, batch_count=214694.66666666666, ans=15.0
2024-09-23 08:10:07,572 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=214788.0, ans=0.125
2024-09-23 08:10:09,012 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=214788.0, ans=0.0
2024-09-23 08:10:09,140 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=214788.0, ans=0.0
2024-09-23 08:10:28,727 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=14.12 vs. limit=22.5
2024-09-23 08:10:29,737 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=214834.66666666666, ans=0.0
2024-09-23 08:10:32,318 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.095e+02 1.379e+02 1.486e+02 1.636e+02 2.338e+02, threshold=2.971e+02, percent-clipped=0.0
2024-09-23 08:10:41,607 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=214881.33333333334, ans=0.125
2024-09-23 08:10:45,695 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.54 vs. limit=6.0
2024-09-23 08:10:54,109 INFO [train.py:1198] (3/4) Epoch 12, batch 3200, loss[loss=0.2411, ctc_loss=0.1667, cr_loss=0.3719, over 17210.00 frames. ], tot_loss[loss=0.2412, ctc_loss=0.1657, cr_loss=0.3779, over 3361435.74 frames. ], batch size: 47, lr: 9.93e-03, grad_scale: 32.0
2024-09-23 08:10:54,290 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=214928.0, ans=0.0
2024-09-23 08:11:22,033 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=214974.66666666666, ans=0.125
2024-09-23 08:11:41,520 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=215068.0, ans=0.125
2024-09-23 08:12:04,790 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=215114.66666666666, ans=0.125
2024-09-23 08:12:16,044 INFO [train.py:1198] (3/4) Epoch 12, batch 3250, loss[loss=0.2722, ctc_loss=0.1922, cr_loss=0.4002, over 15451.00 frames. ], tot_loss[loss=0.2403, ctc_loss=0.165, cr_loss=0.3763, over 3351514.87 frames. ], batch size: 90, lr: 9.93e-03, grad_scale: 16.0
2024-09-23 08:12:23,855 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=215161.33333333334, ans=0.125
2024-09-23 08:12:47,905 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=7.06 vs. limit=15.0
2024-09-23 08:13:13,441 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.125e+02 1.295e+02 1.414e+02 1.544e+02 2.184e+02, threshold=2.828e+02, percent-clipped=0.0
2024-09-23 08:13:22,233 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=215348.0, ans=0.0
2024-09-23 08:13:35,945 INFO [train.py:1198] (3/4) Epoch 12, batch 3300, loss[loss=0.2738, ctc_loss=0.1826, cr_loss=0.4562, over 17066.00 frames. ], tot_loss[loss=0.24, ctc_loss=0.1648, cr_loss=0.3763, over 3343873.18 frames. ], batch size: 52, lr: 9.92e-03, grad_scale: 16.0
2024-09-23 08:13:50,380 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=215441.33333333334, ans=0.0
2024-09-23 08:13:51,781 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=215441.33333333334, ans=0.1
2024-09-23 08:13:53,351 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=215441.33333333334, ans=0.09899494936611666
2024-09-23 08:13:56,484 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=215441.33333333334, ans=0.1
2024-09-23 08:14:01,171 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=215441.33333333334, ans=0.2
2024-09-23 08:14:12,047 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=215488.0, ans=0.125
2024-09-23 08:14:12,070 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=215488.0, ans=0.125
2024-09-23 08:14:30,539 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=215534.66666666666, ans=0.0
2024-09-23 08:14:52,029 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.max_abs, batch_count=215581.33333333334, ans=10.0
2024-09-23 08:14:56,381 INFO [train.py:1198] (3/4) Epoch 12, batch 3350, loss[loss=0.2581, ctc_loss=0.1768, cr_loss=0.4066, over 17025.00 frames. ], tot_loss[loss=0.2412, ctc_loss=0.1657, cr_loss=0.377, over 3325761.50 frames. ], batch size: 56, lr: 9.92e-03, grad_scale: 16.0
2024-09-23 08:15:00,095 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=15.23 vs. limit=22.5
2024-09-23 08:15:04,383 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=215628.0, ans=0.2
2024-09-23 08:15:09,252 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=215628.0, ans=0.95
2024-09-23 08:15:20,110 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=215674.66666666666, ans=0.0
2024-09-23 08:15:28,384 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=4.60 vs. limit=15.0
2024-09-23 08:15:49,647 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=215768.0, ans=0.07
2024-09-23 08:15:51,317 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=215768.0, ans=0.09899494936611666
2024-09-23 08:15:54,078 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.119e+02 1.291e+02 1.444e+02 1.665e+02 2.877e+02, threshold=2.888e+02, percent-clipped=1.0
2024-09-23 08:16:06,823 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=215814.66666666666, ans=0.04949747468305833
2024-09-23 08:16:09,870 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=215814.66666666666, ans=0.2
2024-09-23 08:16:14,341 INFO [train.py:1198] (3/4) Epoch 12, batch 3400, loss[loss=0.2552, ctc_loss=0.1778, cr_loss=0.3872, over 17346.00 frames. ], tot_loss[loss=0.2407, ctc_loss=0.1654, cr_loss=0.3766, over 3331323.48 frames. ], batch size: 48, lr: 9.91e-03, grad_scale: 16.0
2024-09-23 08:16:27,283 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=215861.33333333334, ans=0.125
2024-09-23 08:16:29,371 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.whiten.whitening_limit, batch_count=215908.0, ans=12.0
2024-09-23 08:16:51,706 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=215954.66666666666, ans=0.035
2024-09-23 08:16:54,929 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=215954.66666666666, ans=0.125
2024-09-23 08:17:13,694 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=216001.33333333334, ans=0.1
2024-09-23 08:17:32,349 INFO [train.py:1198] (3/4) Epoch 12, batch 3450, loss[loss=0.208, ctc_loss=0.1393, cr_loss=0.3434, over 17031.00 frames. ], tot_loss[loss=0.2394, ctc_loss=0.1643, cr_loss=0.3755, over 3343121.89 frames. ], batch size: 44, lr: 9.91e-03, grad_scale: 16.0
2024-09-23 08:17:53,303 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=17.32 vs. limit=22.5
2024-09-23 08:18:05,532 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=216188.0, ans=0.1
2024-09-23 08:18:19,638 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=216234.66666666666, ans=0.0
2024-09-23 08:18:30,278 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.153e+02 1.280e+02 1.416e+02 1.630e+02 3.213e+02, threshold=2.832e+02, percent-clipped=1.0
2024-09-23 08:18:50,514 INFO [train.py:1198] (3/4) Epoch 12, batch 3500, loss[loss=0.2262, ctc_loss=0.1549, cr_loss=0.3565, over 17305.00 frames. ], tot_loss[loss=0.2395, ctc_loss=0.1645, cr_loss=0.3754, over 3339877.85 frames. ], batch size: 49, lr: 9.90e-03, grad_scale: 16.0
2024-09-23 08:19:14,161 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=216374.66666666666, ans=0.1
2024-09-23 08:19:37,094 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=216468.0, ans=0.125
2024-09-23 08:19:57,729 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=216514.66666666666, ans=0.1
2024-09-23 08:20:08,733 INFO [train.py:1198] (3/4) Epoch 12, batch 3550, loss[loss=0.2351, ctc_loss=0.1624, cr_loss=0.3637, over 17225.00 frames. ], tot_loss[loss=0.2401, ctc_loss=0.1649, cr_loss=0.3759, over 3339294.85 frames. ], batch size: 47, lr: 9.90e-03, grad_scale: 16.0
2024-09-23 08:20:16,642 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=216561.33333333334, ans=0.125
2024-09-23 08:20:26,450 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=14.57 vs. limit=22.5
2024-09-23 08:21:01,497 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.const_attention_rate, batch_count=216701.33333333334, ans=0.025
2024-09-23 08:21:05,845 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.122e+02 1.266e+02 1.416e+02 1.619e+02 2.341e+02, threshold=2.832e+02, percent-clipped=0.0
2024-09-23 08:21:21,952 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=216748.0, ans=0.125
2024-09-23 08:21:28,068 INFO [train.py:1198] (3/4) Epoch 12, batch 3600, loss[loss=0.2015, ctc_loss=0.1391, cr_loss=0.3117, over 16961.00 frames. ], tot_loss[loss=0.2397, ctc_loss=0.1646, cr_loss=0.3756, over 3348667.03 frames. ], batch size: 42, lr: 9.89e-03, grad_scale: 32.0
2024-09-23 08:21:34,602 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2024-09-23 08:21:36,069 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=216794.66666666666, ans=0.0
2024-09-23 08:21:45,441 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=216841.33333333334, ans=0.125
2024-09-23 08:22:20,516 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=216934.66666666666, ans=0.125
2024-09-23 08:22:30,428 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=6.46 vs. limit=15.0
2024-09-23 08:22:48,416 INFO [train.py:1198] (3/4) Epoch 12, batch 3650, loss[loss=0.1955, ctc_loss=0.1264, cr_loss=0.3459, over 17112.00 frames. ], tot_loss[loss=0.2396, ctc_loss=0.1643, cr_loss=0.3766, over 3358366.85 frames. ], batch size: 40, lr: 9.89e-03, grad_scale: 32.0
], batch size: 40, lr: 9.89e-03, grad_scale: 32.0 2024-09-23 08:22:59,541 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=217028.0, ans=0.125 2024-09-23 08:22:59,777 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=217028.0, ans=0.125 2024-09-23 08:23:50,115 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.091e+02 1.266e+02 1.360e+02 1.445e+02 2.351e+02, threshold=2.719e+02, percent-clipped=0.0 2024-09-23 08:23:59,036 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-23 08:24:05,163 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.const_attention_rate, batch_count=217214.66666666666, ans=0.025 2024-09-23 08:24:07,284 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=5.33 vs. limit=15.0 2024-09-23 08:24:09,713 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=217261.33333333334, ans=0.09899494936611666 2024-09-23 08:24:10,934 INFO [train.py:1198] (3/4) Epoch 12, batch 3700, loss[loss=0.1917, ctc_loss=0.1297, cr_loss=0.3101, over 16318.00 frames. ], tot_loss[loss=0.2389, ctc_loss=0.1637, cr_loss=0.3758, over 3364032.62 frames. ], batch size: 36, lr: 9.88e-03, grad_scale: 32.0 2024-09-23 08:24:11,189 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=217261.33333333334, ans=0.125 2024-09-23 08:24:11,336 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=217261.33333333334, ans=0.0 2024-09-23 08:24:14,513 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=217261.33333333334, ans=0.125 2024-09-23 08:24:21,573 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=9.61 vs. limit=22.5 2024-09-23 08:25:29,361 INFO [train.py:1198] (3/4) Epoch 12, batch 3750, loss[loss=0.2277, ctc_loss=0.154, cr_loss=0.3681, over 17269.00 frames. ], tot_loss[loss=0.2392, ctc_loss=0.164, cr_loss=0.3758, over 3352312.84 frames. ], batch size: 44, lr: 9.88e-03, grad_scale: 32.0 2024-09-23 08:25:52,956 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=217541.33333333334, ans=0.2 2024-09-23 08:26:13,046 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=217588.0, ans=0.125 2024-09-23 08:26:17,836 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=217634.66666666666, ans=0.0 2024-09-23 08:26:26,722 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.088e+02 1.365e+02 1.477e+02 1.661e+02 2.472e+02, threshold=2.954e+02, percent-clipped=0.0 2024-09-23 08:26:34,979 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=217681.33333333334, ans=0.125 2024-09-23 08:26:47,065 INFO [train.py:1198] (3/4) Epoch 12, batch 3800, loss[loss=0.3034, ctc_loss=0.2218, cr_loss=0.408, over 11815.00 frames. 
], tot_loss[loss=0.2407, ctc_loss=0.1654, cr_loss=0.3761, over 3316755.19 frames. ], batch size: 124, lr: 9.87e-03, grad_scale: 32.0 2024-09-23 08:27:07,936 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=217774.66666666666, ans=0.0 2024-09-23 08:28:05,902 INFO [train.py:1198] (3/4) Epoch 12, batch 3850, loss[loss=0.2073, ctc_loss=0.1394, cr_loss=0.3395, over 17193.00 frames. ], tot_loss[loss=0.2434, ctc_loss=0.168, cr_loss=0.3773, over 3266286.83 frames. ], batch size: 41, lr: 9.87e-03, grad_scale: 32.0 2024-09-23 08:28:23,140 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=218008.0, ans=0.125 2024-09-23 08:28:27,635 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=218008.0, ans=0.07 2024-09-23 08:28:32,769 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.97 vs. limit=6.0 2024-09-23 08:28:47,545 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=218054.66666666666, ans=0.125 2024-09-23 08:28:47,562 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=218054.66666666666, ans=0.2 2024-09-23 08:29:02,106 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.115e+02 1.424e+02 1.630e+02 1.758e+02 2.384e+02, threshold=3.259e+02, percent-clipped=0.0 2024-09-23 08:29:09,354 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=16.61 vs. limit=22.5 2024-09-23 08:30:07,239 INFO [train.py:1198] (3/4) Epoch 13, batch 0, loss[loss=0.2535, ctc_loss=0.1732, cr_loss=0.4018, over 17345.00 frames. ], tot_loss[loss=0.2535, ctc_loss=0.1732, cr_loss=0.4018, over 17345.00 frames. ], batch size: 48, lr: 9.48e-03, grad_scale: 32.0 2024-09-23 08:30:07,240 INFO [train.py:1221] (3/4) Computing validation loss 2024-09-23 08:30:15,909 INFO [zipformer.py:1858] (3/4) name=encoder.encoders.4.encoder.layers.2.self_attn_weights, attn_weights_entropy = tensor([3.6360, 2.9691, 3.3151, 3.3113], device='cuda:3') 2024-09-23 08:30:22,743 INFO [train.py:1230] (3/4) Epoch 13, validation: loss=0.04407, ctc_loss=0.04407, cr_loss=7.62e-15, over 944034.00 frames. 
2024-09-23 08:30:22,744 INFO [train.py:1231] (3/4) Maximum memory allocated so far is 21238MB 2024-09-23 08:30:24,604 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=218176.0, ans=0.0 2024-09-23 08:30:26,377 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=218176.0, ans=0.0 2024-09-23 08:30:39,016 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=218222.66666666666, ans=0.0 2024-09-23 08:30:56,692 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=218269.33333333334, ans=0.0 2024-09-23 08:31:06,307 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=218269.33333333334, ans=0.0 2024-09-23 08:31:25,873 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.12 vs. limit=15.0 2024-09-23 08:31:43,035 INFO [train.py:1198] (3/4) Epoch 13, batch 50, loss[loss=0.2479, ctc_loss=0.1676, cr_loss=0.4018, over 17282.00 frames. ], tot_loss[loss=0.2409, ctc_loss=0.1654, cr_loss=0.3776, over 756743.55 frames. ], batch size: 51, lr: 9.47e-03, grad_scale: 32.0 2024-09-23 08:31:53,245 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-23 08:31:54,801 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=218409.33333333334, ans=0.1 2024-09-23 08:31:56,361 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=218409.33333333334, ans=0.125 2024-09-23 08:32:00,099 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten.whitening_limit, batch_count=218409.33333333334, ans=15.0 2024-09-23 08:32:42,313 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=218549.33333333334, ans=0.1 2024-09-23 08:32:47,183 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=218549.33333333334, ans=0.125 2024-09-23 08:32:47,246 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=218549.33333333334, ans=0.125 2024-09-23 08:32:51,666 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.113e+02 1.296e+02 1.397e+02 1.549e+02 2.228e+02, threshold=2.794e+02, percent-clipped=0.0 2024-09-23 08:33:04,274 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=218596.0, ans=0.0 2024-09-23 08:33:08,628 INFO [train.py:1198] (3/4) Epoch 13, batch 100, loss[loss=0.2012, ctc_loss=0.1382, cr_loss=0.3151, over 16951.00 frames. ], tot_loss[loss=0.2383, ctc_loss=0.1635, cr_loss=0.374, over 1333587.82 frames. 
], batch size: 42, lr: 9.47e-03, grad_scale: 32.0 2024-09-23 08:33:08,979 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=218642.66666666666, ans=0.125 2024-09-23 08:33:09,028 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=218642.66666666666, ans=0.2 2024-09-23 08:33:23,226 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-23 08:33:24,699 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=218689.33333333334, ans=0.0 2024-09-23 08:33:41,136 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=218736.0, ans=0.2 2024-09-23 08:33:47,471 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=218736.0, ans=0.0 2024-09-23 08:33:47,579 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=218736.0, ans=0.125 2024-09-23 08:33:52,108 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=218736.0, ans=0.125 2024-09-23 08:33:57,478 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=4.22 vs. limit=10.0 2024-09-23 08:34:00,104 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=218782.66666666666, ans=0.07 2024-09-23 08:34:01,812 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=218782.66666666666, ans=0.125 2024-09-23 08:34:28,454 INFO [train.py:1198] (3/4) Epoch 13, batch 150, loss[loss=0.2265, ctc_loss=0.1551, cr_loss=0.3569, over 17157.00 frames. ], tot_loss[loss=0.2371, ctc_loss=0.1625, cr_loss=0.373, over 1779849.02 frames. 
], batch size: 45, lr: 9.46e-03, grad_scale: 32.0 2024-09-23 08:34:28,727 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=218876.0, ans=0.125 2024-09-23 08:34:38,315 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=218876.0, ans=0.125 2024-09-23 08:35:11,685 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=218969.33333333334, ans=0.125 2024-09-23 08:35:20,834 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=219016.0, ans=0.05 2024-09-23 08:35:28,893 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=219016.0, ans=0.5 2024-09-23 08:35:30,548 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=219016.0, ans=0.125 2024-09-23 08:35:32,157 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=219016.0, ans=0.1 2024-09-23 08:35:36,675 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.058e+02 1.277e+02 1.397e+02 1.518e+02 2.217e+02, threshold=2.795e+02, percent-clipped=0.0 2024-09-23 08:35:45,181 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=7.78 vs. limit=15.0 2024-09-23 08:35:51,083 INFO [train.py:1198] (3/4) Epoch 13, batch 200, loss[loss=0.2456, ctc_loss=0.1675, cr_loss=0.3905, over 17039.00 frames. ], tot_loss[loss=0.2375, ctc_loss=0.1626, cr_loss=0.3743, over 2131827.84 frames. ], batch size: 52, lr: 9.46e-03, grad_scale: 32.0 2024-09-23 08:36:06,196 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.86 vs. limit=10.0 2024-09-23 08:36:31,724 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=219202.66666666666, ans=0.125 2024-09-23 08:36:33,229 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=219202.66666666666, ans=0.0 2024-09-23 08:36:38,436 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=17.82 vs. limit=22.5 2024-09-23 08:37:00,394 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=219296.0, ans=0.0 2024-09-23 08:37:05,011 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=219296.0, ans=0.0 2024-09-23 08:37:08,330 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=219296.0, ans=0.0 2024-09-23 08:37:09,862 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=219296.0, ans=0.0 2024-09-23 08:37:16,179 INFO [train.py:1198] (3/4) Epoch 13, batch 250, loss[loss=0.2177, ctc_loss=0.1524, cr_loss=0.3267, over 17206.00 frames. ], tot_loss[loss=0.238, ctc_loss=0.1631, cr_loss=0.3746, over 2399369.92 frames. 
], batch size: 47, lr: 9.45e-03, grad_scale: 32.0 2024-09-23 08:37:21,145 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=219342.66666666666, ans=0.95 2024-09-23 08:37:23,210 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.92 vs. limit=6.0 2024-09-23 08:37:29,231 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=219342.66666666666, ans=0.0 2024-09-23 08:37:46,750 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=219436.0, ans=0.1 2024-09-23 08:38:24,174 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.086e+02 1.243e+02 1.365e+02 1.573e+02 3.010e+02, threshold=2.729e+02, percent-clipped=2.0 2024-09-23 08:38:24,563 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=219529.33333333334, ans=0.125 2024-09-23 08:38:27,745 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=219529.33333333334, ans=0.05 2024-09-23 08:38:38,516 INFO [train.py:1198] (3/4) Epoch 13, batch 300, loss[loss=0.2588, ctc_loss=0.1825, cr_loss=0.3814, over 17210.00 frames. ], tot_loss[loss=0.2383, ctc_loss=0.1631, cr_loss=0.3757, over 2615626.83 frames. ], batch size: 47, lr: 9.45e-03, grad_scale: 32.0 2024-09-23 08:39:17,481 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=219669.33333333334, ans=0.2 2024-09-23 08:39:23,816 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.const_attention_rate, batch_count=219669.33333333334, ans=0.025 2024-09-23 08:39:48,821 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=219762.66666666666, ans=0.0 2024-09-23 08:39:50,530 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=219762.66666666666, ans=0.07 2024-09-23 08:39:58,342 INFO [train.py:1198] (3/4) Epoch 13, batch 350, loss[loss=0.2626, ctc_loss=0.1826, cr_loss=0.4001, over 16912.00 frames. ], tot_loss[loss=0.2391, ctc_loss=0.1639, cr_loss=0.3762, over 2775188.51 frames. 
], batch size: 58, lr: 9.44e-03, grad_scale: 32.0 2024-09-23 08:40:29,650 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.const_attention_rate, batch_count=219856.0, ans=0.025 2024-09-23 08:40:31,238 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=219902.66666666666, ans=0.125 2024-09-23 08:40:31,248 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=219902.66666666666, ans=0.0 2024-09-23 08:40:48,557 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=219949.33333333334, ans=0.125 2024-09-23 08:40:48,575 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=219949.33333333334, ans=0.125 2024-09-23 08:41:06,039 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.107e+02 1.335e+02 1.492e+02 1.717e+02 2.357e+02, threshold=2.983e+02, percent-clipped=0.0 2024-09-23 08:41:16,416 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten.whitening_limit, batch_count=219996.0, ans=15.0 2024-09-23 08:41:20,272 INFO [train.py:1198] (3/4) Epoch 13, batch 400, loss[loss=0.2468, ctc_loss=0.1686, cr_loss=0.3911, over 17146.00 frames. ], tot_loss[loss=0.238, ctc_loss=0.1629, cr_loss=0.3755, over 2906339.21 frames. ], batch size: 48, lr: 9.44e-03, grad_scale: 32.0 2024-09-23 08:41:51,942 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=220089.33333333334, ans=0.125 2024-09-23 08:42:15,649 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=220182.66666666666, ans=0.125 2024-09-23 08:42:28,390 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.max_abs, batch_count=220229.33333333334, ans=10.0 2024-09-23 08:42:40,961 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=220229.33333333334, ans=0.0 2024-09-23 08:42:45,839 INFO [train.py:1198] (3/4) Epoch 13, batch 450, loss[loss=0.2397, ctc_loss=0.165, cr_loss=0.3737, over 17355.00 frames. ], tot_loss[loss=0.2379, ctc_loss=0.1628, cr_loss=0.3758, over 3009324.11 frames. ], batch size: 48, lr: 9.43e-03, grad_scale: 32.0 2024-09-23 08:42:57,148 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=220276.0, ans=0.125 2024-09-23 08:43:06,281 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=220322.66666666666, ans=0.125 2024-09-23 08:43:06,344 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=220322.66666666666, ans=0.125 2024-09-23 08:43:53,978 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.092e+02 1.263e+02 1.351e+02 1.525e+02 2.528e+02, threshold=2.701e+02, percent-clipped=0.0 2024-09-23 08:44:02,330 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=220462.66666666666, ans=0.2 2024-09-23 08:44:08,349 INFO [train.py:1198] (3/4) Epoch 13, batch 500, loss[loss=0.2172, ctc_loss=0.1461, cr_loss=0.3554, over 17048.00 frames. 
], tot_loss[loss=0.2375, ctc_loss=0.1625, cr_loss=0.375, over 3090292.09 frames. ], batch size: 39, lr: 9.43e-03, grad_scale: 32.0 2024-09-23 08:44:24,740 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=220556.0, ans=0.025 2024-09-23 08:44:37,687 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=220556.0, ans=0.05 2024-09-23 08:44:42,367 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=220602.66666666666, ans=0.2 2024-09-23 08:45:06,567 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=220649.33333333334, ans=0.125 2024-09-23 08:45:09,797 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=220649.33333333334, ans=0.125 2024-09-23 08:45:16,474 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=10.78 vs. limit=15.0 2024-09-23 08:45:19,294 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=220696.0, ans=0.125 2024-09-23 08:45:31,238 INFO [train.py:1198] (3/4) Epoch 13, batch 550, loss[loss=0.2699, ctc_loss=0.1862, cr_loss=0.4187, over 16579.00 frames. ], tot_loss[loss=0.238, ctc_loss=0.1629, cr_loss=0.3756, over 3149352.90 frames. ], batch size: 66, lr: 9.42e-03, grad_scale: 32.0 2024-09-23 08:45:35,145 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.30 vs. limit=6.0 2024-09-23 08:45:39,411 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=220742.66666666666, ans=0.125 2024-09-23 08:45:44,904 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=12.12 vs. limit=15.0 2024-09-23 08:45:49,703 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=5.00 vs. limit=10.0 2024-09-23 08:46:15,789 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=220836.0, ans=0.125 2024-09-23 08:46:33,785 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=220929.33333333334, ans=0.125 2024-09-23 08:46:36,681 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.066e+02 1.262e+02 1.359e+02 1.486e+02 2.281e+02, threshold=2.719e+02, percent-clipped=0.0 2024-09-23 08:46:45,973 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=220929.33333333334, ans=0.125 2024-09-23 08:46:56,574 INFO [train.py:1198] (3/4) Epoch 13, batch 600, loss[loss=0.2814, ctc_loss=0.1979, cr_loss=0.4174, over 17057.00 frames. ], tot_loss[loss=0.2388, ctc_loss=0.1634, cr_loss=0.3771, over 3198742.68 frames. 
], batch size: 52, lr: 9.42e-03, grad_scale: 32.0 2024-09-23 08:47:16,481 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys.whitening_limit, batch_count=221022.66666666666, ans=6.0 2024-09-23 08:47:18,934 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=221022.66666666666, ans=0.0 2024-09-23 08:48:16,047 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=221162.66666666666, ans=10.0 2024-09-23 08:48:18,962 INFO [train.py:1198] (3/4) Epoch 13, batch 650, loss[loss=0.2145, ctc_loss=0.1438, cr_loss=0.3531, over 17071.00 frames. ], tot_loss[loss=0.2391, ctc_loss=0.1636, cr_loss=0.3775, over 3232945.00 frames. ], batch size: 43, lr: 9.41e-03, grad_scale: 32.0 2024-09-23 08:48:47,993 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=221256.0, ans=0.07 2024-09-23 08:49:10,177 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=221349.33333333334, ans=0.0 2024-09-23 08:49:14,696 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=221349.33333333334, ans=0.125 2024-09-23 08:49:23,894 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.127e+02 1.284e+02 1.397e+02 1.593e+02 2.300e+02, threshold=2.794e+02, percent-clipped=0.0 2024-09-23 08:49:38,327 INFO [train.py:1198] (3/4) Epoch 13, batch 700, loss[loss=0.2614, ctc_loss=0.1829, cr_loss=0.3922, over 16888.00 frames. ], tot_loss[loss=0.2387, ctc_loss=0.1633, cr_loss=0.3767, over 3263145.54 frames. ], batch size: 58, lr: 9.41e-03, grad_scale: 32.0 2024-09-23 08:49:40,280 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=221442.66666666666, ans=0.2 2024-09-23 08:49:59,031 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=221489.33333333334, ans=0.1 2024-09-23 08:50:47,477 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.const_attention_rate, batch_count=221629.33333333334, ans=0.025 2024-09-23 08:50:52,214 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=221629.33333333334, ans=0.125 2024-09-23 08:50:57,077 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=221629.33333333334, ans=0.125 2024-09-23 08:50:59,838 INFO [train.py:1198] (3/4) Epoch 13, batch 750, loss[loss=0.2572, ctc_loss=0.1793, cr_loss=0.3897, over 16911.00 frames. ], tot_loss[loss=0.2393, ctc_loss=0.1639, cr_loss=0.3772, over 3295044.91 frames. 
], batch size: 58, lr: 9.40e-03, grad_scale: 16.0 2024-09-23 08:51:14,404 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=221722.66666666666, ans=0.125 2024-09-23 08:51:14,481 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.const_attention_rate, batch_count=221722.66666666666, ans=0.025 2024-09-23 08:51:30,436 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=221769.33333333334, ans=0.0 2024-09-23 08:51:57,201 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.11 vs. limit=15.0 2024-09-23 08:52:11,979 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.098e+02 1.258e+02 1.365e+02 1.480e+02 2.047e+02, threshold=2.729e+02, percent-clipped=0.0 2024-09-23 08:52:12,386 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=221862.66666666666, ans=0.0 2024-09-23 08:52:23,313 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=221909.33333333334, ans=0.0 2024-09-23 08:52:24,632 INFO [train.py:1198] (3/4) Epoch 13, batch 800, loss[loss=0.2393, ctc_loss=0.1637, cr_loss=0.3776, over 17294.00 frames. ], tot_loss[loss=0.2386, ctc_loss=0.1633, cr_loss=0.3763, over 3315331.21 frames. ], batch size: 49, lr: 9.40e-03, grad_scale: 32.0 2024-09-23 08:52:42,400 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=221956.0, ans=0.125 2024-09-23 08:52:47,076 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=221956.0, ans=0.125 2024-09-23 08:53:07,721 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=222002.66666666666, ans=0.0 2024-09-23 08:53:12,380 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=222002.66666666666, ans=0.09899494936611666 2024-09-23 08:53:39,053 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=222096.0, ans=0.125 2024-09-23 08:53:42,509 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=222096.0, ans=0.125 2024-09-23 08:53:45,822 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=222142.66666666666, ans=0.125 2024-09-23 08:53:47,102 INFO [train.py:1198] (3/4) Epoch 13, batch 850, loss[loss=0.3099, ctc_loss=0.2141, cr_loss=0.4788, over 14913.00 frames. ], tot_loss[loss=0.2384, ctc_loss=0.1632, cr_loss=0.3759, over 3322932.03 frames. ], batch size: 89, lr: 9.39e-03, grad_scale: 32.0 2024-09-23 08:53:47,782 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.24 vs. limit=10.0 2024-09-23 08:53:54,627 INFO [scaling.py:1024] (3/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=7.60 vs. 
limit=8.0 2024-09-23 08:54:30,565 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=222236.0, ans=0.125 2024-09-23 08:54:35,645 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=11.99 vs. limit=15.0 2024-09-23 08:54:38,502 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=222282.66666666666, ans=0.0 2024-09-23 08:54:48,094 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=222282.66666666666, ans=0.0 2024-09-23 08:54:51,293 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=222329.33333333334, ans=0.1 2024-09-23 08:54:54,223 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.119e+02 1.309e+02 1.415e+02 1.608e+02 2.192e+02, threshold=2.830e+02, percent-clipped=0.0 2024-09-23 08:55:06,987 INFO [train.py:1198] (3/4) Epoch 13, batch 900, loss[loss=0.23, ctc_loss=0.1576, cr_loss=0.3623, over 16696.00 frames. ], tot_loss[loss=0.2373, ctc_loss=0.1624, cr_loss=0.3743, over 3324408.13 frames. ], batch size: 61, lr: 9.39e-03, grad_scale: 32.0 2024-09-23 08:55:15,127 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=222376.0, ans=0.125 2024-09-23 08:55:15,139 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=222376.0, ans=0.0 2024-09-23 08:55:15,941 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=4.86 vs. limit=12.0 2024-09-23 08:55:32,131 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=222422.66666666666, ans=0.0 2024-09-23 08:55:48,253 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=222469.33333333334, ans=0.07 2024-09-23 08:56:32,015 INFO [train.py:1198] (3/4) Epoch 13, batch 950, loss[loss=0.2386, ctc_loss=0.1628, cr_loss=0.3792, over 16959.00 frames. ], tot_loss[loss=0.2387, ctc_loss=0.1635, cr_loss=0.3759, over 3334717.51 frames. 
], batch size: 42, lr: 9.38e-03, grad_scale: 32.0 2024-09-23 08:56:53,014 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-23 08:57:29,272 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=222749.33333333334, ans=0.2 2024-09-23 08:57:41,765 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.134e+02 1.336e+02 1.446e+02 1.624e+02 2.624e+02, threshold=2.892e+02, percent-clipped=0.0 2024-09-23 08:57:48,370 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=222796.0, ans=0.2 2024-09-23 08:57:48,510 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=222796.0, ans=0.125 2024-09-23 08:57:54,121 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=222796.0, ans=0.125 2024-09-23 08:57:57,068 INFO [train.py:1198] (3/4) Epoch 13, batch 1000, loss[loss=0.2058, ctc_loss=0.1397, cr_loss=0.3306, over 16226.00 frames. ], tot_loss[loss=0.2389, ctc_loss=0.1637, cr_loss=0.3759, over 3341503.01 frames. ], batch size: 36, lr: 9.38e-03, grad_scale: 32.0 2024-09-23 08:58:30,774 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.39 vs. limit=15.0 2024-09-23 08:58:35,900 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.43 vs. limit=15.0 2024-09-23 08:59:07,184 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=223029.33333333334, ans=0.125 2024-09-23 08:59:16,891 INFO [train.py:1198] (3/4) Epoch 13, batch 1050, loss[loss=0.2634, ctc_loss=0.1815, cr_loss=0.4096, over 17042.00 frames. ], tot_loss[loss=0.2378, ctc_loss=0.1628, cr_loss=0.375, over 3347255.32 frames. ], batch size: 51, lr: 9.37e-03, grad_scale: 32.0 2024-09-23 08:59:25,463 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=223076.0, ans=0.125 2024-09-23 08:59:51,036 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=223169.33333333334, ans=0.0 2024-09-23 09:00:25,629 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=223262.66666666666, ans=0.1 2024-09-23 09:00:26,978 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.122e+02 1.277e+02 1.397e+02 1.576e+02 3.836e+02, threshold=2.794e+02, percent-clipped=1.0 2024-09-23 09:00:39,437 INFO [train.py:1198] (3/4) Epoch 13, batch 1100, loss[loss=0.2253, ctc_loss=0.1532, cr_loss=0.3601, over 17337.00 frames. ], tot_loss[loss=0.2374, ctc_loss=0.1625, cr_loss=0.3744, over 3346526.49 frames. ], batch size: 52, lr: 9.37e-03, grad_scale: 32.0 2024-09-23 09:00:54,728 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=5.60 vs. 
limit=15.0 2024-09-23 09:00:55,567 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=223356.0, ans=0.07 2024-09-23 09:01:00,606 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=13.72 vs. limit=22.5 2024-09-23 09:01:20,929 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=223402.66666666666, ans=0.125 2024-09-23 09:02:01,298 INFO [train.py:1198] (3/4) Epoch 13, batch 1150, loss[loss=0.2063, ctc_loss=0.1364, cr_loss=0.3495, over 17090.00 frames. ], tot_loss[loss=0.2368, ctc_loss=0.1621, cr_loss=0.3735, over 3344045.44 frames. ], batch size: 43, lr: 9.37e-03, grad_scale: 32.0 2024-09-23 09:02:04,930 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=223542.66666666666, ans=0.0 2024-09-23 09:02:14,238 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=223542.66666666666, ans=0.125 2024-09-23 09:03:11,339 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.128e+02 1.252e+02 1.365e+02 1.487e+02 2.591e+02, threshold=2.731e+02, percent-clipped=0.0 2024-09-23 09:03:23,908 INFO [train.py:1198] (3/4) Epoch 13, batch 1200, loss[loss=0.2122, ctc_loss=0.1443, cr_loss=0.3397, over 17031.00 frames. ], tot_loss[loss=0.236, ctc_loss=0.1615, cr_loss=0.3727, over 3343016.02 frames. ], batch size: 39, lr: 9.36e-03, grad_scale: 32.0 2024-09-23 09:03:27,370 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=223776.0, ans=0.0 2024-09-23 09:03:46,291 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=223822.66666666666, ans=0.125 2024-09-23 09:03:59,122 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=223869.33333333334, ans=0.05 2024-09-23 09:04:07,154 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=223869.33333333334, ans=0.125 2024-09-23 09:04:24,027 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=4.93 vs. limit=10.0 2024-09-23 09:04:46,171 INFO [train.py:1198] (3/4) Epoch 13, batch 1250, loss[loss=0.2328, ctc_loss=0.1598, cr_loss=0.3651, over 17178.00 frames. ], tot_loss[loss=0.237, ctc_loss=0.1622, cr_loss=0.374, over 3343183.34 frames. 
], batch size: 41, lr: 9.36e-03, grad_scale: 32.0 2024-09-23 09:04:51,194 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.const_attention_rate, batch_count=224009.33333333334, ans=0.025 2024-09-23 09:05:23,991 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=224102.66666666666, ans=0.125 2024-09-23 09:05:55,485 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.099e+02 1.315e+02 1.381e+02 1.503e+02 2.464e+02, threshold=2.763e+02, percent-clipped=0.0 2024-09-23 09:06:06,893 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=224242.66666666666, ans=0.125 2024-09-23 09:06:08,066 INFO [train.py:1198] (3/4) Epoch 13, batch 1300, loss[loss=0.2009, ctc_loss=0.1337, cr_loss=0.336, over 16964.00 frames. ], tot_loss[loss=0.2385, ctc_loss=0.1632, cr_loss=0.3762, over 3341332.28 frames. ], batch size: 42, lr: 9.35e-03, grad_scale: 32.0 2024-09-23 09:07:28,058 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=224429.33333333334, ans=0.0 2024-09-23 09:07:32,576 INFO [train.py:1198] (3/4) Epoch 13, batch 1350, loss[loss=0.2777, ctc_loss=0.1931, cr_loss=0.4232, over 17040.00 frames. ], tot_loss[loss=0.2383, ctc_loss=0.1631, cr_loss=0.3763, over 3348348.46 frames. ], batch size: 56, lr: 9.35e-03, grad_scale: 32.0 2024-09-23 09:08:41,709 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.091e+02 1.260e+02 1.380e+02 1.508e+02 2.232e+02, threshold=2.759e+02, percent-clipped=0.0 2024-09-23 09:08:41,944 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=224662.66666666666, ans=0.125 2024-09-23 09:08:54,699 INFO [train.py:1198] (3/4) Epoch 13, batch 1400, loss[loss=0.2564, ctc_loss=0.1754, cr_loss=0.4052, over 17292.00 frames. ], tot_loss[loss=0.2373, ctc_loss=0.1621, cr_loss=0.3758, over 3355818.42 frames. ], batch size: 51, lr: 9.34e-03, grad_scale: 32.0 2024-09-23 09:08:58,265 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=224709.33333333334, ans=0.125 2024-09-23 09:09:15,847 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=224756.0, ans=0.125 2024-09-23 09:09:17,611 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=13.51 vs. limit=15.0 2024-09-23 09:09:36,470 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=224802.66666666666, ans=0.1 2024-09-23 09:09:39,821 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=224802.66666666666, ans=0.0 2024-09-23 09:10:09,348 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=224896.0, ans=0.125 2024-09-23 09:10:17,266 INFO [train.py:1198] (3/4) Epoch 13, batch 1450, loss[loss=0.2622, ctc_loss=0.1807, cr_loss=0.4075, over 16080.00 frames. ], tot_loss[loss=0.2374, ctc_loss=0.1623, cr_loss=0.3755, over 3341731.78 frames. 
], batch size: 74, lr: 9.34e-03, grad_scale: 32.0 2024-09-23 09:10:35,220 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=224989.33333333334, ans=0.125 2024-09-23 09:10:43,198 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=224989.33333333334, ans=0.125 2024-09-23 09:10:46,531 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=224989.33333333334, ans=0.0 2024-09-23 09:10:56,123 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=225036.0, ans=0.1 2024-09-23 09:11:12,411 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.60 vs. limit=15.0 2024-09-23 09:11:29,649 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.118e+02 1.289e+02 1.416e+02 1.573e+02 2.089e+02, threshold=2.832e+02, percent-clipped=0.0 2024-09-23 09:11:42,313 INFO [train.py:1198] (3/4) Epoch 13, batch 1500, loss[loss=0.2455, ctc_loss=0.1686, cr_loss=0.3845, over 17225.00 frames. ], tot_loss[loss=0.2367, ctc_loss=0.1618, cr_loss=0.3745, over 3354331.20 frames. ], batch size: 55, lr: 9.33e-03, grad_scale: 32.0 2024-09-23 09:11:45,807 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.const_attention_rate, batch_count=225176.0, ans=0.025 2024-09-23 09:12:09,478 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=225222.66666666666, ans=0.2 2024-09-23 09:12:20,121 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.45 vs. limit=15.0 2024-09-23 09:12:34,041 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=225316.0, ans=0.125 2024-09-23 09:12:37,648 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.05 vs. limit=15.0 2024-09-23 09:12:47,610 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=225362.66666666666, ans=0.0 2024-09-23 09:13:05,143 INFO [train.py:1198] (3/4) Epoch 13, batch 1550, loss[loss=0.2365, ctc_loss=0.1653, cr_loss=0.3555, over 15083.00 frames. ], tot_loss[loss=0.2364, ctc_loss=0.1617, cr_loss=0.3735, over 3361855.87 frames. 
], batch size: 89, lr: 9.33e-03, grad_scale: 32.0 2024-09-23 09:13:28,092 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=225456.0, ans=0.125 2024-09-23 09:13:29,529 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=225456.0, ans=0.0 2024-09-23 09:13:32,797 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=225456.0, ans=0.2 2024-09-23 09:13:37,625 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=225502.66666666666, ans=0.0 2024-09-23 09:13:44,074 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=225502.66666666666, ans=0.125 2024-09-23 09:14:02,256 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=5.97 vs. limit=15.0 2024-09-23 09:14:12,645 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.097e+02 1.248e+02 1.360e+02 1.586e+02 2.535e+02, threshold=2.720e+02, percent-clipped=0.0 2024-09-23 09:14:25,422 INFO [train.py:1198] (3/4) Epoch 13, batch 1600, loss[loss=0.1999, ctc_loss=0.1339, cr_loss=0.3304, over 17043.00 frames. ], tot_loss[loss=0.2354, ctc_loss=0.1608, cr_loss=0.3729, over 3368010.19 frames. ], batch size: 39, lr: 9.32e-03, grad_scale: 32.0 2024-09-23 09:14:51,714 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=9.13 vs. limit=15.0 2024-09-23 09:15:14,808 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=225782.66666666666, ans=0.0 2024-09-23 09:15:24,314 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=225782.66666666666, ans=0.0 2024-09-23 09:15:37,176 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=225829.33333333334, ans=0.1 2024-09-23 09:15:40,448 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=225829.33333333334, ans=0.0 2024-09-23 09:15:48,385 INFO [train.py:1198] (3/4) Epoch 13, batch 1650, loss[loss=0.2552, ctc_loss=0.1775, cr_loss=0.3883, over 16604.00 frames. ], tot_loss[loss=0.2349, ctc_loss=0.1604, cr_loss=0.3725, over 3365160.04 frames. ], batch size: 66, lr: 9.32e-03, grad_scale: 32.0 2024-09-23 09:16:01,480 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=225876.0, ans=0.125 2024-09-23 09:16:11,592 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.75 vs. limit=15.0 2024-09-23 09:16:33,816 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=225969.33333333334, ans=0.125 2024-09-23 09:16:43,613 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=6.75 vs. 
limit=15.0 2024-09-23 09:16:56,083 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=226062.66666666666, ans=0.0 2024-09-23 09:16:59,330 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=226062.66666666666, ans=0.1 2024-09-23 09:17:00,547 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.122e+02 1.262e+02 1.363e+02 1.503e+02 2.167e+02, threshold=2.727e+02, percent-clipped=0.0 2024-09-23 09:17:01,292 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=6.42 vs. limit=15.0 2024-09-23 09:17:13,266 INFO [train.py:1198] (3/4) Epoch 13, batch 1700, loss[loss=0.2048, ctc_loss=0.1359, cr_loss=0.3444, over 17268.00 frames. ], tot_loss[loss=0.2352, ctc_loss=0.1607, cr_loss=0.3726, over 3360524.52 frames. ], batch size: 42, lr: 9.31e-03, grad_scale: 32.0 2024-09-23 09:17:21,539 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=226109.33333333334, ans=0.125 2024-09-23 09:17:21,677 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-23 09:17:24,801 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=226109.33333333334, ans=0.0 2024-09-23 09:17:56,717 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.99 vs. limit=6.0 2024-09-23 09:18:07,252 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=226249.33333333334, ans=0.125 2024-09-23 09:18:11,149 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.13 vs. limit=22.5 2024-09-23 09:18:20,063 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=226296.0, ans=0.0 2024-09-23 09:18:35,888 INFO [train.py:1198] (3/4) Epoch 13, batch 1750, loss[loss=0.254, ctc_loss=0.1762, cr_loss=0.3887, over 16926.00 frames. ], tot_loss[loss=0.2354, ctc_loss=0.1609, cr_loss=0.3723, over 3360418.62 frames. ], batch size: 58, lr: 9.31e-03, grad_scale: 32.0 2024-09-23 09:18:53,953 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.67 vs. limit=6.0 2024-09-23 09:18:58,175 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=226389.33333333334, ans=0.125 2024-09-23 09:19:42,591 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.092e+02 1.255e+02 1.356e+02 1.518e+02 2.113e+02, threshold=2.712e+02, percent-clipped=0.0 2024-09-23 09:19:55,548 INFO [train.py:1198] (3/4) Epoch 13, batch 1800, loss[loss=0.1982, ctc_loss=0.131, cr_loss=0.3361, over 16708.00 frames. ], tot_loss[loss=0.2343, ctc_loss=0.1601, cr_loss=0.3714, over 3360763.44 frames. 
], batch size: 37, lr: 9.30e-03, grad_scale: 32.0 2024-09-23 09:20:02,323 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=226576.0, ans=0.125 2024-09-23 09:20:03,690 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=226576.0, ans=0.0 2024-09-23 09:20:07,419 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.97 vs. limit=6.0 2024-09-23 09:20:22,148 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=226622.66666666666, ans=0.125 2024-09-23 09:20:22,267 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.const_attention_rate, batch_count=226622.66666666666, ans=0.025 2024-09-23 09:20:50,100 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=3.94 vs. limit=12.0 2024-09-23 09:20:55,667 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.min_positive, batch_count=226716.0, ans=0.05 2024-09-23 09:20:57,143 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=226716.0, ans=0.125 2024-09-23 09:21:22,739 INFO [train.py:1198] (3/4) Epoch 13, batch 1850, loss[loss=0.2694, ctc_loss=0.1886, cr_loss=0.404, over 16016.00 frames. ], tot_loss[loss=0.2358, ctc_loss=0.1613, cr_loss=0.3722, over 3345823.23 frames. ], batch size: 74, lr: 9.30e-03, grad_scale: 32.0 2024-09-23 09:21:23,003 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=226809.33333333334, ans=0.07 2024-09-23 09:21:29,531 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=226809.33333333334, ans=0.125 2024-09-23 09:21:33,248 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=3.32 vs. limit=15.0 2024-09-23 09:21:39,108 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=226856.0, ans=0.2 2024-09-23 09:21:55,774 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=3.27 vs. limit=15.0 2024-09-23 09:22:13,008 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=3.86 vs. limit=12.0 2024-09-23 09:22:15,650 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=226949.33333333334, ans=0.125 2024-09-23 09:22:25,091 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.min_positive, batch_count=226996.0, ans=0.05 2024-09-23 09:22:29,434 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.038e+02 1.309e+02 1.425e+02 1.783e+02 3.519e+02, threshold=2.851e+02, percent-clipped=1.0 2024-09-23 09:22:30,158 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=14.57 vs. 
limit=15.0 2024-09-23 09:22:41,538 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=226996.0, ans=0.0 2024-09-23 09:22:44,630 INFO [train.py:1198] (3/4) Epoch 13, batch 1900, loss[loss=0.2455, ctc_loss=0.1713, cr_loss=0.3711, over 16696.00 frames. ], tot_loss[loss=0.2367, ctc_loss=0.162, cr_loss=0.3737, over 3346510.84 frames. ], batch size: 61, lr: 9.29e-03, grad_scale: 32.0 2024-09-23 09:23:08,061 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=8.85 vs. limit=15.0 2024-09-23 09:23:09,011 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=227089.33333333334, ans=0.125 2024-09-23 09:23:16,263 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=4.00 vs. limit=12.0 2024-09-23 09:23:22,440 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.43 vs. limit=6.0 2024-09-23 09:23:33,426 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.61 vs. limit=6.0 2024-09-23 09:24:04,049 INFO [train.py:1198] (3/4) Epoch 13, batch 1950, loss[loss=0.2415, ctc_loss=0.1649, cr_loss=0.3828, over 17028.00 frames. ], tot_loss[loss=0.2358, ctc_loss=0.1612, cr_loss=0.3728, over 3349929.36 frames. ], batch size: 56, lr: 9.29e-03, grad_scale: 32.0 2024-09-23 09:24:41,659 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.07 vs. limit=10.0 2024-09-23 09:25:09,033 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=227462.66666666666, ans=0.125 2024-09-23 09:25:13,356 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.055e+02 1.325e+02 1.427e+02 1.553e+02 2.368e+02, threshold=2.853e+02, percent-clipped=0.0 2024-09-23 09:25:25,899 INFO [train.py:1198] (3/4) Epoch 13, batch 2000, loss[loss=0.3296, ctc_loss=0.2391, cr_loss=0.4526, over 12222.00 frames. ], tot_loss[loss=0.2366, ctc_loss=0.1617, cr_loss=0.3745, over 3353642.37 frames. ], batch size: 123, lr: 9.29e-03, grad_scale: 32.0 2024-09-23 09:25:40,497 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=227556.0, ans=0.0 2024-09-23 09:25:49,338 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=13.90 vs. 
limit=15.0 2024-09-23 09:26:17,404 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=227649.33333333334, ans=0.125 2024-09-23 09:26:22,150 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=227649.33333333334, ans=0.125 2024-09-23 09:26:34,809 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=227696.0, ans=0.125 2024-09-23 09:26:41,255 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.min_positive, batch_count=227696.0, ans=0.05 2024-09-23 09:26:49,694 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=227742.66666666666, ans=0.125 2024-09-23 09:26:51,001 INFO [train.py:1198] (3/4) Epoch 13, batch 2050, loss[loss=0.2452, ctc_loss=0.1695, cr_loss=0.3787, over 17073.00 frames. ], tot_loss[loss=0.2377, ctc_loss=0.1626, cr_loss=0.3757, over 3352800.76 frames. ], batch size: 46, lr: 9.28e-03, grad_scale: 16.0 2024-09-23 09:26:55,117 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=3.36 vs. limit=15.0 2024-09-23 09:27:04,418 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=11.37 vs. limit=15.0 2024-09-23 09:27:07,177 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-23 09:28:00,730 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=227929.33333333334, ans=0.125 2024-09-23 09:28:01,913 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.087e+02 1.246e+02 1.339e+02 1.445e+02 2.585e+02, threshold=2.679e+02, percent-clipped=0.0 2024-09-23 09:28:13,097 INFO [train.py:1198] (3/4) Epoch 13, batch 2100, loss[loss=0.1947, ctc_loss=0.1302, cr_loss=0.3225, over 17261.00 frames. ], tot_loss[loss=0.2359, ctc_loss=0.1611, cr_loss=0.3741, over 3364215.27 frames. ], batch size: 42, lr: 9.28e-03, grad_scale: 16.0 2024-09-23 09:28:26,283 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-23 09:28:39,005 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=228022.66666666666, ans=0.1 2024-09-23 09:28:42,210 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=228022.66666666666, ans=0.2 2024-09-23 09:28:42,640 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=5.21 vs. 
limit=15.0 2024-09-23 09:29:01,152 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=228116.0, ans=0.04949747468305833 2024-09-23 09:29:02,876 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=228116.0, ans=0.125 2024-09-23 09:29:07,612 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=228116.0, ans=0.125 2024-09-23 09:29:13,884 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=228116.0, ans=0.2 2024-09-23 09:29:32,536 INFO [train.py:1198] (3/4) Epoch 13, batch 2150, loss[loss=0.2917, ctc_loss=0.204, cr_loss=0.4383, over 15140.00 frames. ], tot_loss[loss=0.2358, ctc_loss=0.161, cr_loss=0.3737, over 3365500.48 frames. ], batch size: 88, lr: 9.27e-03, grad_scale: 16.0 2024-09-23 09:29:45,700 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=228209.33333333334, ans=0.0 2024-09-23 09:30:00,086 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=228256.0, ans=0.125 2024-09-23 09:30:07,901 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=228302.66666666666, ans=0.1 2024-09-23 09:30:38,032 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.43 vs. limit=6.0 2024-09-23 09:30:40,881 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=228396.0, ans=0.2 2024-09-23 09:30:43,654 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.072e+02 1.254e+02 1.312e+02 1.458e+02 2.181e+02, threshold=2.624e+02, percent-clipped=0.0 2024-09-23 09:30:54,887 INFO [train.py:1198] (3/4) Epoch 13, batch 2200, loss[loss=0.2254, ctc_loss=0.1522, cr_loss=0.366, over 17164.00 frames. ], tot_loss[loss=0.2348, ctc_loss=0.1603, cr_loss=0.3728, over 3375657.10 frames. ], batch size: 45, lr: 9.27e-03, grad_scale: 16.0 2024-09-23 09:31:11,210 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=228489.33333333334, ans=0.2 2024-09-23 09:31:18,527 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=5.62 vs. limit=15.0 2024-09-23 09:31:30,546 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.min_positive, batch_count=228536.0, ans=0.05 2024-09-23 09:31:58,738 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=228582.66666666666, ans=0.2 2024-09-23 09:32:19,401 INFO [train.py:1198] (3/4) Epoch 13, batch 2250, loss[loss=0.1914, ctc_loss=0.1293, cr_loss=0.3104, over 16298.00 frames. ], tot_loss[loss=0.2354, ctc_loss=0.1607, cr_loss=0.3736, over 3378540.11 frames. ], batch size: 36, lr: 9.26e-03, grad_scale: 16.0 2024-09-23 09:32:49,690 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=15.84 vs. 
limit=22.5 2024-09-23 09:33:17,512 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=228816.0, ans=0.125 2024-09-23 09:33:24,153 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=228862.66666666666, ans=0.125 2024-09-23 09:33:30,214 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.142e+02 1.274e+02 1.381e+02 1.487e+02 1.972e+02, threshold=2.763e+02, percent-clipped=0.0 2024-09-23 09:33:35,540 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=228862.66666666666, ans=0.125 2024-09-23 09:33:36,318 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=5.43 vs. limit=15.0 2024-09-23 09:33:41,452 INFO [train.py:1198] (3/4) Epoch 13, batch 2300, loss[loss=0.2757, ctc_loss=0.196, cr_loss=0.3987, over 15079.00 frames. ], tot_loss[loss=0.236, ctc_loss=0.1611, cr_loss=0.3743, over 3371635.19 frames. ], batch size: 89, lr: 9.26e-03, grad_scale: 16.0 2024-09-23 09:33:51,224 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=228909.33333333334, ans=0.1 2024-09-23 09:33:52,779 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=228909.33333333334, ans=0.0 2024-09-23 09:33:52,805 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=228909.33333333334, ans=0.0 2024-09-23 09:34:01,257 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=228956.0, ans=0.2 2024-09-23 09:34:17,193 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=229002.66666666666, ans=0.0 2024-09-23 09:34:26,873 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-23 09:34:37,930 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=229049.33333333334, ans=0.0 2024-09-23 09:34:42,566 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=229049.33333333334, ans=0.125 2024-09-23 09:34:57,402 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-23 09:34:59,467 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=9.73 vs. limit=15.0 2024-09-23 09:34:59,628 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.08 vs. limit=10.0 2024-09-23 09:35:01,962 INFO [train.py:1198] (3/4) Epoch 13, batch 2350, loss[loss=0.2362, ctc_loss=0.162, cr_loss=0.3707, over 17175.00 frames. ], tot_loss[loss=0.2363, ctc_loss=0.1614, cr_loss=0.3745, over 3367873.25 frames. 
], batch size: 45, lr: 9.25e-03, grad_scale: 16.0 2024-09-23 09:35:05,479 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=229142.66666666666, ans=0.09899494936611666 2024-09-23 09:35:16,786 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=12.82 vs. limit=22.5 2024-09-23 09:35:18,441 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=6.22 vs. limit=15.0 2024-09-23 09:35:36,823 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=229236.0, ans=0.05 2024-09-23 09:35:55,993 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=229282.66666666666, ans=0.0 2024-09-23 09:36:15,725 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.173e+02 1.316e+02 1.425e+02 1.605e+02 2.479e+02, threshold=2.851e+02, percent-clipped=0.0 2024-09-23 09:36:17,755 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=229329.33333333334, ans=0.125 2024-09-23 09:36:27,012 INFO [train.py:1198] (3/4) Epoch 13, batch 2400, loss[loss=0.2556, ctc_loss=0.1765, cr_loss=0.3955, over 17076.00 frames. ], tot_loss[loss=0.2371, ctc_loss=0.1621, cr_loss=0.375, over 3358371.72 frames. ], batch size: 46, lr: 9.25e-03, grad_scale: 32.0 2024-09-23 09:36:29,428 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=4.05 vs. limit=10.0 2024-09-23 09:36:57,782 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=229469.33333333334, ans=0.125 2024-09-23 09:37:02,504 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.const_attention_rate, batch_count=229469.33333333334, ans=0.025 2024-09-23 09:37:10,909 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=4.95 vs. limit=12.0 2024-09-23 09:37:13,568 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=229516.0, ans=0.125 2024-09-23 09:37:18,406 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=229516.0, ans=0.1 2024-09-23 09:37:31,840 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=229562.66666666666, ans=0.1 2024-09-23 09:37:31,844 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=229562.66666666666, ans=0.1 2024-09-23 09:37:37,082 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=13.87 vs. limit=22.5 2024-09-23 09:37:49,400 INFO [train.py:1198] (3/4) Epoch 13, batch 2450, loss[loss=0.2738, ctc_loss=0.1919, cr_loss=0.4095, over 17030.00 frames. ], tot_loss[loss=0.2373, ctc_loss=0.1623, cr_loss=0.3747, over 3349422.24 frames. 
], batch size: 53, lr: 9.24e-03, grad_scale: 32.0 2024-09-23 09:38:12,183 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=229656.0, ans=0.125 2024-09-23 09:38:31,409 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=229702.66666666666, ans=0.04949747468305833 2024-09-23 09:38:58,205 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.060e+02 1.292e+02 1.402e+02 1.572e+02 2.224e+02, threshold=2.803e+02, percent-clipped=0.0 2024-09-23 09:39:09,496 INFO [train.py:1198] (3/4) Epoch 13, batch 2500, loss[loss=0.2523, ctc_loss=0.1719, cr_loss=0.4018, over 16979.00 frames. ], tot_loss[loss=0.2375, ctc_loss=0.1624, cr_loss=0.3754, over 3354482.29 frames. ], batch size: 56, lr: 9.24e-03, grad_scale: 32.0 2024-09-23 09:39:39,029 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=229889.33333333334, ans=0.125 2024-09-23 09:39:52,599 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.27 vs. limit=22.5 2024-09-23 09:40:11,719 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=229982.66666666666, ans=0.0 2024-09-23 09:40:29,282 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=230029.33333333334, ans=0.125 2024-09-23 09:40:32,034 INFO [train.py:1198] (3/4) Epoch 13, batch 2550, loss[loss=0.2861, ctc_loss=0.2, cr_loss=0.4306, over 17046.00 frames. ], tot_loss[loss=0.2375, ctc_loss=0.1624, cr_loss=0.3755, over 3352299.63 frames. ], batch size: 53, lr: 9.23e-03, grad_scale: 32.0 2024-09-23 09:40:53,106 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=230122.66666666666, ans=0.125 2024-09-23 09:41:16,143 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.min_positive, batch_count=230169.33333333334, ans=0.05 2024-09-23 09:41:33,370 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=230216.0, ans=0.5 2024-09-23 09:41:38,215 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=230216.0, ans=0.125 2024-09-23 09:41:38,218 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=230216.0, ans=0.125 2024-09-23 09:41:39,790 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=230262.66666666666, ans=0.125 2024-09-23 09:41:45,894 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.110e+02 1.282e+02 1.430e+02 1.589e+02 2.134e+02, threshold=2.861e+02, percent-clipped=0.0 2024-09-23 09:41:47,851 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=230262.66666666666, ans=0.125 2024-09-23 09:41:57,177 INFO [train.py:1198] (3/4) Epoch 13, batch 2600, loss[loss=0.2375, ctc_loss=0.1594, cr_loss=0.3905, over 17229.00 frames. ], tot_loss[loss=0.2358, ctc_loss=0.1611, cr_loss=0.3732, over 3351471.17 frames. 
], batch size: 47, lr: 9.23e-03, grad_scale: 32.0 2024-09-23 09:42:03,911 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=230309.33333333334, ans=0.1 2024-09-23 09:42:35,087 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=230402.66666666666, ans=0.125 2024-09-23 09:42:54,571 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=230449.33333333334, ans=0.1 2024-09-23 09:43:08,830 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=230496.0, ans=0.0 2024-09-23 09:43:15,372 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=230496.0, ans=0.125 2024-09-23 09:43:20,134 INFO [train.py:1198] (3/4) Epoch 13, batch 2650, loss[loss=0.2022, ctc_loss=0.1351, cr_loss=0.3357, over 17231.00 frames. ], tot_loss[loss=0.2342, ctc_loss=0.16, cr_loss=0.3708, over 3355012.30 frames. ], batch size: 50, lr: 9.23e-03, grad_scale: 32.0 2024-09-23 09:43:52,594 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=230636.0, ans=0.2 2024-09-23 09:43:55,054 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=3.26 vs. limit=15.0 2024-09-23 09:44:16,171 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=230682.66666666666, ans=0.125 2024-09-23 09:44:28,570 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.105e+02 1.270e+02 1.366e+02 1.498e+02 2.280e+02, threshold=2.732e+02, percent-clipped=0.0 2024-09-23 09:44:28,872 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=230729.33333333334, ans=0.125 2024-09-23 09:44:39,787 INFO [train.py:1198] (3/4) Epoch 13, batch 2700, loss[loss=0.2243, ctc_loss=0.1537, cr_loss=0.3529, over 17098.00 frames. ], tot_loss[loss=0.2341, ctc_loss=0.1599, cr_loss=0.3711, over 3356729.17 frames. ], batch size: 43, lr: 9.22e-03, grad_scale: 32.0 2024-09-23 09:44:43,272 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=230776.0, ans=0.125 2024-09-23 09:44:46,942 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=5.56 vs. limit=12.0 2024-09-23 09:44:59,364 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=230822.66666666666, ans=0.0 2024-09-23 09:44:59,395 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer_ff2.min_abs, batch_count=230822.66666666666, ans=0.1 2024-09-23 09:45:57,998 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=230962.66666666666, ans=0.0 2024-09-23 09:46:02,468 INFO [train.py:1198] (3/4) Epoch 13, batch 2750, loss[loss=0.2545, ctc_loss=0.1766, cr_loss=0.3897, over 17019.00 frames. ], tot_loss[loss=0.2339, ctc_loss=0.1598, cr_loss=0.3706, over 3363317.66 frames. 
], batch size: 53, lr: 9.22e-03, grad_scale: 32.0 2024-09-23 09:46:11,264 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=231009.33333333334, ans=0.125 2024-09-23 09:46:24,170 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=231056.0, ans=0.2 2024-09-23 09:46:29,475 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=15.32 vs. limit=22.5 2024-09-23 09:47:19,274 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.118e+02 1.267e+02 1.386e+02 1.565e+02 1.913e+02, threshold=2.771e+02, percent-clipped=0.0 2024-09-23 09:47:21,246 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=231196.0, ans=0.05 2024-09-23 09:47:30,587 INFO [train.py:1198] (3/4) Epoch 13, batch 2800, loss[loss=0.2161, ctc_loss=0.1453, cr_loss=0.3539, over 17113.00 frames. ], tot_loss[loss=0.2333, ctc_loss=0.1593, cr_loss=0.3701, over 3365609.56 frames. ], batch size: 40, lr: 9.21e-03, grad_scale: 32.0 2024-09-23 09:47:36,367 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=5.54 vs. limit=15.0 2024-09-23 09:47:45,111 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=231289.33333333334, ans=0.1 2024-09-23 09:48:09,337 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=6.00 vs. limit=15.0 2024-09-23 09:48:16,755 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=231382.66666666666, ans=0.0 2024-09-23 09:48:22,044 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=7.38 vs. limit=15.0 2024-09-23 09:48:35,851 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=231429.33333333334, ans=0.0 2024-09-23 09:48:50,269 INFO [train.py:1198] (3/4) Epoch 13, batch 2850, loss[loss=0.2687, ctc_loss=0.1858, cr_loss=0.4141, over 16597.00 frames. ], tot_loss[loss=0.2344, ctc_loss=0.1601, cr_loss=0.3716, over 3360787.47 frames. ], batch size: 66, lr: 9.21e-03, grad_scale: 16.0 2024-09-23 09:49:40,028 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=231616.0, ans=0.0 2024-09-23 09:49:49,482 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=231616.0, ans=0.0 2024-09-23 09:49:52,787 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=231662.66666666666, ans=0.0 2024-09-23 09:50:00,334 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.123e+02 1.288e+02 1.395e+02 1.565e+02 5.212e+02, threshold=2.790e+02, percent-clipped=1.0 2024-09-23 09:50:12,524 INFO [train.py:1198] (3/4) Epoch 13, batch 2900, loss[loss=0.2365, ctc_loss=0.1614, cr_loss=0.3754, over 17034.00 frames. ], tot_loss[loss=0.2362, ctc_loss=0.1615, cr_loss=0.3733, over 3362629.09 frames. 
], batch size: 52, lr: 9.20e-03, grad_scale: 16.0 2024-09-23 09:50:24,017 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=231709.33333333334, ans=0.125 2024-09-23 09:50:32,080 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=231756.0, ans=0.125 2024-09-23 09:50:41,777 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=231756.0, ans=0.1 2024-09-23 09:50:45,529 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=6.09 vs. limit=12.0 2024-09-23 09:51:24,648 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.47 vs. limit=15.0 2024-09-23 09:51:35,582 INFO [train.py:1198] (3/4) Epoch 13, batch 2950, loss[loss=0.2995, ctc_loss=0.2153, cr_loss=0.4207, over 15068.00 frames. ], tot_loss[loss=0.2345, ctc_loss=0.1601, cr_loss=0.372, over 3368343.74 frames. ], batch size: 89, lr: 9.20e-03, grad_scale: 16.0 2024-09-23 09:51:40,704 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.const_attention_rate, batch_count=231942.66666666666, ans=0.025 2024-09-23 09:51:44,795 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=8.18 vs. limit=22.5 2024-09-23 09:52:47,838 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.077e+02 1.241e+02 1.342e+02 1.458e+02 2.416e+02, threshold=2.684e+02, percent-clipped=0.0 2024-09-23 09:52:57,273 INFO [train.py:1198] (3/4) Epoch 13, batch 3000, loss[loss=0.1882, ctc_loss=0.1261, cr_loss=0.3105, over 16315.00 frames. ], tot_loss[loss=0.2351, ctc_loss=0.1606, cr_loss=0.3728, over 3365462.81 frames. ], batch size: 36, lr: 9.19e-03, grad_scale: 16.0 2024-09-23 09:52:57,274 INFO [train.py:1221] (3/4) Computing validation loss 2024-09-23 09:53:12,981 INFO [train.py:1230] (3/4) Epoch 13, validation: loss=0.04424, ctc_loss=0.04424, cr_loss=7.269e-15, over 944034.00 frames. 2024-09-23 09:53:12,982 INFO [train.py:1231] (3/4) Maximum memory allocated so far is 21238MB 2024-09-23 09:54:03,390 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=232316.0, ans=0.125 2024-09-23 09:54:22,438 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=6.48 vs. limit=15.0 2024-09-23 09:54:31,495 INFO [train.py:1198] (3/4) Epoch 13, batch 3050, loss[loss=0.2565, ctc_loss=0.1733, cr_loss=0.4155, over 17101.00 frames. ], tot_loss[loss=0.2351, ctc_loss=0.1605, cr_loss=0.3727, over 3361120.79 frames. 
], batch size: 49, lr: 9.19e-03, grad_scale: 16.0 2024-09-23 09:54:33,364 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=232409.33333333334, ans=0.1 2024-09-23 09:55:39,979 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.113e+02 1.372e+02 1.503e+02 1.600e+02 2.658e+02, threshold=3.005e+02, percent-clipped=0.0 2024-09-23 09:55:44,949 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=232596.0, ans=0.0 2024-09-23 09:55:49,567 INFO [train.py:1198] (3/4) Epoch 13, batch 3100, loss[loss=0.1947, ctc_loss=0.1282, cr_loss=0.3325, over 17041.00 frames. ], tot_loss[loss=0.2353, ctc_loss=0.1608, cr_loss=0.3727, over 3355022.31 frames. ], batch size: 39, lr: 9.18e-03, grad_scale: 16.0 2024-09-23 09:55:49,796 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=232642.66666666666, ans=0.1 2024-09-23 09:55:57,780 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=232642.66666666666, ans=0.0 2024-09-23 09:56:12,550 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.44 vs. limit=6.0 2024-09-23 09:56:32,461 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=232736.0, ans=0.2 2024-09-23 09:57:01,701 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=232829.33333333334, ans=0.1 2024-09-23 09:57:09,806 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.03 vs. limit=6.0 2024-09-23 09:57:10,793 INFO [train.py:1198] (3/4) Epoch 13, batch 3150, loss[loss=0.2598, ctc_loss=0.1895, cr_loss=0.3515, over 11926.00 frames. ], tot_loss[loss=0.2356, ctc_loss=0.161, cr_loss=0.373, over 3348764.60 frames. 
], batch size: 123, lr: 9.18e-03, grad_scale: 16.0 2024-09-23 09:57:25,171 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=232922.66666666666, ans=0.0 2024-09-23 09:57:33,098 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=232922.66666666666, ans=0.1 2024-09-23 09:57:37,778 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=232922.66666666666, ans=0.0 2024-09-23 09:57:40,854 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=232969.33333333334, ans=0.125 2024-09-23 09:57:59,657 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=233016.0, ans=0.125 2024-09-23 09:58:07,629 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=233016.0, ans=0.125 2024-09-23 09:58:19,654 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.158e+02 1.308e+02 1.468e+02 1.678e+02 2.567e+02, threshold=2.937e+02, percent-clipped=0.0 2024-09-23 09:58:28,992 INFO [train.py:1198] (3/4) Epoch 13, batch 3200, loss[loss=0.2532, ctc_loss=0.174, cr_loss=0.396, over 17196.00 frames. ], tot_loss[loss=0.235, ctc_loss=0.1605, cr_loss=0.3725, over 3349295.82 frames. ], batch size: 55, lr: 9.18e-03, grad_scale: 32.0 2024-09-23 09:58:43,612 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=233156.0, ans=0.125 2024-09-23 09:58:46,719 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=233156.0, ans=0.0 2024-09-23 09:59:05,816 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=233202.66666666666, ans=0.0 2024-09-23 09:59:42,888 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=233296.0, ans=0.04949747468305833 2024-09-23 09:59:47,393 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=233296.0, ans=0.2 2024-09-23 09:59:47,484 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.min_positive, batch_count=233296.0, ans=0.05 2024-09-23 09:59:47,529 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=233296.0, ans=0.125 2024-09-23 09:59:52,148 INFO [train.py:1198] (3/4) Epoch 13, batch 3250, loss[loss=0.2123, ctc_loss=0.1465, cr_loss=0.3289, over 17003.00 frames. ], tot_loss[loss=0.2353, ctc_loss=0.1608, cr_loss=0.3729, over 3351903.60 frames. 
], batch size: 44, lr: 9.17e-03, grad_scale: 32.0 2024-09-23 09:59:52,487 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=233342.66666666666, ans=0.1 2024-09-23 09:59:53,975 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=233342.66666666666, ans=0.125 2024-09-23 09:59:54,028 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=233342.66666666666, ans=0.0 2024-09-23 10:00:08,208 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=233389.33333333334, ans=0.2 2024-09-23 10:00:08,361 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=233389.33333333334, ans=0.125 2024-09-23 10:00:20,789 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=233389.33333333334, ans=0.125 2024-09-23 10:00:21,475 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=9.36 vs. limit=15.0 2024-09-23 10:00:33,640 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.33 vs. limit=15.0 2024-09-23 10:01:01,303 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.100e+02 1.287e+02 1.416e+02 1.581e+02 2.137e+02, threshold=2.832e+02, percent-clipped=0.0 2024-09-23 10:01:01,538 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=233529.33333333334, ans=0.125 2024-09-23 10:01:10,548 INFO [train.py:1198] (3/4) Epoch 13, batch 3300, loss[loss=0.2841, ctc_loss=0.2039, cr_loss=0.401, over 14950.00 frames. ], tot_loss[loss=0.236, ctc_loss=0.1613, cr_loss=0.3734, over 3351871.02 frames. ], batch size: 89, lr: 9.17e-03, grad_scale: 32.0 2024-09-23 10:01:24,141 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=12.33 vs. limit=22.5 2024-09-23 10:01:28,000 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=233622.66666666666, ans=0.0 2024-09-23 10:01:31,035 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=233622.66666666666, ans=0.0 2024-09-23 10:02:25,943 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=233762.66666666666, ans=0.0 2024-09-23 10:02:30,431 INFO [train.py:1198] (3/4) Epoch 13, batch 3350, loss[loss=0.1941, ctc_loss=0.1323, cr_loss=0.3087, over 16756.00 frames. ], tot_loss[loss=0.2379, ctc_loss=0.1628, cr_loss=0.3759, over 3348996.42 frames. 
], batch size: 37, lr: 9.16e-03, grad_scale: 32.0 2024-09-23 10:02:30,613 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=233809.33333333334, ans=0.1 2024-09-23 10:02:32,202 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=233809.33333333334, ans=0.125 2024-09-23 10:02:43,241 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=233809.33333333334, ans=0.0 2024-09-23 10:02:50,776 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=233856.0, ans=0.1 2024-09-23 10:02:57,085 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=233856.0, ans=0.2 2024-09-23 10:02:57,117 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=233856.0, ans=0.2 2024-09-23 10:03:28,406 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=233949.33333333334, ans=0.125 2024-09-23 10:03:34,545 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=233996.0, ans=0.0 2024-09-23 10:03:38,964 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.032e+02 1.272e+02 1.377e+02 1.520e+02 2.187e+02, threshold=2.755e+02, percent-clipped=0.0 2024-09-23 10:03:42,343 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=233996.0, ans=0.125 2024-09-23 10:03:43,848 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=233996.0, ans=0.125 2024-09-23 10:03:48,298 INFO [train.py:1198] (3/4) Epoch 13, batch 3400, loss[loss=0.2175, ctc_loss=0.146, cr_loss=0.3577, over 16954.00 frames. ], tot_loss[loss=0.2377, ctc_loss=0.1626, cr_loss=0.3752, over 3343290.02 frames. ], batch size: 42, lr: 9.16e-03, grad_scale: 32.0 2024-09-23 10:04:25,660 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=234136.0, ans=0.125 2024-09-23 10:04:43,316 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=234182.66666666666, ans=0.125 2024-09-23 10:05:06,444 INFO [train.py:1198] (3/4) Epoch 13, batch 3450, loss[loss=0.2349, ctc_loss=0.1597, cr_loss=0.3762, over 16935.00 frames. ], tot_loss[loss=0.2373, ctc_loss=0.1622, cr_loss=0.3757, over 3353880.56 frames. ], batch size: 58, lr: 9.15e-03, grad_scale: 32.0 2024-09-23 10:05:56,183 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=5.16 vs. limit=15.0 2024-09-23 10:06:14,446 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=234462.66666666666, ans=0.0 2024-09-23 10:06:15,807 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.118e+02 1.304e+02 1.412e+02 1.643e+02 2.368e+02, threshold=2.824e+02, percent-clipped=0.0 2024-09-23 10:06:25,215 INFO [train.py:1198] (3/4) Epoch 13, batch 3500, loss[loss=0.2747, ctc_loss=0.192, cr_loss=0.4137, over 16787.00 frames. ], tot_loss[loss=0.237, ctc_loss=0.162, cr_loss=0.3751, over 3352210.76 frames. 
], batch size: 61, lr: 9.15e-03, grad_scale: 32.0 2024-09-23 10:06:31,842 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=234509.33333333334, ans=0.0 2024-09-23 10:06:49,583 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=16.41 vs. limit=15.0 2024-09-23 10:07:25,886 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=234649.33333333334, ans=0.125 2024-09-23 10:07:45,896 INFO [train.py:1198] (3/4) Epoch 13, batch 3550, loss[loss=0.2469, ctc_loss=0.1659, cr_loss=0.4051, over 17217.00 frames. ], tot_loss[loss=0.2378, ctc_loss=0.1627, cr_loss=0.3759, over 3349933.08 frames. ], batch size: 55, lr: 9.14e-03, grad_scale: 32.0 2024-09-23 10:07:59,182 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.01 vs. limit=15.0 2024-09-23 10:08:03,325 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=234789.33333333334, ans=0.125 2024-09-23 10:08:04,843 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=234789.33333333334, ans=0.0 2024-09-23 10:09:00,083 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.145e+02 1.323e+02 1.492e+02 1.691e+02 2.949e+02, threshold=2.984e+02, percent-clipped=2.0 2024-09-23 10:09:00,357 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-23 10:09:04,778 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=234929.33333333334, ans=0.1 2024-09-23 10:09:07,584 INFO [train.py:1198] (3/4) Epoch 13, batch 3600, loss[loss=0.2627, ctc_loss=0.1838, cr_loss=0.3947, over 16467.00 frames. ], tot_loss[loss=0.2371, ctc_loss=0.1622, cr_loss=0.3745, over 3345979.27 frames. ], batch size: 66, lr: 9.14e-03, grad_scale: 32.0 2024-09-23 10:09:10,994 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=234976.0, ans=0.1 2024-09-23 10:09:21,706 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=235022.66666666666, ans=0.1 2024-09-23 10:09:28,305 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.55 vs. limit=15.0 2024-09-23 10:09:34,004 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=235022.66666666666, ans=0.0 2024-09-23 10:09:37,227 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=235069.33333333334, ans=0.1 2024-09-23 10:10:13,397 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=235162.66666666666, ans=0.2 2024-09-23 10:10:25,895 INFO [train.py:1198] (3/4) Epoch 13, batch 3650, loss[loss=0.2076, ctc_loss=0.1425, cr_loss=0.3255, over 17206.00 frames. ], tot_loss[loss=0.2363, ctc_loss=0.1616, cr_loss=0.3739, over 3354696.80 frames. 
], batch size: 47, lr: 9.14e-03, grad_scale: 32.0 2024-09-23 10:10:32,521 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=235209.33333333334, ans=0.125 2024-09-23 10:11:37,917 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.152e+02 1.270e+02 1.340e+02 1.472e+02 2.759e+02, threshold=2.681e+02, percent-clipped=0.0 2024-09-23 10:11:42,971 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=235396.0, ans=0.2 2024-09-23 10:11:45,790 INFO [train.py:1198] (3/4) Epoch 13, batch 3700, loss[loss=0.2349, ctc_loss=0.1597, cr_loss=0.376, over 17051.00 frames. ], tot_loss[loss=0.2361, ctc_loss=0.1613, cr_loss=0.3742, over 3366940.58 frames. ], batch size: 52, lr: 9.13e-03, grad_scale: 32.0 2024-09-23 10:12:08,945 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=235489.33333333334, ans=0.125 2024-09-23 10:12:09,419 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=3.12 vs. limit=15.0 2024-09-23 10:12:15,103 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=235489.33333333334, ans=0.05 2024-09-23 10:12:22,711 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=235536.0, ans=0.025 2024-09-23 10:13:03,922 INFO [train.py:1198] (3/4) Epoch 13, batch 3750, loss[loss=0.2281, ctc_loss=0.1551, cr_loss=0.365, over 16960.00 frames. ], tot_loss[loss=0.2365, ctc_loss=0.1617, cr_loss=0.3743, over 3354864.57 frames. ], batch size: 42, lr: 9.13e-03, grad_scale: 32.0 2024-09-23 10:13:05,754 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=235676.0, ans=0.2 2024-09-23 10:13:15,391 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=235676.0, ans=0.0 2024-09-23 10:13:43,070 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=12.02 vs. limit=15.0 2024-09-23 10:13:48,237 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=235769.33333333334, ans=0.125 2024-09-23 10:13:51,386 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=235816.0, ans=0.125 2024-09-23 10:14:09,834 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=235862.66666666666, ans=0.1 2024-09-23 10:14:14,235 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.047e+02 1.350e+02 1.440e+02 1.635e+02 2.891e+02, threshold=2.880e+02, percent-clipped=1.0 2024-09-23 10:14:22,031 INFO [train.py:1198] (3/4) Epoch 13, batch 3800, loss[loss=0.2362, ctc_loss=0.1592, cr_loss=0.3849, over 17011.00 frames. ], tot_loss[loss=0.2385, ctc_loss=0.1633, cr_loss=0.376, over 3331514.47 frames. 
], batch size: 44, lr: 9.12e-03, grad_scale: 32.0 2024-09-23 10:14:31,733 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=235909.33333333334, ans=0.0 2024-09-23 10:14:36,416 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=235956.0, ans=0.0 2024-09-23 10:14:53,356 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.const_attention_rate, batch_count=236002.66666666666, ans=0.025 2024-09-23 10:15:03,376 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.79 vs. limit=10.0 2024-09-23 10:15:07,383 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=236049.33333333334, ans=0.125 2024-09-23 10:15:10,380 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=236049.33333333334, ans=0.1 2024-09-23 10:15:12,585 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=7.45 vs. limit=15.0 2024-09-23 10:15:22,969 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=236096.0, ans=0.125 2024-09-23 10:15:33,717 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=236096.0, ans=0.0 2024-09-23 10:15:40,300 INFO [train.py:1198] (3/4) Epoch 13, batch 3850, loss[loss=0.3088, ctc_loss=0.2308, cr_loss=0.3899, over 11817.00 frames. ], tot_loss[loss=0.2408, ctc_loss=0.1656, cr_loss=0.3761, over 3258138.86 frames. ], batch size: 123, lr: 9.12e-03, grad_scale: 16.0 2024-09-23 10:16:12,413 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=236236.0, ans=0.0 2024-09-23 10:16:20,691 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=11.69 vs. limit=22.5 2024-09-23 10:16:33,932 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=236282.66666666666, ans=0.125 2024-09-23 10:16:39,951 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=236329.33333333334, ans=0.5 2024-09-23 10:16:45,911 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=236329.33333333334, ans=0.125 2024-09-23 10:17:39,608 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.132e+02 1.439e+02 1.592e+02 1.784e+02 2.502e+02, threshold=3.185e+02, percent-clipped=0.0 2024-09-23 10:17:39,633 INFO [train.py:1198] (3/4) Epoch 14, batch 0, loss[loss=0.2238, ctc_loss=0.1521, cr_loss=0.3586, over 16398.00 frames. ], tot_loss[loss=0.2238, ctc_loss=0.1521, cr_loss=0.3586, over 16398.00 frames. ], batch size: 36, lr: 8.78e-03, grad_scale: 32.0 2024-09-23 10:17:39,633 INFO [train.py:1221] (3/4) Computing validation loss 2024-09-23 10:17:55,043 INFO [train.py:1230] (3/4) Epoch 14, validation: loss=0.04435, ctc_loss=0.04435, cr_loss=7.317e-15, over 944034.00 frames. 
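Note on the logged loss columns (a reading of the numbers above, not a quote from the training code): the per-batch tot_loss is consistent with a CR-CTC objective that adds the consistency-regularization (cr) loss to the CTC loss with a fixed weight of 0.2, i.e. loss = ctc_loss + 0.2 * cr_loss; e.g. for epoch 13, batch 3000, 0.1606 + 0.2 * 0.3728 = 0.2352 ≈ 0.2351 as logged. The validation entries show cr_loss ≈ 1e-14, so validation loss and ctc_loss coincide there (loss=0.04435, ctc_loss=0.04435 just above), presumably because the consistency term vanishes when no masking is applied at validation time. The grad_scale field that toggles between 32.0 and 16.0 is most naturally read as the AMP loss scale, which is halved after an overflow and grown back after a run of stable steps. A minimal Python sketch of the loss combination, where combined_loss and the cr_loss_scale default of 0.2 are hypothetical names inferred from the logged values:

    # Hypothetical sketch: how tot_loss appears to relate to ctc_loss and cr_loss.
    # The 0.2 weight is inferred from triples in this log, e.g.
    #   tot_loss[loss=0.2351, ctc_loss=0.1606, cr_loss=0.3728] -> 0.1606 + 0.2*0.3728 = 0.2352
    def combined_loss(ctc_loss: float, cr_loss: float, cr_loss_scale: float = 0.2) -> float:
        return ctc_loss + cr_loss_scale * cr_loss

    # Spot-check against (ctc_loss, cr_loss, loss) triples logged in this section:
    for ctc, cr, logged in [(0.1613, 0.3722, 0.2358),   # epoch 13, batch 1850
                            (0.1606, 0.3728, 0.2351),   # epoch 13, batch 3000
                            (0.1622, 0.3757, 0.2373)]:  # epoch 13, batch 3450
        assert abs(combined_loss(ctc, cr) - logged) < 5e-4

Similarly, the WARNING lines from optim.py report grad-norm quartiles (min/25%/median/75%/max) together with a threshold that in every report equals Clipping_scale times the median, up to rounding (e.g. 2.0 * 1.312e+02 = 2.624e+02), and percent-clipped, apparently the share of recent steps whose grad norm exceeded that threshold. A sketch of that reading, where clipping_report is a hypothetical helper rather than the icefall implementation:

    import numpy as np

    # Hypothetical reconstruction of the clipping diagnostic in the WARNINGs above.
    # Assumes threshold = clipping_scale * median grad-norm, which matches e.g.
    #   quartiles 1.072e+02 1.254e+02 1.312e+02 1.458e+02 2.181e+02, threshold=2.624e+02
    def clipping_report(grad_norms: np.ndarray, clipping_scale: float = 2.0):
        q = np.quantile(grad_norms, [0.0, 0.25, 0.5, 0.75, 1.0])
        threshold = clipping_scale * q[2]                      # 2x the median
        percent_clipped = 100.0 * float(np.mean(grad_norms > threshold))
        return q, threshold, percent_clipped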
2024-09-23 10:17:55,044 INFO [train.py:1231] (3/4) Maximum memory allocated so far is 21238MB 2024-09-23 10:18:09,107 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=236357.33333333334, ans=0.125 2024-09-23 10:18:09,163 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=236357.33333333334, ans=0.0 2024-09-23 10:18:17,677 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.37 vs. limit=10.0 2024-09-23 10:19:02,796 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=236544.0, ans=0.125 2024-09-23 10:19:18,773 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=7.44 vs. limit=15.0 2024-09-23 10:19:21,060 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=236590.66666666666, ans=0.05 2024-09-23 10:19:22,451 INFO [train.py:1198] (3/4) Epoch 14, batch 50, loss[loss=0.2455, ctc_loss=0.1661, cr_loss=0.3969, over 17069.00 frames. ], tot_loss[loss=0.2367, ctc_loss=0.1617, cr_loss=0.375, over 755665.39 frames. ], batch size: 46, lr: 8.78e-03, grad_scale: 32.0 2024-09-23 10:19:32,382 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=236590.66666666666, ans=0.0 2024-09-23 10:19:33,892 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=236590.66666666666, ans=0.0 2024-09-23 10:19:45,058 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=236637.33333333334, ans=0.2 2024-09-23 10:20:42,383 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.118e+02 1.241e+02 1.435e+02 1.718e+02 2.310e+02, threshold=2.871e+02, percent-clipped=0.0 2024-09-23 10:20:42,408 INFO [train.py:1198] (3/4) Epoch 14, batch 100, loss[loss=0.2446, ctc_loss=0.1707, cr_loss=0.3695, over 17212.00 frames. ], tot_loss[loss=0.2346, ctc_loss=0.1601, cr_loss=0.3726, over 1332311.33 frames. ], batch size: 50, lr: 8.77e-03, grad_scale: 32.0 2024-09-23 10:20:51,030 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=3.61 vs. limit=10.0 2024-09-23 10:20:56,127 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=7.06 vs. 
limit=15.0 2024-09-23 10:21:16,587 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=236917.33333333334, ans=0.125 2024-09-23 10:21:22,925 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=236917.33333333334, ans=0.0 2024-09-23 10:21:27,962 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.const_attention_rate, batch_count=236917.33333333334, ans=0.025 2024-09-23 10:21:49,059 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=237010.66666666666, ans=0.0 2024-09-23 10:22:03,254 INFO [train.py:1198] (3/4) Epoch 14, batch 150, loss[loss=0.2075, ctc_loss=0.1372, cr_loss=0.3517, over 17018.00 frames. ], tot_loss[loss=0.2354, ctc_loss=0.1605, cr_loss=0.3743, over 1773754.94 frames. ], batch size: 39, lr: 8.77e-03, grad_scale: 32.0 2024-09-23 10:22:16,488 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=237057.33333333334, ans=0.0 2024-09-23 10:22:18,077 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=237104.0, ans=0.1 2024-09-23 10:22:35,443 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.const_attention_rate, batch_count=237150.66666666666, ans=0.025 2024-09-23 10:22:40,321 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=237150.66666666666, ans=0.125 2024-09-23 10:22:44,946 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=237150.66666666666, ans=0.0 2024-09-23 10:22:46,457 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=237150.66666666666, ans=0.0 2024-09-23 10:22:51,017 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=237150.66666666666, ans=0.125 2024-09-23 10:22:51,385 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.91 vs. limit=6.0 2024-09-23 10:22:51,424 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.46 vs. limit=6.0 2024-09-23 10:22:52,404 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=237197.33333333334, ans=0.125 2024-09-23 10:23:02,524 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=7.00 vs. limit=15.0 2024-09-23 10:23:22,664 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=237244.0, ans=10.0 2024-09-23 10:23:28,629 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.108e+02 1.307e+02 1.446e+02 1.689e+02 2.559e+02, threshold=2.892e+02, percent-clipped=0.0 2024-09-23 10:23:28,655 INFO [train.py:1198] (3/4) Epoch 14, batch 200, loss[loss=0.206, ctc_loss=0.1381, cr_loss=0.3395, over 16959.00 frames. ], tot_loss[loss=0.2361, ctc_loss=0.1611, cr_loss=0.3753, over 2126088.31 frames. 
], batch size: 42, lr: 8.76e-03, grad_scale: 32.0 2024-09-23 10:23:28,972 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=237290.66666666666, ans=0.2 2024-09-23 10:23:33,891 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=237290.66666666666, ans=0.125 2024-09-23 10:23:38,415 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=237290.66666666666, ans=0.2 2024-09-23 10:23:47,730 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=237337.33333333334, ans=0.125 2024-09-23 10:24:38,384 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=237477.33333333334, ans=0.125 2024-09-23 10:24:45,283 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.09 vs. limit=6.0 2024-09-23 10:24:52,523 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=237524.0, ans=0.1 2024-09-23 10:24:53,918 INFO [train.py:1198] (3/4) Epoch 14, batch 250, loss[loss=0.1969, ctc_loss=0.1299, cr_loss=0.3351, over 16938.00 frames. ], tot_loss[loss=0.2354, ctc_loss=0.1606, cr_loss=0.3741, over 2398549.14 frames. ], batch size: 42, lr: 8.76e-03, grad_scale: 32.0 2024-09-23 10:25:24,733 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=237617.33333333334, ans=0.1 2024-09-23 10:25:31,090 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=237617.33333333334, ans=0.125 2024-09-23 10:25:42,921 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten.whitening_limit, batch_count=237664.0, ans=15.0 2024-09-23 10:25:54,508 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=237664.0, ans=0.0 2024-09-23 10:26:12,233 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=237757.33333333334, ans=0.1 2024-09-23 10:26:13,348 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.153e+02 1.272e+02 1.438e+02 1.640e+02 2.630e+02, threshold=2.876e+02, percent-clipped=0.0 2024-09-23 10:26:13,374 INFO [train.py:1198] (3/4) Epoch 14, batch 300, loss[loss=0.2008, ctc_loss=0.1289, cr_loss=0.3592, over 16341.00 frames. ], tot_loss[loss=0.2358, ctc_loss=0.1608, cr_loss=0.3749, over 2598390.61 frames. ], batch size: 36, lr: 8.76e-03, grad_scale: 32.0 2024-09-23 10:26:16,154 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=9.82 vs. 
limit=22.5 2024-09-23 10:26:31,209 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=237804.0, ans=0.1 2024-09-23 10:26:43,626 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=237850.66666666666, ans=0.1 2024-09-23 10:26:46,791 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=237850.66666666666, ans=0.125 2024-09-23 10:26:52,009 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.60 vs. limit=15.0 2024-09-23 10:27:29,686 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=237944.0, ans=0.0 2024-09-23 10:27:32,702 INFO [train.py:1198] (3/4) Epoch 14, batch 350, loss[loss=0.2726, ctc_loss=0.1826, cr_loss=0.45, over 17003.00 frames. ], tot_loss[loss=0.2348, ctc_loss=0.1601, cr_loss=0.3736, over 2763364.41 frames. ], batch size: 51, lr: 8.75e-03, grad_scale: 32.0 2024-09-23 10:27:33,668 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=4.32 vs. limit=15.0 2024-09-23 10:27:36,961 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=12.23 vs. limit=15.0 2024-09-23 10:27:41,180 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=237990.66666666666, ans=0.125 2024-09-23 10:27:45,379 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=237990.66666666666, ans=0.2 2024-09-23 10:28:42,906 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=238177.33333333334, ans=0.09899494936611666 2024-09-23 10:29:02,515 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.171e+02 1.337e+02 1.482e+02 1.714e+02 2.531e+02, threshold=2.964e+02, percent-clipped=0.0 2024-09-23 10:29:02,540 INFO [train.py:1198] (3/4) Epoch 14, batch 400, loss[loss=0.2608, ctc_loss=0.1793, cr_loss=0.4076, over 17024.00 frames. ], tot_loss[loss=0.2353, ctc_loss=0.1605, cr_loss=0.3742, over 2896289.72 frames. 
], batch size: 53, lr: 8.75e-03, grad_scale: 32.0 2024-09-23 10:29:18,460 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=238270.66666666666, ans=0.125 2024-09-23 10:29:26,379 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=238270.66666666666, ans=0.125 2024-09-23 10:29:31,298 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-23 10:29:46,816 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=238317.33333333334, ans=0.2 2024-09-23 10:30:12,147 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=238410.66666666666, ans=0.125 2024-09-23 10:30:14,358 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2.whitening_limit, batch_count=238410.66666666666, ans=15.0 2024-09-23 10:30:21,632 INFO [train.py:1198] (3/4) Epoch 14, batch 450, loss[loss=0.2426, ctc_loss=0.1685, cr_loss=0.3706, over 16544.00 frames. ], tot_loss[loss=0.2348, ctc_loss=0.16, cr_loss=0.374, over 3001996.31 frames. ], batch size: 66, lr: 8.74e-03, grad_scale: 32.0 2024-09-23 10:30:25,218 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=238457.33333333334, ans=0.125 2024-09-23 10:30:30,039 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=238457.33333333334, ans=0.125 2024-09-23 10:30:58,618 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=238550.66666666666, ans=0.05 2024-09-23 10:30:58,662 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=238550.66666666666, ans=0.05 2024-09-23 10:31:38,216 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=238644.0, ans=0.1 2024-09-23 10:31:41,087 INFO [train.py:1198] (3/4) Epoch 14, batch 500, loss[loss=0.214, ctc_loss=0.1439, cr_loss=0.3505, over 17287.00 frames. ], tot_loss[loss=0.2335, ctc_loss=0.159, cr_loss=0.3725, over 3089029.84 frames. ], batch size: 51, lr: 8.74e-03, grad_scale: 16.0 2024-09-23 10:31:42,720 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.102e+02 1.208e+02 1.324e+02 1.495e+02 2.086e+02, threshold=2.649e+02, percent-clipped=0.0 2024-09-23 10:31:49,436 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=238690.66666666666, ans=0.07 2024-09-23 10:31:50,901 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=238690.66666666666, ans=0.125 2024-09-23 10:31:59,226 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=15.93 vs. limit=15.0 2024-09-23 10:31:59,498 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.86 vs. 
limit=6.0 2024-09-23 10:32:00,426 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=238737.33333333334, ans=0.04949747468305833 2024-09-23 10:32:06,709 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=238737.33333333334, ans=0.125 2024-09-23 10:32:19,527 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=238784.0, ans=0.125 2024-09-23 10:33:06,647 INFO [train.py:1198] (3/4) Epoch 14, batch 550, loss[loss=0.2448, ctc_loss=0.1625, cr_loss=0.4113, over 17017.00 frames. ], tot_loss[loss=0.2341, ctc_loss=0.1595, cr_loss=0.373, over 3152743.52 frames. ], batch size: 51, lr: 8.74e-03, grad_scale: 16.0 2024-09-23 10:33:33,648 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=238970.66666666666, ans=0.125 2024-09-23 10:33:53,754 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.const_attention_rate, batch_count=239017.33333333334, ans=0.025 2024-09-23 10:34:05,158 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=239064.0, ans=0.0 2024-09-23 10:34:17,901 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=239110.66666666666, ans=0.2 2024-09-23 10:34:18,589 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.77 vs. limit=15.0 2024-09-23 10:34:21,184 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=239110.66666666666, ans=0.0 2024-09-23 10:34:31,907 INFO [train.py:1198] (3/4) Epoch 14, batch 600, loss[loss=0.2207, ctc_loss=0.1507, cr_loss=0.35, over 17158.00 frames. ], tot_loss[loss=0.2331, ctc_loss=0.1587, cr_loss=0.3718, over 3202826.09 frames. ], batch size: 45, lr: 8.73e-03, grad_scale: 16.0 2024-09-23 10:34:33,455 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.076e+02 1.266e+02 1.344e+02 1.475e+02 2.652e+02, threshold=2.689e+02, percent-clipped=1.0 2024-09-23 10:34:34,013 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=13.16 vs. 
limit=15.0 2024-09-23 10:34:36,817 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=239157.33333333334, ans=0.1 2024-09-23 10:34:46,567 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=239204.0, ans=0.2 2024-09-23 10:34:54,278 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=239204.0, ans=0.0 2024-09-23 10:35:02,342 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=239250.66666666666, ans=0.1 2024-09-23 10:35:34,098 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.const_attention_rate, batch_count=239344.0, ans=0.025 2024-09-23 10:35:40,461 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=239344.0, ans=0.1 2024-09-23 10:35:51,350 INFO [train.py:1198] (3/4) Epoch 14, batch 650, loss[loss=0.2387, ctc_loss=0.1641, cr_loss=0.3731, over 16934.00 frames. ], tot_loss[loss=0.2327, ctc_loss=0.1585, cr_loss=0.371, over 3232962.33 frames. ], batch size: 58, lr: 8.73e-03, grad_scale: 16.0 2024-09-23 10:35:53,191 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=239390.66666666666, ans=0.125 2024-09-23 10:36:33,440 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=10.42 vs. limit=22.5 2024-09-23 10:37:11,360 INFO [train.py:1198] (3/4) Epoch 14, batch 700, loss[loss=0.2034, ctc_loss=0.1368, cr_loss=0.3329, over 17262.00 frames. ], tot_loss[loss=0.2318, ctc_loss=0.1578, cr_loss=0.3701, over 3262409.84 frames. ], batch size: 42, lr: 8.72e-03, grad_scale: 16.0 2024-09-23 10:37:13,005 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.106e+02 1.260e+02 1.373e+02 1.552e+02 2.322e+02, threshold=2.747e+02, percent-clipped=0.0 2024-09-23 10:37:14,942 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=239624.0, ans=0.1 2024-09-23 10:38:21,164 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-23 10:38:24,244 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-23 10:38:29,265 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=239810.66666666666, ans=0.1 2024-09-23 10:38:39,757 INFO [train.py:1198] (3/4) Epoch 14, batch 750, loss[loss=0.217, ctc_loss=0.1468, cr_loss=0.351, over 17234.00 frames. ], tot_loss[loss=0.2323, ctc_loss=0.1581, cr_loss=0.3709, over 3287101.94 frames. ], batch size: 50, lr: 8.72e-03, grad_scale: 16.0 2024-09-23 10:38:57,201 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=239904.0, ans=0.125 2024-09-23 10:39:38,247 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=239997.33333333334, ans=0.1 2024-09-23 10:39:50,463 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=6.01 vs. 
limit=12.0 2024-09-23 10:39:56,282 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=240044.0, ans=0.2 2024-09-23 10:39:57,802 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=240044.0, ans=0.1 2024-09-23 10:39:59,484 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=240044.0, ans=0.2 2024-09-23 10:40:02,419 INFO [train.py:1198] (3/4) Epoch 14, batch 800, loss[loss=0.2124, ctc_loss=0.1427, cr_loss=0.3487, over 17166.00 frames. ], tot_loss[loss=0.2318, ctc_loss=0.1577, cr_loss=0.3707, over 3311668.30 frames. ], batch size: 45, lr: 8.71e-03, grad_scale: 32.0 2024-09-23 10:40:03,952 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.109e+02 1.283e+02 1.393e+02 1.518e+02 3.186e+02, threshold=2.786e+02, percent-clipped=2.0 2024-09-23 10:40:08,852 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=240090.66666666666, ans=0.0 2024-09-23 10:40:24,815 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.const_attention_rate, batch_count=240137.33333333334, ans=0.025 2024-09-23 10:40:29,535 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=240137.33333333334, ans=0.0 2024-09-23 10:40:39,091 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=240184.0, ans=0.0 2024-09-23 10:40:50,409 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=240230.66666666666, ans=0.125 2024-09-23 10:40:53,666 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=240230.66666666666, ans=0.125 2024-09-23 10:40:56,629 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=240230.66666666666, ans=0.125 2024-09-23 10:41:22,389 INFO [train.py:1198] (3/4) Epoch 14, batch 850, loss[loss=0.315, ctc_loss=0.2297, cr_loss=0.4263, over 11409.00 frames. ], tot_loss[loss=0.2319, ctc_loss=0.1577, cr_loss=0.3709, over 3326178.59 frames. ], batch size: 123, lr: 8.71e-03, grad_scale: 32.0 2024-09-23 10:41:36,892 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=240370.66666666666, ans=0.125 2024-09-23 10:41:54,553 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.const_attention_rate, batch_count=240417.33333333334, ans=0.025 2024-09-23 10:41:56,129 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=240417.33333333334, ans=0.025 2024-09-23 10:41:59,260 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.const_attention_rate, batch_count=240417.33333333334, ans=0.025 2024-09-23 10:42:10,068 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=240464.0, ans=0.125 2024-09-23 10:42:25,104 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.12 vs. 
limit=10.0 2024-09-23 10:42:44,014 INFO [train.py:1198] (3/4) Epoch 14, batch 900, loss[loss=0.1985, ctc_loss=0.131, cr_loss=0.3375, over 17134.00 frames. ], tot_loss[loss=0.2327, ctc_loss=0.1583, cr_loss=0.3718, over 3335163.82 frames. ], batch size: 40, lr: 8.71e-03, grad_scale: 32.0 2024-09-23 10:42:48,314 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.073e+02 1.265e+02 1.359e+02 1.500e+02 2.203e+02, threshold=2.718e+02, percent-clipped=0.0 2024-09-23 10:42:56,725 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=240557.33333333334, ans=0.0 2024-09-23 10:43:17,396 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=240650.66666666666, ans=0.2 2024-09-23 10:43:25,227 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=240650.66666666666, ans=0.125 2024-09-23 10:43:30,178 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=240650.66666666666, ans=0.125 2024-09-23 10:43:47,820 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=240697.33333333334, ans=0.0 2024-09-23 10:44:11,585 INFO [train.py:1198] (3/4) Epoch 14, batch 950, loss[loss=0.2262, ctc_loss=0.1506, cr_loss=0.3781, over 16954.00 frames. ], tot_loss[loss=0.2329, ctc_loss=0.1585, cr_loss=0.3722, over 3336989.64 frames. ], batch size: 42, lr: 8.70e-03, grad_scale: 32.0 2024-09-23 10:44:32,953 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=240837.33333333334, ans=0.125 2024-09-23 10:44:39,350 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer_ff2.min_abs, batch_count=240837.33333333334, ans=0.1 2024-09-23 10:44:57,290 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=3.45 vs. limit=15.0 2024-09-23 10:45:06,648 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-23 10:45:11,357 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=240930.66666666666, ans=0.0 2024-09-23 10:45:21,069 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=240977.33333333334, ans=0.125 2024-09-23 10:45:24,541 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=4.46 vs. limit=12.0 2024-09-23 10:45:31,843 INFO [train.py:1198] (3/4) Epoch 14, batch 1000, loss[loss=0.199, ctc_loss=0.1353, cr_loss=0.3181, over 17048.00 frames. ], tot_loss[loss=0.2343, ctc_loss=0.1596, cr_loss=0.3738, over 3335837.19 frames. 
], batch size: 39, lr: 8.70e-03, grad_scale: 32.0 2024-09-23 10:45:33,324 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.134e+02 1.317e+02 1.495e+02 1.766e+02 2.386e+02, threshold=2.990e+02, percent-clipped=0.0 2024-09-23 10:46:16,723 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=241117.33333333334, ans=0.035 2024-09-23 10:46:48,314 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=7.37 vs. limit=15.0 2024-09-23 10:46:52,004 INFO [train.py:1198] (3/4) Epoch 14, batch 1050, loss[loss=0.1945, ctc_loss=0.1305, cr_loss=0.32, over 17093.00 frames. ], tot_loss[loss=0.2332, ctc_loss=0.1587, cr_loss=0.3724, over 3335705.27 frames. ], batch size: 43, lr: 8.69e-03, grad_scale: 32.0 2024-09-23 10:46:52,387 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=241257.33333333334, ans=0.1 2024-09-23 10:47:22,759 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=241350.66666666666, ans=0.125 2024-09-23 10:47:38,466 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-09-23 10:48:01,571 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=241444.0, ans=0.2 2024-09-23 10:48:06,108 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.const_attention_rate, batch_count=241444.0, ans=0.025 2024-09-23 10:48:12,322 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=241444.0, ans=0.2 2024-09-23 10:48:16,988 INFO [train.py:1198] (3/4) Epoch 14, batch 1100, loss[loss=0.2216, ctc_loss=0.1528, cr_loss=0.3442, over 17136.00 frames. ], tot_loss[loss=0.2336, ctc_loss=0.1591, cr_loss=0.3721, over 3327381.77 frames. ], batch size: 48, lr: 8.69e-03, grad_scale: 32.0 2024-09-23 10:48:18,620 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.154e+02 1.315e+02 1.444e+02 1.614e+02 2.728e+02, threshold=2.888e+02, percent-clipped=0.0 2024-09-23 10:48:20,460 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=241490.66666666666, ans=0.0 2024-09-23 10:48:32,827 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=241490.66666666666, ans=0.1 2024-09-23 10:48:34,217 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=241537.33333333334, ans=0.125 2024-09-23 10:48:42,939 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=241537.33333333334, ans=0.1 2024-09-23 10:48:55,782 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=241584.0, ans=0.125 2024-09-23 10:48:56,085 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=7.04 vs. 
limit=15.0 2024-09-23 10:49:02,168 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=241584.0, ans=0.125 2024-09-23 10:49:19,603 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=241630.66666666666, ans=0.04949747468305833 2024-09-23 10:49:21,110 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=241630.66666666666, ans=0.1 2024-09-23 10:49:27,494 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=241677.33333333334, ans=0.125 2024-09-23 10:49:28,161 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=4.93 vs. limit=15.0 2024-09-23 10:49:30,687 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=241677.33333333334, ans=0.0 2024-09-23 10:49:41,262 INFO [train.py:1198] (3/4) Epoch 14, batch 1150, loss[loss=0.2184, ctc_loss=0.1441, cr_loss=0.3711, over 17217.00 frames. ], tot_loss[loss=0.2336, ctc_loss=0.1592, cr_loss=0.3721, over 3329594.93 frames. ], batch size: 47, lr: 8.69e-03, grad_scale: 32.0 2024-09-23 10:49:41,554 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=241724.0, ans=0.07 2024-09-23 10:49:53,121 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=241724.0, ans=0.05 2024-09-23 10:50:06,288 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=13.65 vs. limit=15.0 2024-09-23 10:50:15,621 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=241817.33333333334, ans=0.125 2024-09-23 10:50:45,506 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=241910.66666666666, ans=0.0 2024-09-23 10:51:01,213 INFO [train.py:1198] (3/4) Epoch 14, batch 1200, loss[loss=0.229, ctc_loss=0.156, cr_loss=0.3647, over 17010.00 frames. ], tot_loss[loss=0.2322, ctc_loss=0.158, cr_loss=0.3707, over 3342245.14 frames. ], batch size: 51, lr: 8.68e-03, grad_scale: 32.0 2024-09-23 10:51:02,790 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.057e+02 1.307e+02 1.418e+02 1.626e+02 2.907e+02, threshold=2.837e+02, percent-clipped=1.0 2024-09-23 10:51:03,041 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=241957.33333333334, ans=0.125 2024-09-23 10:51:47,686 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.const_attention_rate, batch_count=242097.33333333334, ans=0.025 2024-09-23 10:52:20,959 INFO [train.py:1198] (3/4) Epoch 14, batch 1250, loss[loss=0.2355, ctc_loss=0.1601, cr_loss=0.3769, over 17053.00 frames. ], tot_loss[loss=0.2323, ctc_loss=0.1582, cr_loss=0.3705, over 3354548.63 frames. 
], batch size: 52, lr: 8.68e-03, grad_scale: 32.0 2024-09-23 10:52:22,875 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=242190.66666666666, ans=0.125 2024-09-23 10:53:05,630 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=242284.0, ans=0.125 2024-09-23 10:53:16,800 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=242330.66666666666, ans=0.125 2024-09-23 10:53:18,418 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=242330.66666666666, ans=0.2 2024-09-23 10:53:37,336 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.min_abs, batch_count=242377.33333333334, ans=0.5 2024-09-23 10:53:46,986 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=242377.33333333334, ans=0.125 2024-09-23 10:53:49,820 INFO [train.py:1198] (3/4) Epoch 14, batch 1300, loss[loss=0.2806, ctc_loss=0.1954, cr_loss=0.426, over 15062.00 frames. ], tot_loss[loss=0.2324, ctc_loss=0.1582, cr_loss=0.3711, over 3359739.78 frames. ], batch size: 89, lr: 8.67e-03, grad_scale: 32.0 2024-09-23 10:53:51,342 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.122e+02 1.264e+02 1.376e+02 1.514e+02 2.274e+02, threshold=2.753e+02, percent-clipped=0.0 2024-09-23 10:54:07,647 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.const_attention_rate, batch_count=242470.66666666666, ans=0.025 2024-09-23 10:54:17,564 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=242470.66666666666, ans=0.0 2024-09-23 10:54:26,885 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=242517.33333333334, ans=0.035 2024-09-23 10:54:31,865 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=242517.33333333334, ans=0.0 2024-09-23 10:54:36,715 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=242564.0, ans=0.1 2024-09-23 10:54:46,443 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=242564.0, ans=0.1 2024-09-23 10:54:54,449 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=242610.66666666666, ans=0.2 2024-09-23 10:55:04,469 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=3.87 vs. limit=15.0 2024-09-23 10:55:07,652 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=7.08 vs. limit=15.0 2024-09-23 10:55:08,835 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=242657.33333333334, ans=0.0 2024-09-23 10:55:10,060 INFO [train.py:1198] (3/4) Epoch 14, batch 1350, loss[loss=0.2468, ctc_loss=0.1754, cr_loss=0.3574, over 14953.00 frames. ], tot_loss[loss=0.2326, ctc_loss=0.1584, cr_loss=0.3711, over 3358012.11 frames. 
], batch size: 89, lr: 8.67e-03, grad_scale: 32.0 2024-09-23 10:55:15,609 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=242657.33333333334, ans=0.125 2024-09-23 10:55:17,182 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=242657.33333333334, ans=0.0 2024-09-23 10:55:37,812 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=242704.0, ans=0.125 2024-09-23 10:55:49,471 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=242750.66666666666, ans=0.125 2024-09-23 10:55:58,759 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=242797.33333333334, ans=0.07 2024-09-23 10:56:32,048 INFO [train.py:1198] (3/4) Epoch 14, batch 1400, loss[loss=0.255, ctc_loss=0.1742, cr_loss=0.4036, over 16731.00 frames. ], tot_loss[loss=0.2316, ctc_loss=0.1575, cr_loss=0.3705, over 3364201.60 frames. ], batch size: 61, lr: 8.67e-03, grad_scale: 32.0 2024-09-23 10:56:33,632 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.123e+02 1.305e+02 1.425e+02 1.607e+02 2.757e+02, threshold=2.850e+02, percent-clipped=1.0 2024-09-23 10:56:53,098 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=242937.33333333334, ans=0.2 2024-09-23 10:56:59,412 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=242937.33333333334, ans=0.125 2024-09-23 10:57:00,914 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=242937.33333333334, ans=0.125 2024-09-23 10:57:15,827 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=5.01 vs. limit=10.0 2024-09-23 10:57:18,463 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=243030.66666666666, ans=0.125 2024-09-23 10:57:21,813 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=243030.66666666666, ans=0.125 2024-09-23 10:57:53,786 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=243077.33333333334, ans=0.1 2024-09-23 10:57:56,919 INFO [train.py:1198] (3/4) Epoch 14, batch 1450, loss[loss=0.2764, ctc_loss=0.1881, cr_loss=0.4416, over 17231.00 frames. ], tot_loss[loss=0.2325, ctc_loss=0.1582, cr_loss=0.3716, over 3373877.14 frames. ], batch size: 55, lr: 8.66e-03, grad_scale: 16.0 2024-09-23 10:58:04,033 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=15.68 vs. 
limit=22.5 2024-09-23 10:58:12,871 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=243170.66666666666, ans=0.1 2024-09-23 10:58:17,655 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=243170.66666666666, ans=0.125 2024-09-23 10:58:58,398 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=3.43 vs. limit=15.0 2024-09-23 10:58:59,673 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=243264.0, ans=0.2 2024-09-23 10:59:21,665 INFO [train.py:1198] (3/4) Epoch 14, batch 1500, loss[loss=0.2265, ctc_loss=0.1538, cr_loss=0.3633, over 17015.00 frames. ], tot_loss[loss=0.2322, ctc_loss=0.158, cr_loss=0.3708, over 3365772.71 frames. ], batch size: 44, lr: 8.66e-03, grad_scale: 16.0 2024-09-23 10:59:24,853 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.079e+02 1.258e+02 1.373e+02 1.539e+02 2.095e+02, threshold=2.745e+02, percent-clipped=0.0 2024-09-23 10:59:34,790 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=243357.33333333334, ans=0.125 2024-09-23 10:59:38,365 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=10.03 vs. limit=15.0 2024-09-23 10:59:41,197 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=9.61 vs. limit=15.0 2024-09-23 10:59:48,790 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=243404.0, ans=0.125 2024-09-23 10:59:55,335 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=243450.66666666666, ans=0.0 2024-09-23 11:00:33,984 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-09-23 11:00:38,755 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=243544.0, ans=0.1 2024-09-23 11:00:41,713 INFO [train.py:1198] (3/4) Epoch 14, batch 1550, loss[loss=0.2194, ctc_loss=0.1452, cr_loss=0.3712, over 17165.00 frames. ], tot_loss[loss=0.2328, ctc_loss=0.1585, cr_loss=0.3719, over 3358158.35 frames. ], batch size: 48, lr: 8.65e-03, grad_scale: 16.0 2024-09-23 11:01:30,387 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=243730.66666666666, ans=0.0 2024-09-23 11:01:31,957 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=243730.66666666666, ans=0.1 2024-09-23 11:01:38,421 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.const_attention_rate, batch_count=243730.66666666666, ans=0.025 2024-09-23 11:02:01,667 INFO [train.py:1198] (3/4) Epoch 14, batch 1600, loss[loss=0.2354, ctc_loss=0.1582, cr_loss=0.3863, over 17081.00 frames. ], tot_loss[loss=0.233, ctc_loss=0.1586, cr_loss=0.372, over 3360352.18 frames. 
], batch size: 46, lr: 8.65e-03, grad_scale: 32.0 2024-09-23 11:02:04,730 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.106e+02 1.281e+02 1.419e+02 1.549e+02 2.365e+02, threshold=2.838e+02, percent-clipped=0.0 2024-09-23 11:02:13,269 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.const_attention_rate, batch_count=243824.0, ans=0.025 2024-09-23 11:02:57,806 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=243964.0, ans=0.125 2024-09-23 11:03:10,100 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.60 vs. limit=15.0 2024-09-23 11:03:30,842 INFO [train.py:1198] (3/4) Epoch 14, batch 1650, loss[loss=0.2186, ctc_loss=0.1441, cr_loss=0.3721, over 17071.00 frames. ], tot_loss[loss=0.2334, ctc_loss=0.1589, cr_loss=0.3727, over 3355617.37 frames. ], batch size: 46, lr: 8.64e-03, grad_scale: 32.0 2024-09-23 11:03:31,091 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=244057.33333333334, ans=0.0 2024-09-23 11:04:28,652 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=244197.33333333334, ans=0.025 2024-09-23 11:04:38,098 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-23 11:04:50,835 INFO [train.py:1198] (3/4) Epoch 14, batch 1700, loss[loss=0.2341, ctc_loss=0.1617, cr_loss=0.362, over 17218.00 frames. ], tot_loss[loss=0.2313, ctc_loss=0.1574, cr_loss=0.3698, over 3350199.27 frames. ], batch size: 50, lr: 8.64e-03, grad_scale: 32.0 2024-09-23 11:04:54,006 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.077e+02 1.254e+02 1.382e+02 1.612e+02 3.536e+02, threshold=2.764e+02, percent-clipped=2.0 2024-09-23 11:04:57,596 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=244290.66666666666, ans=0.1 2024-09-23 11:04:59,820 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=4.45 vs. limit=15.0 2024-09-23 11:05:19,885 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=244337.33333333334, ans=0.125 2024-09-23 11:05:27,885 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=244384.0, ans=0.125 2024-09-23 11:05:46,985 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=244430.66666666666, ans=0.05 2024-09-23 11:05:55,309 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=11.14 vs. limit=12.0 2024-09-23 11:06:07,559 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=244477.33333333334, ans=0.0 2024-09-23 11:06:10,499 INFO [train.py:1198] (3/4) Epoch 14, batch 1750, loss[loss=0.2391, ctc_loss=0.1613, cr_loss=0.389, over 16732.00 frames. ], tot_loss[loss=0.2324, ctc_loss=0.1581, cr_loss=0.3711, over 3356496.48 frames. 
], batch size: 61, lr: 8.64e-03, grad_scale: 32.0 2024-09-23 11:06:45,277 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.02 vs. limit=15.0 2024-09-23 11:06:49,609 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=244617.33333333334, ans=0.1 2024-09-23 11:07:36,214 INFO [train.py:1198] (3/4) Epoch 14, batch 1800, loss[loss=0.2751, ctc_loss=0.1913, cr_loss=0.4187, over 17007.00 frames. ], tot_loss[loss=0.2338, ctc_loss=0.1594, cr_loss=0.3723, over 3356331.11 frames. ], batch size: 52, lr: 8.63e-03, grad_scale: 32.0 2024-09-23 11:07:37,063 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=4.77 vs. limit=10.0 2024-09-23 11:07:39,493 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.090e+02 1.282e+02 1.372e+02 1.529e+02 2.252e+02, threshold=2.745e+02, percent-clipped=0.0 2024-09-23 11:07:54,202 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=244804.0, ans=0.0 2024-09-23 11:08:21,972 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=244850.66666666666, ans=0.95 2024-09-23 11:08:29,897 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=244897.33333333334, ans=0.125 2024-09-23 11:09:01,694 INFO [train.py:1198] (3/4) Epoch 14, batch 1850, loss[loss=0.2258, ctc_loss=0.1503, cr_loss=0.3776, over 17042.00 frames. ], tot_loss[loss=0.2326, ctc_loss=0.1585, cr_loss=0.3708, over 3361851.89 frames. ], batch size: 44, lr: 8.63e-03, grad_scale: 32.0 2024-09-23 11:09:13,271 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=244990.66666666666, ans=0.125 2024-09-23 11:10:02,964 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=245130.66666666666, ans=0.2 2024-09-23 11:10:21,849 INFO [train.py:1198] (3/4) Epoch 14, batch 1900, loss[loss=0.2058, ctc_loss=0.139, cr_loss=0.334, over 17086.00 frames. ], tot_loss[loss=0.232, ctc_loss=0.158, cr_loss=0.3699, over 3361692.25 frames. ], batch size: 43, lr: 8.62e-03, grad_scale: 32.0 2024-09-23 11:10:25,082 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.117e+02 1.260e+02 1.374e+02 1.529e+02 3.130e+02, threshold=2.747e+02, percent-clipped=1.0 2024-09-23 11:10:38,320 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=245270.66666666666, ans=0.0 2024-09-23 11:10:43,650 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.20 vs. 
limit=15.0 2024-09-23 11:11:02,191 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=245317.33333333334, ans=0.125 2024-09-23 11:11:10,463 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=245364.0, ans=0.07 2024-09-23 11:11:10,754 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=3.55 vs. limit=10.0 2024-09-23 11:11:23,012 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=245364.0, ans=0.2 2024-09-23 11:11:30,198 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=11.70 vs. limit=15.0 2024-09-23 11:11:38,632 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=245410.66666666666, ans=0.125 2024-09-23 11:11:41,463 INFO [train.py:1198] (3/4) Epoch 14, batch 1950, loss[loss=0.255, ctc_loss=0.1752, cr_loss=0.399, over 17304.00 frames. ], tot_loss[loss=0.2323, ctc_loss=0.1584, cr_loss=0.3696, over 3343960.34 frames. ], batch size: 51, lr: 8.62e-03, grad_scale: 32.0 2024-09-23 11:11:48,544 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=245457.33333333334, ans=0.0 2024-09-23 11:12:08,743 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=245504.0, ans=0.04949747468305833 2024-09-23 11:13:08,153 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=4.68 vs. limit=10.0 2024-09-23 11:13:09,061 INFO [train.py:1198] (3/4) Epoch 14, batch 2000, loss[loss=0.285, ctc_loss=0.2079, cr_loss=0.3855, over 11854.00 frames. ], tot_loss[loss=0.2327, ctc_loss=0.1589, cr_loss=0.3692, over 3324178.61 frames. ], batch size: 123, lr: 8.62e-03, grad_scale: 32.0 2024-09-23 11:13:14,678 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.128e+02 1.297e+02 1.399e+02 1.635e+02 2.518e+02, threshold=2.799e+02, percent-clipped=0.0 2024-09-23 11:13:37,522 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-23 11:14:06,912 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.24 vs. limit=15.0 2024-09-23 11:14:13,376 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.29 vs. limit=15.0 2024-09-23 11:14:31,854 INFO [train.py:1198] (3/4) Epoch 14, batch 2050, loss[loss=0.2324, ctc_loss=0.1584, cr_loss=0.3702, over 17159.00 frames. ], tot_loss[loss=0.2332, ctc_loss=0.159, cr_loss=0.3709, over 3334697.98 frames. 
], batch size: 45, lr: 8.61e-03, grad_scale: 32.0 2024-09-23 11:14:40,136 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=245924.0, ans=0.125 2024-09-23 11:14:54,328 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=245970.66666666666, ans=0.1 2024-09-23 11:15:08,745 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=246017.33333333334, ans=0.125 2024-09-23 11:15:52,151 INFO [train.py:1198] (3/4) Epoch 14, batch 2100, loss[loss=0.2759, ctc_loss=0.1908, cr_loss=0.4253, over 17003.00 frames. ], tot_loss[loss=0.2343, ctc_loss=0.1598, cr_loss=0.3723, over 3325731.87 frames. ], batch size: 51, lr: 8.61e-03, grad_scale: 32.0 2024-09-23 11:15:55,407 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.001e+02 1.226e+02 1.315e+02 1.406e+02 3.033e+02, threshold=2.629e+02, percent-clipped=1.0 2024-09-23 11:15:57,886 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.70 vs. limit=15.0 2024-09-23 11:16:00,753 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=246157.33333333334, ans=0.125 2024-09-23 11:16:03,841 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=246157.33333333334, ans=0.07 2024-09-23 11:16:15,119 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=246204.0, ans=0.125 2024-09-23 11:16:32,876 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=246250.66666666666, ans=0.1 2024-09-23 11:17:15,245 INFO [train.py:1198] (3/4) Epoch 14, batch 2150, loss[loss=0.1941, ctc_loss=0.1293, cr_loss=0.324, over 16267.00 frames. ], tot_loss[loss=0.2342, ctc_loss=0.1598, cr_loss=0.3722, over 3330604.79 frames. ], batch size: 36, lr: 8.60e-03, grad_scale: 32.0 2024-09-23 11:17:44,448 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=246437.33333333334, ans=0.0 2024-09-23 11:17:46,062 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=246437.33333333334, ans=0.125 2024-09-23 11:18:13,454 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=246530.66666666666, ans=0.125 2024-09-23 11:18:29,570 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=246577.33333333334, ans=0.2 2024-09-23 11:18:29,573 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=246577.33333333334, ans=0.2 2024-09-23 11:18:31,139 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=246577.33333333334, ans=0.0 2024-09-23 11:18:32,663 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=246577.33333333334, ans=0.2 2024-09-23 11:18:43,400 INFO [train.py:1198] (3/4) Epoch 14, batch 2200, loss[loss=0.2672, ctc_loss=0.1818, cr_loss=0.4266, over 17030.00 frames. 
], tot_loss[loss=0.2348, ctc_loss=0.1601, cr_loss=0.3737, over 3335602.88 frames. ], batch size: 56, lr: 8.60e-03, grad_scale: 32.0 2024-09-23 11:18:46,518 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.148e+02 1.334e+02 1.425e+02 1.540e+02 2.133e+02, threshold=2.850e+02, percent-clipped=0.0 2024-09-23 11:18:47,467 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.85 vs. limit=15.0 2024-09-23 11:18:51,802 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=246624.0, ans=0.0 2024-09-23 11:19:18,647 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=246717.33333333334, ans=0.2 2024-09-23 11:19:33,355 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=246764.0, ans=0.1 2024-09-23 11:19:36,483 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=246764.0, ans=0.125 2024-09-23 11:19:41,311 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=246764.0, ans=0.125 2024-09-23 11:19:41,585 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=5.89 vs. limit=12.0 2024-09-23 11:19:55,553 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-23 11:20:02,899 INFO [train.py:1198] (3/4) Epoch 14, batch 2250, loss[loss=0.2696, ctc_loss=0.193, cr_loss=0.3832, over 11715.00 frames. ], tot_loss[loss=0.2363, ctc_loss=0.1612, cr_loss=0.3752, over 3322797.00 frames. ], batch size: 123, lr: 8.60e-03, grad_scale: 16.0 2024-09-23 11:20:04,816 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=246857.33333333334, ans=0.125 2024-09-23 11:20:04,947 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=246857.33333333334, ans=0.2 2024-09-23 11:20:12,903 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=246857.33333333334, ans=0.125 2024-09-23 11:20:19,043 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=246904.0, ans=0.035 2024-09-23 11:20:30,057 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=246904.0, ans=0.125 2024-09-23 11:21:00,266 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=246997.33333333334, ans=0.0 2024-09-23 11:21:19,654 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=247044.0, ans=0.0 2024-09-23 11:21:22,504 INFO [train.py:1198] (3/4) Epoch 14, batch 2300, loss[loss=0.2024, ctc_loss=0.1366, cr_loss=0.3293, over 17263.00 frames. ], tot_loss[loss=0.2361, ctc_loss=0.1611, cr_loss=0.3749, over 3317210.33 frames. 
], batch size: 44, lr: 8.59e-03, grad_scale: 16.0 2024-09-23 11:21:27,262 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.092e+02 1.285e+02 1.398e+02 1.588e+02 2.479e+02, threshold=2.795e+02, percent-clipped=0.0 2024-09-23 11:21:29,403 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=247090.66666666666, ans=0.125 2024-09-23 11:22:50,511 INFO [train.py:1198] (3/4) Epoch 14, batch 2350, loss[loss=0.2433, ctc_loss=0.1687, cr_loss=0.3727, over 17299.00 frames. ], tot_loss[loss=0.2359, ctc_loss=0.1608, cr_loss=0.3754, over 3327668.56 frames. ], batch size: 46, lr: 8.59e-03, grad_scale: 16.0 2024-09-23 11:24:12,732 INFO [train.py:1198] (3/4) Epoch 14, batch 2400, loss[loss=0.2335, ctc_loss=0.1566, cr_loss=0.3842, over 16770.00 frames. ], tot_loss[loss=0.2336, ctc_loss=0.159, cr_loss=0.3733, over 3342407.35 frames. ], batch size: 61, lr: 8.58e-03, grad_scale: 32.0 2024-09-23 11:24:17,510 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.081e+02 1.242e+02 1.312e+02 1.459e+02 2.054e+02, threshold=2.624e+02, percent-clipped=0.0 2024-09-23 11:24:27,178 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=247604.0, ans=0.125 2024-09-23 11:24:35,210 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.min_positive, batch_count=247604.0, ans=0.05 2024-09-23 11:24:38,422 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=247604.0, ans=0.125 2024-09-23 11:25:19,651 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=247744.0, ans=0.0 2024-09-23 11:25:32,183 INFO [train.py:1198] (3/4) Epoch 14, batch 2450, loss[loss=0.2314, ctc_loss=0.1518, cr_loss=0.3984, over 17053.00 frames. ], tot_loss[loss=0.2338, ctc_loss=0.1591, cr_loss=0.3735, over 3337469.50 frames. ], batch size: 46, lr: 8.58e-03, grad_scale: 32.0 2024-09-23 11:25:46,168 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.83 vs. limit=22.5 2024-09-23 11:26:13,128 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten.whitening_limit, batch_count=247884.0, ans=15.0 2024-09-23 11:26:22,218 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=247930.66666666666, ans=0.125 2024-09-23 11:26:54,749 INFO [train.py:1198] (3/4) Epoch 14, batch 2500, loss[loss=0.2229, ctc_loss=0.1489, cr_loss=0.3699, over 17303.00 frames. ], tot_loss[loss=0.2329, ctc_loss=0.1584, cr_loss=0.3726, over 3341837.04 frames. ], batch size: 51, lr: 8.58e-03, grad_scale: 32.0 2024-09-23 11:26:55,081 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=248024.0, ans=0.2 2024-09-23 11:26:59,532 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.091e+02 1.311e+02 1.464e+02 1.674e+02 2.701e+02, threshold=2.928e+02, percent-clipped=1.0 2024-09-23 11:27:03,651 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=6.03 vs. 
limit=15.0 2024-09-23 11:27:12,609 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=248070.66666666666, ans=0.0 2024-09-23 11:27:19,177 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=11.88 vs. limit=15.0 2024-09-23 11:28:21,883 INFO [train.py:1198] (3/4) Epoch 14, batch 2550, loss[loss=0.2647, ctc_loss=0.1807, cr_loss=0.4202, over 16638.00 frames. ], tot_loss[loss=0.2329, ctc_loss=0.1585, cr_loss=0.3722, over 3341851.66 frames. ], batch size: 66, lr: 8.57e-03, grad_scale: 32.0 2024-09-23 11:28:43,069 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=248304.0, ans=10.0 2024-09-23 11:29:39,814 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten.whitening_limit, batch_count=248444.0, ans=15.0 2024-09-23 11:29:42,050 INFO [train.py:1198] (3/4) Epoch 14, batch 2600, loss[loss=0.2277, ctc_loss=0.1516, cr_loss=0.3805, over 17213.00 frames. ], tot_loss[loss=0.2322, ctc_loss=0.1579, cr_loss=0.3717, over 3346391.99 frames. ], batch size: 47, lr: 8.57e-03, grad_scale: 32.0 2024-09-23 11:29:46,742 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.125e+02 1.321e+02 1.442e+02 1.644e+02 2.368e+02, threshold=2.883e+02, percent-clipped=0.0 2024-09-23 11:30:09,990 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=11.89 vs. limit=22.5 2024-09-23 11:30:14,257 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.max_positive, batch_count=248584.0, ans=0.95 2024-09-23 11:30:33,596 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=248630.66666666666, ans=0.0 2024-09-23 11:30:51,463 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=248677.33333333334, ans=0.2 2024-09-23 11:30:51,784 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=7.75 vs. limit=12.0 2024-09-23 11:31:02,247 INFO [train.py:1198] (3/4) Epoch 14, batch 2650, loss[loss=0.2199, ctc_loss=0.1477, cr_loss=0.3608, over 17182.00 frames. ], tot_loss[loss=0.2315, ctc_loss=0.1573, cr_loss=0.3709, over 3350418.78 frames. ], batch size: 45, lr: 8.56e-03, grad_scale: 32.0 2024-09-23 11:31:23,058 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=248770.66666666666, ans=0.125 2024-09-23 11:31:40,694 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=248817.33333333334, ans=0.125 2024-09-23 11:32:23,859 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=248910.66666666666, ans=0.125 2024-09-23 11:32:26,707 INFO [train.py:1198] (3/4) Epoch 14, batch 2700, loss[loss=0.2658, ctc_loss=0.1861, cr_loss=0.3984, over 16988.00 frames. ], tot_loss[loss=0.2329, ctc_loss=0.1584, cr_loss=0.3724, over 3356859.12 frames. 
2024-09-23 11:32:31,440 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.165e+02 1.327e+02 1.447e+02 1.619e+02 2.182e+02, threshold=2.895e+02, percent-clipped=0.0
2024-09-23 11:32:52,955 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=249004.0, ans=0.04949747468305833
2024-09-23 11:32:56,048 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=249004.0, ans=0.0
2024-09-23 11:33:05,612 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=249050.66666666666, ans=0.125
2024-09-23 11:33:07,056 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=249050.66666666666, ans=0.0
2024-09-23 11:33:13,428 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=249050.66666666666, ans=0.125
2024-09-23 11:33:31,149 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer_ff3.min_abs, batch_count=249097.33333333334, ans=0.2
2024-09-23 11:33:34,362 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=249144.0, ans=0.0
2024-09-23 11:33:42,527 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.12 vs. limit=15.0
2024-09-23 11:33:46,963 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=249144.0, ans=0.1
2024-09-23 11:33:51,512 INFO [train.py:1198] (3/4) Epoch 14, batch 2750, loss[loss=0.2396, ctc_loss=0.164, cr_loss=0.3778, over 17032.00 frames. ], tot_loss[loss=0.2335, ctc_loss=0.159, cr_loss=0.3726, over 3364125.60 frames. ], batch size: 52, lr: 8.56e-03, grad_scale: 32.0
2024-09-23 11:33:51,708 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.const_attention_rate, batch_count=249190.66666666666, ans=0.025
2024-09-23 11:34:25,548 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=249284.0, ans=0.0
2024-09-23 11:34:38,613 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.88 vs. limit=22.5
2024-09-23 11:34:58,443 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=249377.33333333334, ans=0.0
2024-09-23 11:35:01,592 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.const_attention_rate, batch_count=249377.33333333334, ans=0.025
2024-09-23 11:35:10,883 INFO [train.py:1198] (3/4) Epoch 14, batch 2800, loss[loss=0.2324, ctc_loss=0.1604, cr_loss=0.3601, over 17300.00 frames. ], tot_loss[loss=0.2325, ctc_loss=0.1582, cr_loss=0.3715, over 3367130.77 frames. ], batch size: 46, lr: 8.55e-03, grad_scale: 32.0
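The ScheduledFloat entries record module hyper-parameters (dropout rates, skip rates, balancer limits) whose value `ans` depends on `batch_count` instead of being fixed. A plausible reconstruction of such a piecewise-linear schedule (the class name matches the log; the implementation is an illustrative sketch, not the scaling.py source):

```python
class ScheduledFloat:
    """A float that interpolates linearly between (batch_count, value) knots."""
    def __init__(self, *points):
        # e.g. ScheduledFloat((0.0, 0.3), (20000.0, 0.1)) decays 0.3 -> 0.1
        self.points = sorted(points)

    def value(self, batch_count: float) -> float:
        pts = self.points
        if batch_count <= pts[0][0]:
            return pts[0][1]
        if batch_count >= pts[-1][0]:
            return pts[-1][1]
        for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
            if x0 <= batch_count <= x1:
                t = (batch_count - x0) / (x1 - x0)
                return y0 + t * (y1 - y0)

schedule = ScheduledFloat((0.0, 0.3), (20000.0, 0.1))
print(schedule.value(249004.0))  # past the last knot -> 0.1
```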
2024-09-23 11:35:14,344 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=249424.0, ans=0.125
2024-09-23 11:35:15,651 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.123e+02 1.308e+02 1.382e+02 1.526e+02 2.267e+02, threshold=2.765e+02, percent-clipped=0.0
2024-09-23 11:35:37,989 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-23 11:35:46,091 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-23 11:35:49,254 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=249517.33333333334, ans=0.125
2024-09-23 11:35:52,433 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=249517.33333333334, ans=0.0
2024-09-23 11:36:03,774 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=249564.0, ans=0.05
2024-09-23 11:36:08,487 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=249564.0, ans=0.1
2024-09-23 11:36:31,111 INFO [train.py:1198] (3/4) Epoch 14, batch 2850, loss[loss=0.2016, ctc_loss=0.1357, cr_loss=0.3298, over 17167.00 frames. ], tot_loss[loss=0.2335, ctc_loss=0.159, cr_loss=0.3721, over 3350494.12 frames. ], batch size: 41, lr: 8.55e-03, grad_scale: 16.0
2024-09-23 11:36:56,325 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=249704.0, ans=0.1
2024-09-23 11:37:12,016 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.const_attention_rate, batch_count=249750.66666666666, ans=0.025
2024-09-23 11:37:16,897 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=249750.66666666666, ans=0.125
2024-09-23 11:37:20,007 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer_na.min_abs, batch_count=249750.66666666666, ans=0.02
2024-09-23 11:37:21,900 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=249750.66666666666, ans=0.2
2024-09-23 11:37:26,483 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=249797.33333333334, ans=10.0
2024-09-23 11:37:26,810 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=6.19 vs. limit=15.0
2024-09-23 11:38:01,842 INFO [train.py:1198] (3/4) Epoch 14, batch 2900, loss[loss=0.2476, ctc_loss=0.1743, cr_loss=0.3666, over 16569.00 frames. ], tot_loss[loss=0.2343, ctc_loss=0.1598, cr_loss=0.3727, over 3331341.36 frames. ], batch size: 66, lr: 8.55e-03, grad_scale: 16.0
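The `tot_loss[...]` figures are smoothed statistics rather than whole-epoch averages: their frame counts hover near 3.3M, roughly 200 typical batches of ~17k frames each, which is consistent with an exponentially decaying accumulator over recent batches. A sketch of that bookkeeping, assuming a decay of 1 - 1/200 per batch (the constant is inferred from the logged frame counts, not confirmed by the source):

```python
class RunningLoss:
    """Frame-weighted, exponentially decayed loss tracker (illustrative)."""
    def __init__(self, reset_interval=200):
        self.decay = 1.0 - 1.0 / reset_interval
        self.loss_sum = 0.0
        self.frames = 0.0

    def update(self, batch_loss, batch_frames):
        # Decay old statistics, then add this batch (loss is per-frame).
        self.loss_sum = self.loss_sum * self.decay + batch_loss * batch_frames
        self.frames = self.frames * self.decay + batch_frames

    @property
    def value(self):
        return self.loss_sum / self.frames

tracker = RunningLoss()
for _ in range(1000):
    tracker.update(0.23, 17000.0)
print(tracker.frames)  # settles near ~3.4M, like the log's "over ... frames"
```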
2024-09-23 11:38:08,319 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.088e+02 1.270e+02 1.418e+02 1.645e+02 2.792e+02, threshold=2.835e+02, percent-clipped=1.0
2024-09-23 11:38:30,635 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=249937.33333333334, ans=0.125
2024-09-23 11:38:45,053 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=249984.0, ans=0.0
2024-09-23 11:38:50,308 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=6.94 vs. limit=15.0
2024-09-23 11:39:09,071 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.const_attention_rate, batch_count=250077.33333333334, ans=0.025
2024-09-23 11:39:21,421 INFO [train.py:1198] (3/4) Epoch 14, batch 2950, loss[loss=0.2692, ctc_loss=0.1937, cr_loss=0.3775, over 11881.00 frames. ], tot_loss[loss=0.2335, ctc_loss=0.1591, cr_loss=0.3719, over 3339031.93 frames. ], batch size: 123, lr: 8.54e-03, grad_scale: 16.0
2024-09-23 11:39:31,589 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.const_attention_rate, batch_count=250124.0, ans=0.025
2024-09-23 11:39:33,241 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=250124.0, ans=0.125
2024-09-23 11:39:38,528 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=3.31 vs. limit=12.0
2024-09-23 11:39:48,805 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.const_attention_rate, batch_count=250170.66666666666, ans=0.025
2024-09-23 11:39:50,609 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=13.69 vs. limit=22.5
2024-09-23 11:40:40,237 INFO [train.py:1198] (3/4) Epoch 14, batch 3000, loss[loss=0.2597, ctc_loss=0.1811, cr_loss=0.3934, over 15968.00 frames. ], tot_loss[loss=0.2338, ctc_loss=0.1593, cr_loss=0.3723, over 3339538.19 frames. ], batch size: 74, lr: 8.54e-03, grad_scale: 16.0
2024-09-23 11:40:40,237 INFO [train.py:1221] (3/4) Computing validation loss
2024-09-23 11:40:55,474 INFO [train.py:1230] (3/4) Epoch 14, validation: loss=0.04331, ctc_loss=0.04331, cr_loss=7.532e-15, over 944034.00 frames.
2024-09-23 11:40:55,474 INFO [train.py:1231] (3/4) Maximum memory allocated so far is 21238MB
2024-09-23 11:41:01,569 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.088e+02 1.302e+02 1.389e+02 1.457e+02 1.974e+02, threshold=2.778e+02, percent-clipped=0.0
2024-09-23 11:41:08,134 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=250357.33333333334, ans=0.125
2024-09-23 11:41:11,580 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=8.13 vs. limit=12.0
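Note that at validation the CR term collapses to numerical noise (cr_loss=7.532e-15 above): the consistency loss compares the model's outputs on two differently augmented copies of each utterance, and with augmentation disabled in evaluation the two copies coincide, so any symmetric divergence between them is zero up to floating-point error. A toy illustration using a symmetric KL between two branches (one common CR formulation; the recipe's exact definition may differ):

```python
import torch
import torch.nn.functional as F

def consistency_loss(logits_a, logits_b):
    """Symmetric KL between two branches' frame posteriors (a common CR form)."""
    log_pa = F.log_softmax(logits_a, dim=-1)
    log_pb = F.log_softmax(logits_b, dim=-1)
    kl_ab = F.kl_div(log_pb, log_pa, log_target=True, reduction="batchmean")
    kl_ba = F.kl_div(log_pa, log_pb, log_target=True, reduction="batchmean")
    return 0.5 * (kl_ab + kl_ba)

logits = torch.randn(4, 100, 500)  # (batch, frames, vocab)
print(consistency_loss(logits, logits))  # identical branches -> ~0
print(consistency_loss(logits, logits + torch.randn_like(logits)))  # > 0
```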
2024-09-23 11:41:12,979 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=250404.0, ans=0.125
2024-09-23 11:41:14,433 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=250404.0, ans=0.125
2024-09-23 11:41:17,496 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=250404.0, ans=0.125
2024-09-23 11:41:40,050 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=250450.66666666666, ans=0.09899494936611666
2024-09-23 11:41:57,835 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=16.03 vs. limit=22.5
2024-09-23 11:42:13,881 INFO [train.py:1198] (3/4) Epoch 14, batch 3050, loss[loss=0.2183, ctc_loss=0.1488, cr_loss=0.3477, over 17040.00 frames. ], tot_loss[loss=0.234, ctc_loss=0.1594, cr_loss=0.373, over 3329902.94 frames. ], batch size: 44, lr: 8.53e-03, grad_scale: 16.0
2024-09-23 11:42:23,407 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=250590.66666666666, ans=0.0
2024-09-23 11:42:26,653 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=250590.66666666666, ans=0.0
2024-09-23 11:42:26,832 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=250590.66666666666, ans=0.2
2024-09-23 11:42:39,310 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=250637.33333333334, ans=0.0
2024-09-23 11:43:23,048 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=250777.33333333334, ans=0.0
2024-09-23 11:43:30,097 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=250777.33333333334, ans=0.2
2024-09-23 11:43:34,615 INFO [train.py:1198] (3/4) Epoch 14, batch 3100, loss[loss=0.2396, ctc_loss=0.1628, cr_loss=0.3838, over 17315.00 frames. ], tot_loss[loss=0.2326, ctc_loss=0.1582, cr_loss=0.372, over 3341334.47 frames. ], batch size: 51, lr: 8.53e-03, grad_scale: 16.0
2024-09-23 11:43:37,978 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=250824.0, ans=0.0
2024-09-23 11:43:40,947 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.108e+02 1.263e+02 1.328e+02 1.443e+02 2.080e+02, threshold=2.656e+02, percent-clipped=0.0
2024-09-23 11:43:55,661 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=250870.66666666666, ans=0.0
2024-09-23 11:44:15,510 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=3.94 vs. limit=10.0
2024-09-23 11:44:16,377 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=250917.33333333334, ans=0.125
2024-09-23 11:44:24,294 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.max_positive, batch_count=250964.0, ans=0.95
2024-09-23 11:44:24,423 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-09-23 11:44:33,972 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=250964.0, ans=0.0
2024-09-23 11:44:49,660 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=251010.66666666666, ans=0.2
2024-09-23 11:44:55,829 INFO [train.py:1198] (3/4) Epoch 14, batch 3150, loss[loss=0.1744, ctc_loss=0.1182, cr_loss=0.2811, over 16225.00 frames. ], tot_loss[loss=0.2322, ctc_loss=0.1579, cr_loss=0.3718, over 3349644.21 frames. ], batch size: 36, lr: 8.53e-03, grad_scale: 16.0
2024-09-23 11:45:10,864 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=8.69 vs. limit=15.0
2024-09-23 11:45:12,252 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.19 vs. limit=6.0
2024-09-23 11:45:32,890 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=251150.66666666666, ans=0.2
2024-09-23 11:46:18,650 INFO [train.py:1198] (3/4) Epoch 14, batch 3200, loss[loss=0.2458, ctc_loss=0.1683, cr_loss=0.3873, over 17302.00 frames. ], tot_loss[loss=0.2318, ctc_loss=0.1575, cr_loss=0.3712, over 3352786.24 frames. ], batch size: 46, lr: 8.52e-03, grad_scale: 32.0
2024-09-23 11:46:18,996 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=251290.66666666666, ans=10.0
2024-09-23 11:46:24,743 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.075e+02 1.267e+02 1.361e+02 1.514e+02 1.918e+02, threshold=2.723e+02, percent-clipped=0.0
2024-09-23 11:46:28,333 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-23 11:46:34,373 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=251337.33333333334, ans=0.125
2024-09-23 11:46:45,777 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=11.76 vs. limit=15.0
2024-09-23 11:46:57,743 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=251384.0, ans=0.125
2024-09-23 11:47:06,919 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=251430.66666666666, ans=0.1
2024-09-23 11:47:11,910 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.58 vs. limit=10.0
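The grad_scale field tracks mixed-precision loss scaling; it moves between 16.0 and 32.0 in this stretch (e.g. 16.0 -> 32.0 at batch 3200) because the scaler periodically doubles the scale and halves it when an overflow is detected, keeping float16 gradients out of the underflow range. The recipe logs its own scale, but PyTorch's stock GradScaler implements the same mechanism; a minimal usage sketch with hypothetical model/optimizer/batch objects:

```python
import torch

scaler = torch.cuda.amp.GradScaler(init_scale=16.0, growth_interval=2000)

def train_step(model, optimizer, batch):
    optimizer.zero_grad()
    with torch.cuda.amp.autocast(dtype=torch.float16):
        loss = model(batch)           # forward in fp16
    scaler.scale(loss).backward()     # backward on scale * loss
    scaler.step(optimizer)            # unscales grads, skips step on inf/nan
    scaler.update()                   # doubles scale periodically, halves on overflow
```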
2024-09-23 11:47:31,678 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=251477.33333333334, ans=0.2
2024-09-23 11:47:36,178 INFO [train.py:1198] (3/4) Epoch 14, batch 3250, loss[loss=0.2152, ctc_loss=0.1452, cr_loss=0.3501, over 17164.00 frames. ], tot_loss[loss=0.2307, ctc_loss=0.1567, cr_loss=0.3701, over 3358407.98 frames. ], batch size: 45, lr: 8.52e-03, grad_scale: 32.0
2024-09-23 11:48:26,265 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=251664.0, ans=0.125
2024-09-23 11:48:29,932 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.02 vs. limit=15.0
2024-09-23 11:48:40,491 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-23 11:48:43,517 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=251710.66666666666, ans=0.0
2024-09-23 11:48:54,100 INFO [train.py:1198] (3/4) Epoch 14, batch 3300, loss[loss=0.2123, ctc_loss=0.142, cr_loss=0.3515, over 17292.00 frames. ], tot_loss[loss=0.2311, ctc_loss=0.1569, cr_loss=0.3709, over 3360305.57 frames. ], batch size: 46, lr: 8.51e-03, grad_scale: 32.0
2024-09-23 11:48:57,428 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=251757.33333333334, ans=0.125
2024-09-23 11:49:00,435 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.139e+02 1.303e+02 1.410e+02 1.606e+02 3.318e+02, threshold=2.819e+02, percent-clipped=1.0
2024-09-23 11:49:00,645 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=251757.33333333334, ans=0.0
2024-09-23 11:49:02,180 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=251757.33333333334, ans=0.0
2024-09-23 11:49:06,845 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=251757.33333333334, ans=0.1
2024-09-23 11:49:21,060 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.const_attention_rate, batch_count=251804.0, ans=0.025
2024-09-23 11:49:22,015 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=12.16 vs. limit=22.5
2024-09-23 11:49:28,722 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=251850.66666666666, ans=0.125
2024-09-23 11:49:33,599 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=251850.66666666666, ans=0.125
2024-09-23 11:50:01,671 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=251944.0, ans=0.125
2024-09-23 11:50:11,034 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=251990.66666666666, ans=0.0
2024-09-23 11:50:12,289 INFO [train.py:1198] (3/4) Epoch 14, batch 3350, loss[loss=0.2459, ctc_loss=0.1652, cr_loss=0.4037, over 17295.00 frames. ], tot_loss[loss=0.2313, ctc_loss=0.1569, cr_loss=0.3718, over 3362876.72 frames. ], batch size: 51, lr: 8.51e-03, grad_scale: 16.0
2024-09-23 11:50:47,441 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=252084.0, ans=0.0
2024-09-23 11:51:15,097 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=252177.33333333334, ans=0.0
2024-09-23 11:51:18,233 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=252177.33333333334, ans=0.1
2024-09-23 11:51:21,402 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=252177.33333333334, ans=0.125
2024-09-23 11:51:27,548 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=252177.33333333334, ans=0.1
2024-09-23 11:51:30,495 INFO [train.py:1198] (3/4) Epoch 14, batch 3400, loss[loss=0.2442, ctc_loss=0.1663, cr_loss=0.3891, over 17315.00 frames. ], tot_loss[loss=0.2316, ctc_loss=0.1573, cr_loss=0.3717, over 3353153.68 frames. ], batch size: 51, lr: 8.51e-03, grad_scale: 16.0
2024-09-23 11:51:38,162 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.094e+02 1.275e+02 1.402e+02 1.543e+02 4.509e+02, threshold=2.804e+02, percent-clipped=1.0
2024-09-23 11:51:51,217 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=252270.66666666666, ans=0.125
2024-09-23 11:51:56,389 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.93 vs. limit=15.0
2024-09-23 11:52:31,332 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=252410.66666666666, ans=0.0
2024-09-23 11:52:34,441 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=252410.66666666666, ans=0.125
2024-09-23 11:52:42,243 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-23 11:52:48,080 INFO [train.py:1198] (3/4) Epoch 14, batch 3450, loss[loss=0.215, ctc_loss=0.1419, cr_loss=0.3652, over 17098.00 frames. ], tot_loss[loss=0.2326, ctc_loss=0.1581, cr_loss=0.3724, over 3345187.94 frames. ], batch size: 40, lr: 8.50e-03, grad_scale: 16.0
2024-09-23 11:53:02,945 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=8.46 vs. limit=15.0
2024-09-23 11:53:46,950 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=252597.33333333334, ans=0.2
2024-09-23 11:53:53,675 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=10.53 vs. limit=15.0
2024-09-23 11:53:56,238 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=252644.0, ans=0.125
2024-09-23 11:54:08,252 INFO [train.py:1198] (3/4) Epoch 14, batch 3500, loss[loss=0.2217, ctc_loss=0.1504, cr_loss=0.3562, over 17224.00 frames. ], tot_loss[loss=0.2324, ctc_loss=0.1579, cr_loss=0.3724, over 3349319.37 frames. ], batch size: 50, lr: 8.50e-03, grad_scale: 16.0
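The Whitening lines compare a per-module metric against a scheduled limit; the associated penalty only engages when a module's feature covariance drifts far from isotropic. One standard whiteness measure with the right properties is d * tr(C^2) / tr(C)^2, which equals 1 exactly when the covariance is a multiple of the identity and approaches the channel count as the features collapse onto few directions; the sketch below uses that measure for illustration and is not claimed to be the exact scaling.py formula:

```python
import torch

def whitening_metric(x: torch.Tensor) -> torch.Tensor:
    """x: (num_frames, num_channels). Returns >= 1; == 1 iff the covariance
    is a multiple of the identity (features fully 'white')."""
    x = x - x.mean(dim=0)
    cov = (x.T @ x) / x.shape[0]
    d = cov.shape[0]
    return d * (cov @ cov).diagonal().sum() / cov.diagonal().sum() ** 2

white = torch.randn(10000, 256)
print(whitening_metric(white))  # ~1.0

collapsed = torch.randn(10000, 1) * torch.ones(1, 256)  # rank-1 features
print(whitening_metric(collapsed + 1e-3 * torch.randn(10000, 256)))  # ~256
```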
2024-09-23 11:54:08,596 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=252690.66666666666, ans=0.0
2024-09-23 11:54:18,111 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.091e+02 1.252e+02 1.352e+02 1.458e+02 2.935e+02, threshold=2.705e+02, percent-clipped=1.0
2024-09-23 11:54:26,882 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=8.36 vs. limit=15.0
2024-09-23 11:54:45,250 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.46 vs. limit=15.0
2024-09-23 11:54:54,187 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=252784.0, ans=0.125
2024-09-23 11:55:15,997 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=252877.33333333334, ans=0.1
2024-09-23 11:55:21,127 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=252877.33333333334, ans=0.125
2024-09-23 11:55:32,281 INFO [train.py:1198] (3/4) Epoch 14, batch 3550, loss[loss=0.2533, ctc_loss=0.1748, cr_loss=0.3923, over 16520.00 frames. ], tot_loss[loss=0.2319, ctc_loss=0.1574, cr_loss=0.3724, over 3358464.67 frames. ], batch size: 66, lr: 8.49e-03, grad_scale: 16.0
2024-09-23 11:55:34,276 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=252924.0, ans=0.0
2024-09-23 11:55:39,167 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=252924.0, ans=0.0
2024-09-23 11:55:45,437 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=252924.0, ans=0.125
2024-09-23 11:56:22,960 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=253064.0, ans=0.1
2024-09-23 11:56:38,541 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=253110.66666666666, ans=0.025
2024-09-23 11:56:44,943 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=253110.66666666666, ans=0.125
2024-09-23 11:56:51,146 INFO [train.py:1198] (3/4) Epoch 14, batch 3600, loss[loss=0.2045, ctc_loss=0.1378, cr_loss=0.3332, over 17261.00 frames. ], tot_loss[loss=0.2317, ctc_loss=0.1575, cr_loss=0.371, over 3342321.47 frames. ], batch size: 44, lr: 8.49e-03, grad_scale: 32.0
2024-09-23 11:56:58,883 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.037e+02 1.239e+02 1.349e+02 1.491e+02 2.999e+02, threshold=2.699e+02, percent-clipped=1.0
2024-09-23 11:57:10,653 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=4.12 vs. limit=12.0
2024-09-23 11:57:11,321 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=253204.0, ans=0.0
2024-09-23 11:57:49,006 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=253297.33333333334, ans=0.125
2024-09-23 11:57:52,144 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=253344.0, ans=0.0
2024-09-23 11:58:01,737 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=253344.0, ans=0.0
2024-09-23 11:58:08,968 INFO [train.py:1198] (3/4) Epoch 14, batch 3650, loss[loss=0.2245, ctc_loss=0.1586, cr_loss=0.3299, over 17366.00 frames. ], tot_loss[loss=0.2321, ctc_loss=0.1577, cr_loss=0.3719, over 3343352.88 frames. ], batch size: 48, lr: 8.49e-03, grad_scale: 32.0
2024-09-23 11:58:17,304 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=4.50 vs. limit=15.0
2024-09-23 11:58:44,883 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=8.83 vs. limit=15.0
2024-09-23 11:58:54,009 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=253484.0, ans=0.0
2024-09-23 11:59:25,938 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.48 vs. limit=15.0
2024-09-23 11:59:28,446 INFO [train.py:1198] (3/4) Epoch 14, batch 3700, loss[loss=0.2237, ctc_loss=0.1508, cr_loss=0.3649, over 16987.00 frames. ], tot_loss[loss=0.2317, ctc_loss=0.1574, cr_loss=0.3716, over 3351579.13 frames. ], batch size: 53, lr: 8.48e-03, grad_scale: 32.0
2024-09-23 11:59:36,288 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.090e+02 1.294e+02 1.387e+02 1.607e+02 1.987e+02, threshold=2.774e+02, percent-clipped=0.0
2024-09-23 11:59:45,268 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=4.54 vs. limit=15.0
2024-09-23 11:59:56,841 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=253670.66666666666, ans=0.0
2024-09-23 12:00:09,581 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=253717.33333333334, ans=0.2
2024-09-23 12:00:40,867 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-23 12:00:46,837 INFO [train.py:1198] (3/4) Epoch 14, batch 3750, loss[loss=0.2514, ctc_loss=0.1695, cr_loss=0.4097, over 17032.00 frames. ], tot_loss[loss=0.2324, ctc_loss=0.1581, cr_loss=0.3715, over 3341606.37 frames. ], batch size: 51, lr: 8.48e-03, grad_scale: 32.0
2024-09-23 12:01:22,380 INFO [scaling.py:1024] (3/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=7.35 vs. limit=8.0
2024-09-23 12:01:40,404 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-09-23 12:01:45,097 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=253997.33333333334, ans=0.1
2024-09-23 12:02:05,051 INFO [train.py:1198] (3/4) Epoch 14, batch 3800, loss[loss=0.2724, ctc_loss=0.1959, cr_loss=0.3824, over 11609.00 frames. ], tot_loss[loss=0.2342, ctc_loss=0.1596, cr_loss=0.3728, over 3313158.03 frames. ], batch size: 123, lr: 8.48e-03, grad_scale: 32.0
2024-09-23 12:02:13,078 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.174e+02 1.380e+02 1.551e+02 1.777e+02 3.575e+02, threshold=3.102e+02, percent-clipped=2.0
2024-09-23 12:02:38,739 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-09-23 12:02:56,680 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=254230.66666666666, ans=0.125
2024-09-23 12:02:58,108 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=254230.66666666666, ans=0.125
2024-09-23 12:03:06,102 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=254230.66666666666, ans=0.05
2024-09-23 12:03:24,332 INFO [train.py:1198] (3/4) Epoch 14, batch 3850, loss[loss=0.3069, ctc_loss=0.2261, cr_loss=0.4038, over 11431.00 frames. ], tot_loss[loss=0.2354, ctc_loss=0.1607, cr_loss=0.3734, over 3282517.94 frames. ], batch size: 123, lr: 8.47e-03, grad_scale: 32.0
2024-09-23 12:03:37,798 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=254324.0, ans=0.125
2024-09-23 12:03:48,592 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=254370.66666666666, ans=0.125
2024-09-23 12:03:50,196 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=254370.66666666666, ans=0.1
2024-09-23 12:03:50,218 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=254370.66666666666, ans=0.125
2024-09-23 12:04:11,767 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=7.03 vs. limit=15.0
2024-09-23 12:05:28,636 INFO [train.py:1198] (3/4) Epoch 15, batch 0, loss[loss=0.2053, ctc_loss=0.1404, cr_loss=0.3247, over 16950.00 frames. ], tot_loss[loss=0.2053, ctc_loss=0.1404, cr_loss=0.3247, over 16950.00 frames. ], batch size: 42, lr: 8.18e-03, grad_scale: 32.0
2024-09-23 12:05:28,637 INFO [train.py:1221] (3/4) Computing validation loss
2024-09-23 12:05:46,330 INFO [train.py:1230] (3/4) Epoch 15, validation: loss=0.0431, ctc_loss=0.0431, cr_loss=7.486e-15, over 944034.00 frames.
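The batch size swings widely above (36 to 123 in this stretch) because the sampler packs utterances by total duration rather than by count: many short utterances fit in one batch, few long ones do, keeping per-batch frames roughly bounded (note batch 3800: 123 utterances over only 11,609 frames). A greedy sketch of duration-based packing; the actual DynamicBucketingSampler in lhotse is more sophisticated, and the 700 s cap here simply mirrors the configured max_duration:

```python
def pack_by_duration(durations, max_duration=700.0):
    """Greedily pack utterances so each batch stays under max_duration seconds."""
    batches, current, total = [], [], 0.0
    for i, dur in enumerate(durations):
        if current and total + dur > max_duration:
            batches.append(current)
            current, total = [], 0.0
        current.append(i)
        total += dur
    if current:
        batches.append(current)
    return batches

short = [2.0] * 300   # 2 s utterances -> hundreds per batch
long = [20.0] * 300   # 20 s utterances -> ~35 per batch
print(len(pack_by_duration(short)[0]), len(pack_by_duration(long)[0]))
```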
2024-09-23 12:05:46,331 INFO [train.py:1231] (3/4) Maximum memory allocated so far is 21238MB
2024-09-23 12:05:56,296 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=254538.66666666666, ans=0.1
2024-09-23 12:06:00,797 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.137e+02 1.387e+02 1.561e+02 1.706e+02 2.670e+02, threshold=3.121e+02, percent-clipped=0.0
2024-09-23 12:06:12,141 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=254585.33333333334, ans=0.04949747468305833
2024-09-23 12:06:20,008 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=254632.0, ans=0.1
2024-09-23 12:06:46,339 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=6.47 vs. limit=15.0
2024-09-23 12:06:48,573 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=254725.33333333334, ans=0.0
2024-09-23 12:06:53,055 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=254725.33333333334, ans=0.04949747468305833
2024-09-23 12:07:05,545 INFO [train.py:1198] (3/4) Epoch 15, batch 50, loss[loss=0.219, ctc_loss=0.1494, cr_loss=0.348, over 17073.00 frames. ], tot_loss[loss=0.238, ctc_loss=0.1622, cr_loss=0.3791, over 753809.47 frames. ], batch size: 43, lr: 8.18e-03, grad_scale: 32.0
2024-09-23 12:07:37,749 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=11.56 vs. limit=22.5
2024-09-23 12:07:38,525 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=254865.33333333334, ans=0.125
2024-09-23 12:08:01,890 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=254912.0, ans=0.0
2024-09-23 12:08:11,439 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=254958.66666666666, ans=0.0
2024-09-23 12:08:28,618 INFO [train.py:1198] (3/4) Epoch 15, batch 100, loss[loss=0.1874, ctc_loss=0.1247, cr_loss=0.3135, over 17196.00 frames. ], tot_loss[loss=0.2337, ctc_loss=0.1588, cr_loss=0.3746, over 1329870.80 frames. ], batch size: 41, lr: 8.17e-03, grad_scale: 32.0
2024-09-23 12:08:42,830 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 9.954e+01 1.228e+02 1.306e+02 1.476e+02 1.867e+02, threshold=2.613e+02, percent-clipped=0.0
2024-09-23 12:09:34,080 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=255192.0, ans=0.2
2024-09-23 12:09:35,868 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=11.85 vs. limit=15.0
2024-09-23 12:09:47,785 INFO [train.py:1198] (3/4) Epoch 15, batch 150, loss[loss=0.2167, ctc_loss=0.1425, cr_loss=0.3711, over 17033.00 frames. ], tot_loss[loss=0.2338, ctc_loss=0.1589, cr_loss=0.3749, over 1783907.84 frames. ], batch size: 44, lr: 8.17e-03, grad_scale: 32.0
2024-09-23 12:10:02,442 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=255238.66666666666, ans=0.0
2024-09-23 12:10:28,224 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=8.03 vs. limit=15.0
2024-09-23 12:10:42,057 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=12.19 vs. limit=22.5
2024-09-23 12:10:42,929 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=255378.66666666666, ans=0.1
2024-09-23 12:10:54,049 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=255378.66666666666, ans=0.1
2024-09-23 12:10:54,123 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=255378.66666666666, ans=0.95
2024-09-23 12:11:14,517 INFO [train.py:1198] (3/4) Epoch 15, batch 200, loss[loss=0.2106, ctc_loss=0.143, cr_loss=0.3378, over 17085.00 frames. ], tot_loss[loss=0.2316, ctc_loss=0.1572, cr_loss=0.3721, over 2129834.00 frames. ], batch size: 40, lr: 8.16e-03, grad_scale: 32.0
2024-09-23 12:11:28,889 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.082e+02 1.243e+02 1.308e+02 1.422e+02 1.839e+02, threshold=2.616e+02, percent-clipped=0.0
2024-09-23 12:11:52,961 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=255565.33333333334, ans=0.125
2024-09-23 12:12:26,362 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=255658.66666666666, ans=0.0
2024-09-23 12:12:29,449 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=255658.66666666666, ans=0.125
2024-09-23 12:12:33,975 INFO [train.py:1198] (3/4) Epoch 15, batch 250, loss[loss=0.1987, ctc_loss=0.1315, cr_loss=0.3364, over 17246.00 frames. ], tot_loss[loss=0.232, ctc_loss=0.1577, cr_loss=0.3714, over 2381083.36 frames. ], batch size: 44, lr: 8.16e-03, grad_scale: 32.0
2024-09-23 12:12:40,567 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.const_attention_rate, batch_count=255705.33333333334, ans=0.025
2024-09-23 12:13:11,092 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.21 vs. limit=15.0
2024-09-23 12:13:12,603 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.38 vs. limit=15.0
2024-09-23 12:13:21,881 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=255798.66666666666, ans=0.2
2024-09-23 12:13:23,366 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=255845.33333333334, ans=0.0
2024-09-23 12:13:36,299 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=255845.33333333334, ans=0.125
2024-09-23 12:13:51,417 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=6.17 vs. limit=15.0
2024-09-23 12:13:56,719 INFO [train.py:1198] (3/4) Epoch 15, batch 300, loss[loss=0.1927, ctc_loss=0.1292, cr_loss=0.3175, over 17100.00 frames. ], tot_loss[loss=0.2307, ctc_loss=0.1566, cr_loss=0.3704, over 2597506.70 frames. ], batch size: 40, lr: 8.16e-03, grad_scale: 32.0
2024-09-23 12:14:10,840 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.156e+02 1.341e+02 1.470e+02 1.683e+02 2.993e+02, threshold=2.941e+02, percent-clipped=1.0
2024-09-23 12:14:14,465 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=255985.33333333334, ans=0.125
2024-09-23 12:14:22,240 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=255985.33333333334, ans=0.2
2024-09-23 12:14:31,640 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=256032.0, ans=0.125
2024-09-23 12:15:06,301 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=256125.33333333334, ans=0.125
2024-09-23 12:15:21,581 INFO [train.py:1198] (3/4) Epoch 15, batch 350, loss[loss=0.2399, ctc_loss=0.1585, cr_loss=0.4071, over 17003.00 frames. ], tot_loss[loss=0.2307, ctc_loss=0.1565, cr_loss=0.3708, over 2768164.06 frames. ], batch size: 51, lr: 8.15e-03, grad_scale: 32.0
2024-09-23 12:15:21,868 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=256172.0, ans=0.125
2024-09-23 12:15:48,306 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=256218.66666666666, ans=0.2
2024-09-23 12:15:48,635 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=3.56 vs. limit=10.0
2024-09-23 12:16:03,054 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=11.74 vs. limit=15.0
2024-09-23 12:16:15,755 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=256312.0, ans=0.125
2024-09-23 12:16:17,715 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.88 vs. limit=15.0
2024-09-23 12:16:30,486 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=5.57 vs. limit=15.0
2024-09-23 12:16:34,825 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=256358.66666666666, ans=0.2
2024-09-23 12:16:43,209 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=7.07 vs. limit=15.0
2024-09-23 12:16:44,051 INFO [train.py:1198] (3/4) Epoch 15, batch 400, loss[loss=0.1887, ctc_loss=0.1256, cr_loss=0.3155, over 17063.00 frames. ], tot_loss[loss=0.23, ctc_loss=0.156, cr_loss=0.3699, over 2893974.24 frames. ], batch size: 39, lr: 8.15e-03, grad_scale: 32.0
2024-09-23 12:16:58,142 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.097e+02 1.238e+02 1.377e+02 1.544e+02 2.269e+02, threshold=2.754e+02, percent-clipped=0.0
2024-09-23 12:17:20,962 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=256498.66666666666, ans=0.125
2024-09-23 12:17:36,048 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.49 vs. limit=22.5
2024-09-23 12:17:37,074 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.min_abs, batch_count=256545.33333333334, ans=0.5
2024-09-23 12:17:55,556 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=256592.0, ans=0.2
2024-09-23 12:18:00,378 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=256592.0, ans=0.2
2024-09-23 12:18:00,384 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=256592.0, ans=0.1
2024-09-23 12:18:04,050 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=16.27 vs. limit=22.5
2024-09-23 12:18:06,463 INFO [train.py:1198] (3/4) Epoch 15, batch 450, loss[loss=0.2525, ctc_loss=0.1725, cr_loss=0.4, over 16739.00 frames. ], tot_loss[loss=0.2305, ctc_loss=0.1564, cr_loss=0.3705, over 2991043.49 frames. ], batch size: 61, lr: 8.15e-03, grad_scale: 32.0
2024-09-23 12:18:21,532 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=256685.33333333334, ans=0.1
2024-09-23 12:18:35,786 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=256685.33333333334, ans=0.125
2024-09-23 12:19:01,833 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.32 vs. limit=15.0
2024-09-23 12:19:14,374 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=256825.33333333334, ans=0.1
2024-09-23 12:19:27,188 INFO [train.py:1198] (3/4) Epoch 15, batch 500, loss[loss=0.2691, ctc_loss=0.1871, cr_loss=0.4101, over 17233.00 frames. ], tot_loss[loss=0.2297, ctc_loss=0.1558, cr_loss=0.3695, over 3072702.29 frames. ], batch size: 55, lr: 8.14e-03, grad_scale: 32.0
2024-09-23 12:19:36,037 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.19 vs. limit=15.0
2024-09-23 12:19:41,901 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.095e+02 1.305e+02 1.439e+02 1.681e+02 2.242e+02, threshold=2.879e+02, percent-clipped=0.0
2024-09-23 12:19:43,680 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=256918.66666666666, ans=0.0
2024-09-23 12:20:15,128 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.34 vs. limit=6.0
2024-09-23 12:20:16,241 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=256965.33333333334, ans=0.125
2024-09-23 12:20:20,126 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.92 vs. limit=10.0
2024-09-23 12:20:24,400 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=257012.0, ans=0.05
2024-09-23 12:20:25,820 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=257012.0, ans=0.1
2024-09-23 12:20:55,118 INFO [train.py:1198] (3/4) Epoch 15, batch 550, loss[loss=0.1974, ctc_loss=0.1316, cr_loss=0.3294, over 17098.00 frames. ], tot_loss[loss=0.2302, ctc_loss=0.1562, cr_loss=0.3696, over 3136805.90 frames. ], batch size: 43, lr: 8.14e-03, grad_scale: 32.0
2024-09-23 12:21:10,133 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=18.94 vs. limit=22.5
2024-09-23 12:21:13,262 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=9.56 vs. limit=15.0
2024-09-23 12:21:14,517 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=257152.0, ans=0.025
2024-09-23 12:21:14,562 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=257152.0, ans=0.125
2024-09-23 12:21:24,349 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=257152.0, ans=0.0
2024-09-23 12:21:44,920 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=257245.33333333334, ans=0.125
2024-09-23 12:21:57,944 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=257292.0, ans=0.125
2024-09-23 12:22:15,291 INFO [train.py:1198] (3/4) Epoch 15, batch 600, loss[loss=0.2466, ctc_loss=0.1666, cr_loss=0.4001, over 17217.00 frames. ], tot_loss[loss=0.2308, ctc_loss=0.1568, cr_loss=0.3703, over 3192799.50 frames. ], batch size: 47, lr: 8.14e-03, grad_scale: 32.0
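The balancer fields throughout this log (min_positive, max_positive, min_abs, max_abs, and a scheduled prob) refer to activation constraints that act through the backward pass: the forward output is left unchanged, while with some probability a small gradient correction nudges each channel's statistics (fraction of positive values, mean absolute value) back into the configured range. A toy autograd sketch of the idea; the real Balancer in scaling.py has considerably more machinery, so treat this as an illustration only:

```python
import torch

class ToyBalancer(torch.autograd.Function):
    """Identity forward; backward adds a push toward min_positive/max_abs."""
    @staticmethod
    def forward(ctx, x, min_positive=0.05, max_abs=10.0, strength=1e-4):
        ctx.save_for_backward(x)
        ctx.cfg = (min_positive, max_abs, strength)
        return x

    @staticmethod
    def backward(ctx, grad_out):
        (x,) = ctx.saved_tensors
        min_positive, max_abs, strength = ctx.cfg
        # Per-channel stats over all non-channel dims (channel = last dim).
        pos_frac = (x > 0).float().mean(dim=tuple(range(x.dim() - 1)))
        abs_mean = x.abs().mean(dim=tuple(range(x.dim() - 1)))
        push = torch.zeros_like(x)
        # Too few positive values: gradient pushing activations upward.
        push = push - strength * (pos_frac < min_positive).float()
        # Too large in magnitude: gradient shrinking activations toward zero.
        push = push + strength * (abs_mean > max_abs).float() * x.sign()
        return grad_out + push, None, None, None
```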
2024-09-23 12:22:23,756 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=257338.66666666666, ans=0.125
2024-09-23 12:22:28,493 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=257338.66666666666, ans=0.125
2024-09-23 12:22:29,756 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.119e+02 1.274e+02 1.386e+02 1.572e+02 2.356e+02, threshold=2.773e+02, percent-clipped=0.0
2024-09-23 12:22:37,635 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=257385.33333333334, ans=0.0
2024-09-23 12:23:20,110 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=15.52 vs. limit=15.0
2024-09-23 12:23:23,009 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=257525.33333333334, ans=0.125
2024-09-23 12:23:26,137 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=257525.33333333334, ans=0.1
2024-09-23 12:23:33,879 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=257525.33333333334, ans=0.0
2024-09-23 12:23:38,343 INFO [train.py:1198] (3/4) Epoch 15, batch 650, loss[loss=0.2276, ctc_loss=0.1524, cr_loss=0.3763, over 17368.00 frames. ], tot_loss[loss=0.2317, ctc_loss=0.1574, cr_loss=0.3713, over 3214894.47 frames. ], batch size: 48, lr: 8.13e-03, grad_scale: 32.0
2024-09-23 12:23:41,861 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=257572.0, ans=0.1
2024-09-23 12:23:59,687 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=257618.66666666666, ans=0.1
2024-09-23 12:24:02,815 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=257618.66666666666, ans=0.125
2024-09-23 12:24:28,851 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.65 vs. limit=15.0
2024-09-23 12:24:33,863 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten.whitening_limit, batch_count=257712.0, ans=15.0
2024-09-23 12:25:01,784 INFO [train.py:1198] (3/4) Epoch 15, batch 700, loss[loss=0.2441, ctc_loss=0.168, cr_loss=0.3804, over 16988.00 frames. ], tot_loss[loss=0.2307, ctc_loss=0.1567, cr_loss=0.3702, over 3244007.61 frames. ], batch size: 53, lr: 8.13e-03, grad_scale: 32.0
2024-09-23 12:25:10,482 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=11.96 vs. limit=22.5
2024-09-23 12:25:17,585 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=257805.33333333334, ans=0.5
2024-09-23 12:25:18,905 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.127e+02 1.271e+02 1.381e+02 1.537e+02 2.206e+02, threshold=2.762e+02, percent-clipped=0.0
2024-09-23 12:25:29,021 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=7.95 vs. limit=15.0
2024-09-23 12:25:46,393 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=10.15 vs. limit=15.0
2024-09-23 12:25:48,231 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.09 vs. limit=6.0
2024-09-23 12:25:48,827 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=257898.66666666666, ans=0.125
2024-09-23 12:25:58,341 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=257945.33333333334, ans=0.125
2024-09-23 12:26:01,468 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=257945.33333333334, ans=0.0
2024-09-23 12:26:07,800 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=257945.33333333334, ans=0.125
2024-09-23 12:26:18,816 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=257992.0, ans=0.0
2024-09-23 12:26:20,784 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.35 vs. limit=15.0
2024-09-23 12:26:26,614 INFO [train.py:1198] (3/4) Epoch 15, batch 750, loss[loss=0.2205, ctc_loss=0.1463, cr_loss=0.3713, over 17159.00 frames. ], tot_loss[loss=0.2296, ctc_loss=0.1557, cr_loss=0.3699, over 3278134.95 frames. ], batch size: 45, lr: 8.12e-03, grad_scale: 32.0
2024-09-23 12:26:58,695 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=258132.0, ans=0.125
2024-09-23 12:27:22,611 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=258178.66666666666, ans=0.1
2024-09-23 12:27:48,956 INFO [train.py:1198] (3/4) Epoch 15, batch 800, loss[loss=0.2296, ctc_loss=0.157, cr_loss=0.3632, over 17283.00 frames. ], tot_loss[loss=0.2309, ctc_loss=0.1568, cr_loss=0.3707, over 3293112.04 frames. ], batch size: 49, lr: 8.12e-03, grad_scale: 32.0
2024-09-23 12:28:03,181 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.136e+02 1.255e+02 1.392e+02 1.544e+02 3.619e+02, threshold=2.784e+02, percent-clipped=1.0
2024-09-23 12:28:03,442 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=258318.66666666666, ans=0.0
2024-09-23 12:28:05,010 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.min_positive, batch_count=258318.66666666666, ans=0.05
2024-09-23 12:28:08,989 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=6.91 vs. limit=15.0
2024-09-23 12:28:14,948 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=11.70 vs. limit=15.0
2024-09-23 12:28:24,290 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=258365.33333333334, ans=0.125
2024-09-23 12:28:49,338 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=258412.0, ans=0.125
2024-09-23 12:29:03,747 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=258458.66666666666, ans=0.125
2024-09-23 12:29:08,257 INFO [train.py:1198] (3/4) Epoch 15, batch 850, loss[loss=0.2089, ctc_loss=0.1408, cr_loss=0.3405, over 17311.00 frames. ], tot_loss[loss=0.2303, ctc_loss=0.1562, cr_loss=0.3704, over 3308162.86 frames. ], batch size: 46, lr: 8.12e-03, grad_scale: 32.0
2024-09-23 12:29:16,880 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=258505.33333333334, ans=0.0
2024-09-23 12:29:49,883 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=258598.66666666666, ans=0.0
2024-09-23 12:29:51,371 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=258598.66666666666, ans=0.125
2024-09-23 12:30:02,637 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=258645.33333333334, ans=0.1
2024-09-23 12:30:36,075 INFO [train.py:1198] (3/4) Epoch 15, batch 900, loss[loss=0.2335, ctc_loss=0.1547, cr_loss=0.394, over 16398.00 frames. ], tot_loss[loss=0.2299, ctc_loss=0.1559, cr_loss=0.3703, over 3324694.95 frames. ], batch size: 66, lr: 8.11e-03, grad_scale: 32.0
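The bypass.scale_min and *_skip_rate entries describe stochastic-depth-style regularization: each sub-module's output is blended with its input through a bypass scale clamped to at least scale_min, and whole sub-modules are skipped at random with a scheduled probability (most rates here have decayed to 0.0 by this point in training). A sketch of that composition, assuming a hypothetical `module` callable and a learned per-channel `bypass_scale` (an illustration of the pattern, not the Zipformer source):

```python
import torch

def bypass_forward(module, x, bypass_scale, scale_min=0.2, skip_rate=0.0,
                   training=True):
    """out = x + clamped_scale * (module(x) - x), sometimes skipped entirely."""
    if training and torch.rand(()) < skip_rate:
        return x  # stochastically skip the whole sub-module
    scale = bypass_scale.clamp(min=scale_min, max=1.0)
    return x + scale * (module(x) - x)
```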
2024-09-23 12:30:50,313 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.124e+02 1.250e+02 1.335e+02 1.491e+02 2.252e+02, threshold=2.670e+02, percent-clipped=0.0
2024-09-23 12:31:05,171 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=258785.33333333334, ans=0.05
2024-09-23 12:31:08,551 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=258832.0, ans=0.125
2024-09-23 12:31:08,645 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=258832.0, ans=0.125
2024-09-23 12:31:22,947 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=258878.66666666666, ans=0.1
2024-09-23 12:31:28,715 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=8.34 vs. limit=15.0
2024-09-23 12:31:38,950 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=258925.33333333334, ans=0.1
2024-09-23 12:31:48,670 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=258925.33333333334, ans=0.2
2024-09-23 12:31:56,151 INFO [train.py:1198] (3/4) Epoch 15, batch 950, loss[loss=0.223, ctc_loss=0.1498, cr_loss=0.3662, over 17137.00 frames. ], tot_loss[loss=0.2298, ctc_loss=0.1557, cr_loss=0.3706, over 3338469.95 frames. ], batch size: 48, lr: 8.11e-03, grad_scale: 16.0
2024-09-23 12:31:59,669 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=258972.0, ans=0.125
2024-09-23 12:32:01,154 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=258972.0, ans=0.04949747468305833
2024-09-23 12:32:03,643 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=8.80 vs. limit=15.0
2024-09-23 12:32:09,061 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=258972.0, ans=0.0
2024-09-23 12:32:10,599 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=259018.66666666666, ans=0.0
2024-09-23 12:32:51,304 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=259112.0, ans=0.125
2024-09-23 12:32:57,785 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=259112.0, ans=0.0
2024-09-23 12:33:17,947 INFO [train.py:1198] (3/4) Epoch 15, batch 1000, loss[loss=0.2391, ctc_loss=0.1612, cr_loss=0.3897, over 17057.00 frames. ], tot_loss[loss=0.2305, ctc_loss=0.1563, cr_loss=0.3708, over 3339412.86 frames. ], batch size: 46, lr: 8.11e-03, grad_scale: 16.0
2024-09-23 12:33:33,770 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.079e+02 1.224e+02 1.330e+02 1.426e+02 2.141e+02, threshold=2.660e+02, percent-clipped=0.0
2024-09-23 12:33:34,371 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.60 vs. limit=10.0
2024-09-23 12:33:54,994 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=259298.66666666666, ans=0.125
2024-09-23 12:34:11,090 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=259345.33333333334, ans=0.1
2024-09-23 12:34:15,825 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=259345.33333333334, ans=0.125
2024-09-23 12:34:40,678 INFO [train.py:1198] (3/4) Epoch 15, batch 1050, loss[loss=0.1968, ctc_loss=0.1313, cr_loss=0.3274, over 17209.00 frames. ], tot_loss[loss=0.2295, ctc_loss=0.1556, cr_loss=0.3693, over 3342529.38 frames. ], batch size: 47, lr: 8.10e-03, grad_scale: 16.0
2024-09-23 12:34:45,808 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=259438.66666666666, ans=0.125
2024-09-23 12:34:55,693 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=259485.33333333334, ans=0.125
2024-09-23 12:35:09,202 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=259485.33333333334, ans=0.0
2024-09-23 12:35:20,704 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=259532.0, ans=0.0
2024-09-23 12:35:24,731 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=259532.0, ans=0.125
2024-09-23 12:35:43,133 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=11.36 vs. limit=22.5
2024-09-23 12:35:45,618 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=259578.66666666666, ans=0.1
2024-09-23 12:35:50,207 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=259625.33333333334, ans=0.035
2024-09-23 12:36:05,846 INFO [train.py:1198] (3/4) Epoch 15, batch 1100, loss[loss=0.1972, ctc_loss=0.1313, cr_loss=0.3296, over 16286.00 frames. ], tot_loss[loss=0.2289, ctc_loss=0.155, cr_loss=0.3694, over 3348698.28 frames. ], batch size: 36, lr: 8.10e-03, grad_scale: 16.0
2024-09-23 12:36:21,571 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.111e+02 1.305e+02 1.420e+02 1.545e+02 2.157e+02, threshold=2.840e+02, percent-clipped=0.0
2024-09-23 12:36:39,703 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.const_attention_rate, batch_count=259765.33333333334, ans=0.025
2024-09-23 12:36:42,905 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=259765.33333333334, ans=0.125
2024-09-23 12:37:25,451 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=259858.66666666666, ans=0.2
2024-09-23 12:37:28,269 INFO [train.py:1198] (3/4) Epoch 15, batch 1150, loss[loss=0.2323, ctc_loss=0.1542, cr_loss=0.3904, over 17029.00 frames. ], tot_loss[loss=0.2289, ctc_loss=0.155, cr_loss=0.3693, over 3359429.71 frames. ], batch size: 51, lr: 8.10e-03, grad_scale: 16.0
2024-09-23 12:37:55,894 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=259952.0, ans=0.0
2024-09-23 12:38:08,882 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=259998.66666666666, ans=0.125
2024-09-23 12:38:48,575 INFO [train.py:1198] (3/4) Epoch 15, batch 1200, loss[loss=0.216, ctc_loss=0.1488, cr_loss=0.3361, over 16510.00 frames. ], tot_loss[loss=0.2285, ctc_loss=0.1548, cr_loss=0.3686, over 3365689.24 frames. ], batch size: 66, lr: 8.09e-03, grad_scale: 32.0
2024-09-23 12:38:57,485 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.02 vs. limit=22.5
2024-09-23 12:39:04,649 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.045e+02 1.247e+02 1.362e+02 1.504e+02 2.311e+02, threshold=2.725e+02, percent-clipped=0.0
2024-09-23 12:39:12,279 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.09 vs. limit=10.0
2024-09-23 12:39:27,385 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=260232.0, ans=0.125
2024-09-23 12:39:32,041 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.const_attention_rate, batch_count=260232.0, ans=0.025
2024-09-23 12:39:33,652 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=260232.0, ans=0.2
2024-09-23 12:40:13,699 INFO [train.py:1198] (3/4) Epoch 15, batch 1250, loss[loss=0.2349, ctc_loss=0.1607, cr_loss=0.371, over 17101.00 frames. ], tot_loss[loss=0.228, ctc_loss=0.1543, cr_loss=0.3683, over 3376240.85 frames. ], batch size: 49, lr: 8.09e-03, grad_scale: 32.0
2024-09-23 12:40:17,685 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=5.72 vs. limit=15.0
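The Whitening lines (scaling.py:1024) fire when a module's whitening metric exceeds its configured limit; the metric measures how far the per-group feature covariance is from a multiple of the identity. One plausible formulation, assumed here rather than taken from scaling.py, is the mean squared eigenvalue of the covariance divided by the squared mean eigenvalue, which is 1.0 for perfectly white features and grows as the spectrum becomes lopsided.

    # Sketch of a whitening metric consistent with the "metric=X vs.
    # limit=Y" lines: E[lambda^2] / (E[lambda])^2 over the eigenvalues
    # of each channel group's covariance. The exact formula used by
    # scaling.py is an assumption here.
    import torch

    def whitening_metric(x: torch.Tensor, num_groups: int) -> float:
        n, c = x.shape                       # (frames, channels)
        x = x.reshape(n, num_groups, c // num_groups).transpose(0, 1)
        x = x - x.mean(dim=1, keepdim=True)
        cov = x.transpose(1, 2) @ x / n      # (groups, d, d)
        eigs = torch.linalg.eigvalsh(cov)    # (groups, d)
        metric = (eigs ** 2).mean(dim=1) / eigs.mean(dim=1).pow(2).clamp(min=1e-20)
        return metric.mean().item()

    x = torch.randn(10000, 64)                # near-white toy activations
    print(whitening_metric(x, num_groups=1))  # close to 1.0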
2024-09-23 12:41:02,874 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=260512.0, ans=0.0
2024-09-23 12:41:22,123 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=260558.66666666666, ans=0.1
2024-09-23 12:41:35,960 INFO [train.py:1198] (3/4) Epoch 15, batch 1300, loss[loss=0.2262, ctc_loss=0.1494, cr_loss=0.3838, over 17178.00 frames. ], tot_loss[loss=0.2284, ctc_loss=0.1546, cr_loss=0.3694, over 3381141.39 frames. ], batch size: 45, lr: 8.09e-03, grad_scale: 16.0
2024-09-23 12:41:42,890 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=6.68 vs. limit=15.0
2024-09-23 12:41:47,361 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=260605.33333333334, ans=0.2
2024-09-23 12:41:48,824 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=260605.33333333334, ans=0.5
2024-09-23 12:41:53,392 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.058e+02 1.270e+02 1.373e+02 1.516e+02 2.157e+02, threshold=2.746e+02, percent-clipped=0.0
2024-09-23 12:42:09,376 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=11.18 vs. limit=15.0
2024-09-23 12:42:52,332 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.max_positive, batch_count=260792.0, ans=0.95
2024-09-23 12:42:52,362 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=260792.0, ans=0.0
2024-09-23 12:42:58,473 INFO [train.py:1198] (3/4) Epoch 15, batch 1350, loss[loss=0.24, ctc_loss=0.1621, cr_loss=0.3894, over 17296.00 frames. ], tot_loss[loss=0.2284, ctc_loss=0.1546, cr_loss=0.3689, over 3369511.91 frames. ], batch size: 51, lr: 8.08e-03, grad_scale: 16.0
2024-09-23 12:43:16,464 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=260885.33333333334, ans=0.125
2024-09-23 12:43:18,007 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=260885.33333333334, ans=0.0
2024-09-23 12:43:18,095 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=260885.33333333334, ans=0.125
2024-09-23 12:44:06,776 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=6.24 vs. limit=15.0
2024-09-23 12:44:12,773 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-23 12:44:18,619 INFO [train.py:1198] (3/4) Epoch 15, batch 1400, loss[loss=0.2106, ctc_loss=0.1405, cr_loss=0.3506, over 17287.00 frames. ], tot_loss[loss=0.2271, ctc_loss=0.1536, cr_loss=0.3672, over 3373453.50 frames. ], batch size: 42, lr: 8.08e-03, grad_scale: 16.0
2024-09-23 12:44:18,877 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=261072.0, ans=0.0
2024-09-23 12:44:27,568 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=5.91 vs. limit=15.0
2024-09-23 12:44:36,422 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.078e+02 1.225e+02 1.306e+02 1.392e+02 2.119e+02, threshold=2.612e+02, percent-clipped=0.0
2024-09-23 12:44:36,705 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=261118.66666666666, ans=0.125
2024-09-23 12:44:36,824 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=261118.66666666666, ans=0.2
2024-09-23 12:45:15,477 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=261212.0, ans=0.125
2024-09-23 12:45:24,399 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=261212.0, ans=0.0
2024-09-23 12:45:36,042 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.96 vs. limit=15.0
2024-09-23 12:45:46,012 INFO [train.py:1198] (3/4) Epoch 15, batch 1450, loss[loss=0.2443, ctc_loss=0.1653, cr_loss=0.3948, over 17151.00 frames. ], tot_loss[loss=0.227, ctc_loss=0.1535, cr_loss=0.3672, over 3383291.35 frames. ], batch size: 48, lr: 8.07e-03, grad_scale: 16.0
2024-09-23 12:45:49,581 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=261305.33333333334, ans=0.09899494936611666
2024-09-23 12:46:33,312 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=261398.66666666666, ans=0.0
2024-09-23 12:46:36,569 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=261445.33333333334, ans=0.0
2024-09-23 12:46:39,635 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=261445.33333333334, ans=0.0
2024-09-23 12:46:39,649 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=261445.33333333334, ans=0.125
2024-09-23 12:46:50,930 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-23 12:46:55,501 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=261492.0, ans=0.0
2024-09-23 12:47:05,995 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.67 vs. limit=6.0
2024-09-23 12:47:08,082 INFO [train.py:1198] (3/4) Epoch 15, batch 1500, loss[loss=0.256, ctc_loss=0.1735, cr_loss=0.4122, over 17359.00 frames. ], tot_loss[loss=0.2272, ctc_loss=0.1538, cr_loss=0.3669, over 3376318.60 frames. ], batch size: 48, lr: 8.07e-03, grad_scale: 16.0
2024-09-23 12:47:11,614 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=261538.66666666666, ans=0.0
2024-09-23 12:47:21,676 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=4.85 vs. limit=12.0
2024-09-23 12:47:25,760 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.117e+02 1.263e+02 1.374e+02 1.561e+02 5.695e+02, threshold=2.748e+02, percent-clipped=2.0
2024-09-23 12:47:30,268 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=261585.33333333334, ans=0.125
2024-09-23 12:47:44,614 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=261632.0, ans=0.125
2024-09-23 12:47:57,100 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=261678.66666666666, ans=0.125
2024-09-23 12:48:02,155 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=17.16 vs. limit=22.5
2024-09-23 12:48:27,658 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=261725.33333333334, ans=0.1
2024-09-23 12:48:27,806 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=261725.33333333334, ans=0.125
2024-09-23 12:48:30,900 INFO [train.py:1198] (3/4) Epoch 15, batch 1550, loss[loss=0.1894, ctc_loss=0.1252, cr_loss=0.3212, over 17088.00 frames. ], tot_loss[loss=0.2285, ctc_loss=0.1547, cr_loss=0.3685, over 3367581.46 frames. ], batch size: 43, lr: 8.07e-03, grad_scale: 16.0
2024-09-23 12:48:34,390 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-23 12:48:39,043 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=261772.0, ans=0.0
2024-09-23 12:48:59,898 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=261818.66666666666, ans=0.1
2024-09-23 12:49:28,870 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=261912.0, ans=0.0
2024-09-23 12:49:53,478 INFO [train.py:1198] (3/4) Epoch 15, batch 1600, loss[loss=0.2265, ctc_loss=0.1488, cr_loss=0.3886, over 17303.00 frames. ], tot_loss[loss=0.2294, ctc_loss=0.1555, cr_loss=0.3694, over 3361639.83 frames. ], batch size: 49, lr: 8.06e-03, grad_scale: 32.0
2024-09-23 12:50:13,471 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.073e+02 1.270e+02 1.410e+02 1.627e+02 2.274e+02, threshold=2.820e+02, percent-clipped=0.0
2024-09-23 12:50:15,428 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=262052.0, ans=0.0
2024-09-23 12:50:30,555 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=262098.66666666666, ans=0.0
2024-09-23 12:50:40,435 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1.whitening_limit, batch_count=262098.66666666666, ans=10.0
2024-09-23 12:51:11,820 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=262192.0, ans=0.1
2024-09-23 12:51:17,822 INFO [train.py:1198] (3/4) Epoch 15, batch 1650, loss[loss=0.2207, ctc_loss=0.1459, cr_loss=0.3742, over 17093.00 frames. ], tot_loss[loss=0.2295, ctc_loss=0.1556, cr_loss=0.3696, over 3363224.79 frames. ], batch size: 43, lr: 8.06e-03, grad_scale: 32.0
2024-09-23 12:51:26,229 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=262238.6666666667, ans=0.125
2024-09-23 12:51:31,254 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=262238.6666666667, ans=0.2
2024-09-23 12:51:48,317 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.const_attention_rate, batch_count=262332.0, ans=0.025
2024-09-23 12:51:51,686 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=18.78 vs. limit=22.5
2024-09-23 12:52:10,988 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.07 vs. limit=15.0
2024-09-23 12:52:16,829 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=262378.6666666667, ans=0.0
2024-09-23 12:52:36,936 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=262425.3333333333, ans=0.0
2024-09-23 12:52:39,798 INFO [train.py:1198] (3/4) Epoch 15, batch 1700, loss[loss=0.3032, ctc_loss=0.2193, cr_loss=0.4191, over 11744.00 frames. ], tot_loss[loss=0.2307, ctc_loss=0.1565, cr_loss=0.3711, over 3362451.75 frames. ], batch size: 123, lr: 8.06e-03, grad_scale: 32.0
2024-09-23 12:52:57,187 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.135e+02 1.272e+02 1.393e+02 1.539e+02 2.504e+02, threshold=2.785e+02, percent-clipped=0.0
2024-09-23 12:53:02,125 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=262518.6666666667, ans=0.125
2024-09-23 12:53:24,049 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=262565.3333333333, ans=0.0
2024-09-23 12:53:28,758 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=262612.0, ans=0.2
2024-09-23 12:53:58,955 INFO [train.py:1198] (3/4) Epoch 15, batch 1750, loss[loss=0.2246, ctc_loss=0.1481, cr_loss=0.3822, over 16725.00 frames. ], tot_loss[loss=0.2303, ctc_loss=0.156, cr_loss=0.371, over 3365029.72 frames. ], batch size: 61, lr: 8.05e-03, grad_scale: 32.0
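The per-batch loss[...] and running tot_loss[...] values throughout this section are consistent with a fixed weighted sum of the CTC and consistency-regularization (CR) terms, loss = ctc_loss + 0.2 * cr_loss; for the batch 1750 running average above, 0.156 + 0.2 * 0.371 = 0.2302, matching the logged 0.2303 to rounding. The weight is inferred from the logged numbers, not read from the training code.

    # Combination of the logged loss terms, with the CR weight of 0.2
    # inferred from the numbers in this log rather than from train.py.
    def total_loss(ctc_loss: float, cr_loss: float, cr_scale: float = 0.2) -> float:
        return ctc_loss + cr_scale * cr_loss

    # tot_loss[loss=0.2303, ctc_loss=0.156, cr_loss=0.371] at batch 1750:
    print(round(total_loss(0.156, 0.371), 4))  # 0.2302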
2024-09-23 12:54:15,673 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=11.82 vs. limit=15.0
2024-09-23 12:54:40,138 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=262798.6666666667, ans=0.2
2024-09-23 12:55:20,046 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=262892.0, ans=0.0
2024-09-23 12:55:26,238 INFO [train.py:1198] (3/4) Epoch 15, batch 1800, loss[loss=0.2185, ctc_loss=0.1478, cr_loss=0.3538, over 16347.00 frames. ], tot_loss[loss=0.23, ctc_loss=0.1558, cr_loss=0.371, over 3364251.61 frames. ], batch size: 36, lr: 8.05e-03, grad_scale: 32.0
2024-09-23 12:55:26,574 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=262938.6666666667, ans=0.0
2024-09-23 12:55:43,904 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.119e+02 1.283e+02 1.353e+02 1.483e+02 2.243e+02, threshold=2.705e+02, percent-clipped=0.0
2024-09-23 12:56:03,333 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=263032.0, ans=0.0
2024-09-23 12:56:46,107 INFO [train.py:1198] (3/4) Epoch 15, batch 1850, loss[loss=0.2925, ctc_loss=0.212, cr_loss=0.402, over 12168.00 frames. ], tot_loss[loss=0.2295, ctc_loss=0.1556, cr_loss=0.3698, over 3364889.16 frames. ], batch size: 123, lr: 8.05e-03, grad_scale: 32.0
2024-09-23 12:56:59,463 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=263172.0, ans=0.1
2024-09-23 12:57:01,951 INFO [scaling.py:1024] (3/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.43 vs. limit=5.0
2024-09-23 12:57:05,718 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=263218.6666666667, ans=0.1
2024-09-23 12:57:10,667 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=263218.6666666667, ans=0.125
2024-09-23 12:57:16,829 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=263265.3333333333, ans=0.2
2024-09-23 12:57:27,361 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=263265.3333333333, ans=0.0
2024-09-23 12:57:39,920 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=263312.0, ans=0.125
2024-09-23 12:57:40,406 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=6.28 vs. limit=15.0
2024-09-23 12:57:59,353 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=263358.6666666667, ans=0.2
2024-09-23 12:58:08,514 INFO [train.py:1198] (3/4) Epoch 15, batch 1900, loss[loss=0.2146, ctc_loss=0.1428, cr_loss=0.3591, over 17140.00 frames. ], tot_loss[loss=0.2299, ctc_loss=0.1559, cr_loss=0.3698, over 3356072.89 frames. ], batch size: 48, lr: 8.04e-03, grad_scale: 32.0
2024-09-23 12:58:08,783 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=263405.3333333333, ans=0.0
2024-09-23 12:58:26,131 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.101e+02 1.271e+02 1.385e+02 1.549e+02 2.232e+02, threshold=2.769e+02, percent-clipped=0.0
2024-09-23 12:58:35,925 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=263452.0, ans=0.2
2024-09-23 12:58:43,975 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=263498.6666666667, ans=0.125
2024-09-23 12:59:14,143 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=263592.0, ans=0.0
2024-09-23 12:59:15,518 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=263592.0, ans=0.1
2024-09-23 12:59:27,839 INFO [train.py:1198] (3/4) Epoch 15, batch 1950, loss[loss=0.2176, ctc_loss=0.145, cr_loss=0.3631, over 16954.00 frames. ], tot_loss[loss=0.2296, ctc_loss=0.1556, cr_loss=0.3697, over 3360103.46 frames. ], batch size: 42, lr: 8.04e-03, grad_scale: 16.0
2024-09-23 12:59:58,455 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.min_positive, batch_count=263685.3333333333, ans=0.05
2024-09-23 12:59:58,996 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.14 vs. limit=15.0
2024-09-23 13:00:05,725 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=263732.0, ans=0.0
2024-09-23 13:00:09,448 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.15 vs. limit=15.0
2024-09-23 13:00:20,041 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=263732.0, ans=0.125
2024-09-23 13:00:34,789 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.53 vs. limit=22.5
2024-09-23 13:00:37,457 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=263825.3333333333, ans=0.125
2024-09-23 13:00:43,672 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=263825.3333333333, ans=0.05
2024-09-23 13:00:45,408 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.const_attention_rate, batch_count=263825.3333333333, ans=0.025
2024-09-23 13:00:53,476 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=263872.0, ans=0.1
2024-09-23 13:00:54,761 INFO [train.py:1198] (3/4) Epoch 15, batch 2000, loss[loss=0.2093, ctc_loss=0.1394, cr_loss=0.3493, over 17169.00 frames. ], tot_loss[loss=0.2296, ctc_loss=0.1557, cr_loss=0.3697, over 3361585.44 frames. ], batch size: 45, lr: 8.04e-03, grad_scale: 32.0
2024-09-23 13:01:08,349 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=3.38 vs. limit=15.0
2024-09-23 13:01:11,307 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=263918.6666666667, ans=0.125
2024-09-23 13:01:14,002 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.119e+02 1.286e+02 1.410e+02 1.590e+02 2.174e+02, threshold=2.819e+02, percent-clipped=0.0
2024-09-23 13:01:28,579 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=263965.3333333333, ans=0.125
2024-09-23 13:02:06,501 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.const_attention_rate, batch_count=264058.6666666667, ans=0.025
2024-09-23 13:02:16,690 INFO [train.py:1198] (3/4) Epoch 15, batch 2050, loss[loss=0.2024, ctc_loss=0.1361, cr_loss=0.3316, over 17101.00 frames. ], tot_loss[loss=0.2309, ctc_loss=0.1565, cr_loss=0.3717, over 3366268.46 frames. ], batch size: 40, lr: 8.03e-03, grad_scale: 32.0
2024-09-23 13:02:23,382 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=264105.3333333333, ans=0.125
2024-09-23 13:02:47,407 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=264198.6666666667, ans=0.125
2024-09-23 13:02:49,023 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=264198.6666666667, ans=0.1
2024-09-23 13:03:03,253 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.const_attention_rate, batch_count=264245.3333333333, ans=0.025
2024-09-23 13:03:11,802 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.25 vs. limit=15.0
2024-09-23 13:03:14,352 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=264245.3333333333, ans=0.125
2024-09-23 13:03:17,478 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=264245.3333333333, ans=0.125
2024-09-23 13:03:22,330 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=264292.0, ans=0.125
2024-09-23 13:03:27,007 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=264292.0, ans=0.125
2024-09-23 13:03:36,270 INFO [train.py:1198] (3/4) Epoch 15, batch 2100, loss[loss=0.2224, ctc_loss=0.1511, cr_loss=0.3565, over 17251.00 frames. ], tot_loss[loss=0.2295, ctc_loss=0.1556, cr_loss=0.3699, over 3373813.75 frames. ], batch size: 42, lr: 8.03e-03, grad_scale: 32.0
2024-09-23 13:03:55,214 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.095e+02 1.282e+02 1.387e+02 1.573e+02 2.128e+02, threshold=2.775e+02, percent-clipped=0.0
2024-09-23 13:04:27,419 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=264478.6666666667, ans=0.125
2024-09-23 13:04:44,439 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=264525.3333333333, ans=0.125
2024-09-23 13:05:00,907 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=264525.3333333333, ans=0.0
2024-09-23 13:05:03,755 INFO [train.py:1198] (3/4) Epoch 15, batch 2150, loss[loss=0.2295, ctc_loss=0.1538, cr_loss=0.3784, over 17230.00 frames. ], tot_loss[loss=0.2289, ctc_loss=0.1551, cr_loss=0.3693, over 3375834.46 frames. ], batch size: 44, lr: 8.03e-03, grad_scale: 32.0
2024-09-23 13:05:16,821 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=264572.0, ans=0.0
2024-09-23 13:05:16,822 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=264572.0, ans=0.04949747468305833
2024-09-23 13:05:18,371 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=264618.6666666667, ans=0.2
2024-09-23 13:05:19,063 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.27 vs. limit=15.0
2024-09-23 13:05:25,337 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=5.29 vs. limit=15.0
2024-09-23 13:05:26,986 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=17.31 vs. limit=22.5
2024-09-23 13:05:35,984 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.const_attention_rate, batch_count=264665.3333333333, ans=0.025
2024-09-23 13:05:37,786 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.const_attention_rate, batch_count=264665.3333333333, ans=0.025
2024-09-23 13:05:40,959 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=264665.3333333333, ans=0.07
2024-09-23 13:05:53,443 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=264712.0, ans=0.125
2024-09-23 13:05:53,574 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=264712.0, ans=0.0
2024-09-23 13:05:58,928 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=14.80 vs. limit=22.5
2024-09-23 13:06:13,110 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.86 vs. limit=15.0
2024-09-23 13:06:23,716 INFO [train.py:1198] (3/4) Epoch 15, batch 2200, loss[loss=0.1847, ctc_loss=0.1229, cr_loss=0.3088, over 17144.00 frames. ], tot_loss[loss=0.228, ctc_loss=0.1543, cr_loss=0.3687, over 3380545.57 frames. ], batch size: 40, lr: 8.02e-03, grad_scale: 32.0
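The tot_loss[..., over N frames] fields grow from roughly 3.28M frames at batch 750 to about 3.38M here, which is consistent with a frame-weighted running average of the per-batch losses over the epoch. A sketch of such an accumulator follows; any decay or windowing in the real tracker is not visible in the log and is not modeled.

    # Frame-weighted running average consistent with the
    # "tot_loss[..., over N frames]" fields; decay/windowing details of
    # the real tracker are unknown and omitted.
    class RunningLoss:
        def __init__(self):
            self.weighted_sum = 0.0
            self.frames = 0.0

        def update(self, loss: float, num_frames: float) -> None:
            self.weighted_sum += loss * num_frames
            self.frames += num_frames

        @property
        def avg(self) -> float:
            return self.weighted_sum / max(self.frames, 1.0)

    tracker = RunningLoss()
    tracker.update(0.1847, 17144.0)  # batch 2200's per-batch loss above
    print(round(tracker.avg, 4))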
2024-09-23 13:06:30,488 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=264805.3333333333, ans=0.125
2024-09-23 13:06:31,863 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=264805.3333333333, ans=0.2
2024-09-23 13:06:42,813 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.110e+02 1.291e+02 1.443e+02 1.638e+02 2.419e+02, threshold=2.885e+02, percent-clipped=0.0
2024-09-23 13:06:44,651 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=264852.0, ans=0.0
2024-09-23 13:07:10,345 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=264945.3333333333, ans=0.125
2024-09-23 13:07:17,856 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=264945.3333333333, ans=0.125
2024-09-23 13:07:46,504 INFO [train.py:1198] (3/4) Epoch 15, batch 2250, loss[loss=0.2354, ctc_loss=0.1612, cr_loss=0.371, over 17059.00 frames. ], tot_loss[loss=0.2282, ctc_loss=0.1545, cr_loss=0.3686, over 3379681.96 frames. ], batch size: 46, lr: 8.02e-03, grad_scale: 32.0
2024-09-23 13:07:54,999 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=265038.6666666667, ans=0.0
2024-09-23 13:07:55,742 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=10.65 vs. limit=15.0
2024-09-23 13:08:00,533 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.46 vs. limit=10.0
2024-09-23 13:08:22,486 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=265132.0, ans=0.1
2024-09-23 13:08:41,711 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.max_abs, batch_count=265178.6666666667, ans=10.0
2024-09-23 13:08:54,388 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=265225.3333333333, ans=0.125
2024-09-23 13:09:06,704 INFO [train.py:1198] (3/4) Epoch 15, batch 2300, loss[loss=0.2611, ctc_loss=0.1784, cr_loss=0.4132, over 17240.00 frames. ], tot_loss[loss=0.2289, ctc_loss=0.155, cr_loss=0.3697, over 3374447.57 frames. ], batch size: 55, lr: 8.02e-03, grad_scale: 16.0
2024-09-23 13:09:27,202 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.072e+02 1.234e+02 1.309e+02 1.490e+02 3.155e+02, threshold=2.619e+02, percent-clipped=1.0
2024-09-23 13:09:29,073 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=265318.6666666667, ans=0.2
2024-09-23 13:10:17,113 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=265458.6666666667, ans=0.05
2024-09-23 13:10:28,258 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=265458.6666666667, ans=0.125
2024-09-23 13:10:34,410 INFO [train.py:1198] (3/4) Epoch 15, batch 2350, loss[loss=0.2635, ctc_loss=0.1786, cr_loss=0.4246, over 16554.00 frames. ], tot_loss[loss=0.2283, ctc_loss=0.1544, cr_loss=0.3694, over 3376624.43 frames. ], batch size: 66, lr: 8.01e-03, grad_scale: 16.0
2024-09-23 13:10:36,568 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.07 vs. limit=10.0
2024-09-23 13:11:07,970 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=265598.6666666667, ans=0.125
2024-09-23 13:11:51,397 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=4.34 vs. limit=12.0
2024-09-23 13:11:53,676 INFO [train.py:1198] (3/4) Epoch 15, batch 2400, loss[loss=0.2786, ctc_loss=0.2014, cr_loss=0.386, over 11525.00 frames. ], tot_loss[loss=0.2295, ctc_loss=0.1554, cr_loss=0.3704, over 3363906.45 frames. ], batch size: 123, lr: 8.01e-03, grad_scale: 16.0
2024-09-23 13:12:06,114 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=10.05 vs. limit=22.5
2024-09-23 13:12:17,625 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=265785.3333333333, ans=0.125
2024-09-23 13:12:18,951 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.118e+02 1.338e+02 1.450e+02 1.574e+02 3.453e+02, threshold=2.900e+02, percent-clipped=1.0
2024-09-23 13:12:22,526 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=265785.3333333333, ans=0.0
2024-09-23 13:13:13,693 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=265925.3333333333, ans=0.125
2024-09-23 13:13:16,578 INFO [train.py:1198] (3/4) Epoch 15, batch 2450, loss[loss=0.2148, ctc_loss=0.1427, cr_loss=0.3604, over 17094.00 frames. ], tot_loss[loss=0.2292, ctc_loss=0.1553, cr_loss=0.3698, over 3355867.06 frames. ], batch size: 43, lr: 8.00e-03, grad_scale: 16.0
2024-09-23 13:14:17,425 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=266112.0, ans=0.015
2024-09-23 13:14:38,833 INFO [train.py:1198] (3/4) Epoch 15, batch 2500, loss[loss=0.2204, ctc_loss=0.1444, cr_loss=0.3798, over 17023.00 frames. ], tot_loss[loss=0.2291, ctc_loss=0.1553, cr_loss=0.369, over 3339013.89 frames. ], batch size: 44, lr: 8.00e-03, grad_scale: 16.0
2024-09-23 13:14:39,698 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.98 vs. limit=22.5
2024-09-23 13:14:59,894 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2024-09-23 13:15:03,579 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.72 vs. limit=6.0
2024-09-23 13:15:05,875 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.066e+02 1.240e+02 1.331e+02 1.479e+02 2.623e+02, threshold=2.663e+02, percent-clipped=0.0
2024-09-23 13:15:27,677 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.33 vs. limit=15.0
2024-09-23 13:15:31,787 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=266345.3333333333, ans=0.1
2024-09-23 13:16:03,518 INFO [train.py:1198] (3/4) Epoch 15, batch 2550, loss[loss=0.2552, ctc_loss=0.1774, cr_loss=0.389, over 16788.00 frames. ], tot_loss[loss=0.2298, ctc_loss=0.1559, cr_loss=0.3697, over 3345806.76 frames. ], batch size: 61, lr: 8.00e-03, grad_scale: 16.0
2024-09-23 13:16:08,596 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=266438.6666666667, ans=0.2
2024-09-23 13:16:43,272 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=266532.0, ans=0.0
2024-09-23 13:16:45,055 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=266532.0, ans=0.0
2024-09-23 13:17:15,988 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=266625.3333333333, ans=0.125
2024-09-23 13:17:25,064 INFO [train.py:1198] (3/4) Epoch 15, batch 2600, loss[loss=0.2238, ctc_loss=0.1523, cr_loss=0.3573, over 17155.00 frames. ], tot_loss[loss=0.2289, ctc_loss=0.1551, cr_loss=0.3689, over 3353663.06 frames. ], batch size: 48, lr: 7.99e-03, grad_scale: 16.0
2024-09-23 13:17:39,776 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=266718.6666666667, ans=0.125
2024-09-23 13:17:47,325 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.078e+02 1.291e+02 1.396e+02 1.519e+02 2.239e+02, threshold=2.791e+02, percent-clipped=0.0
2024-09-23 13:17:54,452 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=5.82 vs. limit=15.0
2024-09-23 13:18:22,499 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=266812.0, ans=0.125
2024-09-23 13:18:44,567 INFO [train.py:1198] (3/4) Epoch 15, batch 2650, loss[loss=0.2492, ctc_loss=0.1671, cr_loss=0.4106, over 17058.00 frames. ], tot_loss[loss=0.2305, ctc_loss=0.1564, cr_loss=0.3706, over 3344294.96 frames. ], batch size: 52, lr: 7.99e-03, grad_scale: 16.0
2024-09-23 13:18:47,190 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=11.67 vs. limit=22.5
2024-09-23 13:18:51,128 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=266905.3333333333, ans=0.0
2024-09-23 13:19:09,253 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=266952.0, ans=0.125
2024-09-23 13:19:48,312 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=267045.3333333333, ans=0.0
2024-09-23 13:20:12,067 INFO [train.py:1198] (3/4) Epoch 15, batch 2700, loss[loss=0.2595, ctc_loss=0.1742, cr_loss=0.4267, over 17310.00 frames. ], tot_loss[loss=0.2306, ctc_loss=0.1564, cr_loss=0.3712, over 3345236.42 frames. ], batch size: 51, lr: 7.99e-03, grad_scale: 16.0
2024-09-23 13:20:29,933 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=267185.3333333333, ans=0.0
2024-09-23 13:20:31,891 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=11.67 vs. limit=15.0
2024-09-23 13:20:34,472 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.035e+02 1.269e+02 1.346e+02 1.479e+02 2.065e+02, threshold=2.692e+02, percent-clipped=0.0
2024-09-23 13:20:49,214 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=267232.0, ans=0.2
2024-09-23 13:21:31,861 INFO [train.py:1198] (3/4) Epoch 15, batch 2750, loss[loss=0.3046, ctc_loss=0.2226, cr_loss=0.41, over 11859.00 frames. ], tot_loss[loss=0.2295, ctc_loss=0.1554, cr_loss=0.3704, over 3350400.65 frames. ], batch size: 123, lr: 7.98e-03, grad_scale: 16.0
2024-09-23 13:21:35,500 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=267372.0, ans=0.125
2024-09-23 13:21:35,578 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-09-23 13:21:45,106 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=267372.0, ans=0.1
2024-09-23 13:21:48,145 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=267418.6666666667, ans=0.0
2024-09-23 13:21:50,665 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.75 vs. limit=6.0
2024-09-23 13:22:14,426 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=267465.3333333333, ans=0.125
2024-09-23 13:22:15,882 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=267465.3333333333, ans=0.015
2024-09-23 13:22:39,995 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=267558.6666666667, ans=0.04949747468305833
2024-09-23 13:22:53,943 INFO [train.py:1198] (3/4) Epoch 15, batch 2800, loss[loss=0.2287, ctc_loss=0.1525, cr_loss=0.381, over 17301.00 frames. ], tot_loss[loss=0.2288, ctc_loss=0.1547, cr_loss=0.3703, over 3360425.69 frames. ], batch size: 51, lr: 7.98e-03, grad_scale: 32.0
2024-09-23 13:23:07,623 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=12.47 vs. limit=15.0
2024-09-23 13:23:17,755 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.059e+02 1.278e+02 1.476e+02 1.758e+02 2.429e+02, threshold=2.952e+02, percent-clipped=0.0
2024-09-23 13:23:18,127 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=267652.0, ans=0.0
2024-09-23 13:23:56,149 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=267792.0, ans=0.125
2024-09-23 13:24:07,409 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=267792.0, ans=0.0
2024-09-23 13:24:13,375 INFO [train.py:1198] (3/4) Epoch 15, batch 2850, loss[loss=0.2522, ctc_loss=0.1697, cr_loss=0.4124, over 17216.00 frames. ], tot_loss[loss=0.2284, ctc_loss=0.1544, cr_loss=0.3698, over 3366752.82 frames. ], batch size: 55, lr: 7.98e-03, grad_scale: 16.0
2024-09-23 13:24:20,188 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=267838.6666666667, ans=0.125
2024-09-23 13:24:59,657 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=5.03 vs. limit=10.0
2024-09-23 13:25:17,934 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-23 13:25:41,431 INFO [train.py:1198] (3/4) Epoch 15, batch 2900, loss[loss=0.1916, ctc_loss=0.1251, cr_loss=0.3325, over 17179.00 frames. ], tot_loss[loss=0.2271, ctc_loss=0.1535, cr_loss=0.3678, over 3372064.33 frames. ], batch size: 41, lr: 7.97e-03, grad_scale: 16.0
2024-09-23 13:25:56,521 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=268118.6666666667, ans=0.0
2024-09-23 13:25:59,811 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=268118.6666666667, ans=0.125
2024-09-23 13:26:05,969 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.129e+02 1.264e+02 1.372e+02 1.552e+02 2.806e+02, threshold=2.744e+02, percent-clipped=0.0
2024-09-23 13:26:22,277 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=268165.3333333333, ans=0.1
2024-09-23 13:26:36,610 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=268212.0, ans=0.0
2024-09-23 13:26:38,193 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=268212.0, ans=0.0
2024-09-23 13:27:03,975 INFO [train.py:1198] (3/4) Epoch 15, batch 2950, loss[loss=0.2725, ctc_loss=0.1877, cr_loss=0.4244, over 16914.00 frames. ], tot_loss[loss=0.2262, ctc_loss=0.1529, cr_loss=0.3666, over 3377278.85 frames. ], batch size: 58, lr: 7.97e-03, grad_scale: 16.0
2024-09-23 13:27:07,500 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.const_attention_rate, batch_count=268305.3333333333, ans=0.025
2024-09-23 13:27:12,175 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=268305.3333333333, ans=0.125
2024-09-23 13:27:32,804 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=268352.0, ans=0.125
2024-09-23 13:27:37,845 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=268398.6666666667, ans=0.125
2024-09-23 13:27:39,321 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=268398.6666666667, ans=0.125
2024-09-23 13:28:23,348 INFO [train.py:1198] (3/4) Epoch 15, batch 3000, loss[loss=0.2254, ctc_loss=0.1536, cr_loss=0.3591, over 17313.00 frames. ], tot_loss[loss=0.2269, ctc_loss=0.1534, cr_loss=0.3671, over 3365051.93 frames. ], batch size: 51, lr: 7.97e-03, grad_scale: 16.0
2024-09-23 13:28:23,349 INFO [train.py:1221] (3/4) Computing validation loss
2024-09-23 13:28:32,514 INFO [zipformer.py:1858] (3/4) name=encoder.encoders.3.encoder.layers.2.self_attn_weights, attn_weights_entropy = tensor([3.1175, 3.5227, 3.4302, 4.1203, 3.3561, 3.3471, 4.0108, 4.3124], device='cuda:3')
2024-09-23 13:28:38,969 INFO [train.py:1230] (3/4) Epoch 15, validation: loss=0.04166, ctc_loss=0.04166, cr_loss=7.464e-15, over 944034.00 frames.
2024-09-23 13:28:38,970 INFO [train.py:1231] (3/4) Maximum memory allocated so far is 21238MB
2024-09-23 13:28:45,709 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00
2024-09-23 13:29:02,495 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.093e+02 1.290e+02 1.376e+02 1.476e+02 2.234e+02, threshold=2.753e+02, percent-clipped=0.0
2024-09-23 13:29:11,941 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=268632.0, ans=0.125
2024-09-23 13:29:22,994 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=268632.0, ans=0.125
2024-09-23 13:29:50,354 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=5.39 vs. limit=15.0
2024-09-23 13:29:56,911 INFO [train.py:1198] (3/4) Epoch 15, batch 3050, loss[loss=0.2337, ctc_loss=0.1598, cr_loss=0.3694, over 17224.00 frames. ], tot_loss[loss=0.2272, ctc_loss=0.1537, cr_loss=0.3675, over 3360641.27 frames. ], batch size: 55, lr: 7.96e-03, grad_scale: 16.0
2024-09-23 13:30:36,553 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=268865.3333333333, ans=0.1
2024-09-23 13:31:14,004 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=4.39 vs. limit=12.0
2024-09-23 13:31:15,214 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=268958.6666666667, ans=0.125
2024-09-23 13:31:22,607 INFO [train.py:1198] (3/4) Epoch 15, batch 3100, loss[loss=0.2518, ctc_loss=0.1747, cr_loss=0.3853, over 12075.00 frames. ], tot_loss[loss=0.2273, ctc_loss=0.1538, cr_loss=0.3675, over 3350402.40 frames. ], batch size: 123, lr: 7.96e-03, grad_scale: 16.0
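During the batch-3000 validation pass above, zipformer.py:1858 logs attn_weights_entropy, an eight-element tensor that reads as one entropy value per attention head of the named self_attn_weights module. A sketch of that diagnostic follows; the (heads, queries, keys) tensor layout is an assumption based on the logged shape.

    # Sketch of the "attn_weights_entropy" validation diagnostic:
    # entropy of each head's attention distribution, averaged over
    # query positions. The (heads, queries, keys) layout is assumed.
    import torch

    def attn_weights_entropy(attn: torch.Tensor) -> torch.Tensor:
        ent = -(attn * (attn + 1e-20).log()).sum(dim=-1)  # (heads, queries)
        return ent.mean(dim=-1)                           # one value per head

    weights = torch.softmax(torch.randn(8, 50, 50), dim=-1)
    print(attn_weights_entropy(weights))  # 8 per-head entropies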
2024-09-23 13:31:33,787 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=269005.3333333333, ans=0.0
2024-09-23 13:31:46,077 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.151e+02 1.323e+02 1.397e+02 1.534e+02 2.167e+02, threshold=2.795e+02, percent-clipped=0.0
2024-09-23 13:32:00,634 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=269098.6666666667, ans=0.1
2024-09-23 13:32:01,039 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=10.36 vs. limit=15.0
2024-09-23 13:32:02,771 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=5.00 vs. limit=10.0
2024-09-23 13:32:14,947 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=269145.3333333333, ans=0.0
2024-09-23 13:32:19,567 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=269145.3333333333, ans=0.1
2024-09-23 13:32:38,922 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=5.08 vs. limit=15.0
2024-09-23 13:32:41,150 INFO [train.py:1198] (3/4) Epoch 15, batch 3150, loss[loss=0.2464, ctc_loss=0.1676, cr_loss=0.3937, over 17000.00 frames. ], tot_loss[loss=0.228, ctc_loss=0.1544, cr_loss=0.3682, over 3345174.89 frames. ], batch size: 51, lr: 7.96e-03, grad_scale: 16.0
2024-09-23 13:32:50,711 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=269238.6666666667, ans=0.0
2024-09-23 13:32:54,251 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=4.85 vs. limit=10.0
2024-09-23 13:33:26,700 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=269378.6666666667, ans=0.0
2024-09-23 13:33:29,919 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.const_attention_rate, batch_count=269378.6666666667, ans=0.025
2024-09-23 13:33:32,801 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=269378.6666666667, ans=0.0
2024-09-23 13:33:37,637 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=269378.6666666667, ans=0.2
2024-09-23 13:33:59,103 INFO [train.py:1198] (3/4) Epoch 15, batch 3200, loss[loss=0.2318, ctc_loss=0.1598, cr_loss=0.3603, over 17177.00 frames. ], tot_loss[loss=0.228, ctc_loss=0.1543, cr_loss=0.3684, over 3345021.79 frames. ], batch size: 45, lr: 7.95e-03, grad_scale: 32.0
2024-09-23 13:33:59,103 INFO [train.py:1198] (3/4) Epoch 15, batch 3200, loss[loss=0.2318, ctc_loss=0.1598, cr_loss=0.3603, over 17177.00 frames. ], tot_loss[loss=0.228, ctc_loss=0.1543, cr_loss=0.3684, over 3345021.79 frames. ], batch size: 45, lr: 7.95e-03, grad_scale: 32.0
2024-09-23 13:34:10,339 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=269472.0, ans=0.0
2024-09-23 13:34:19,647 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.min_positive, batch_count=269518.6666666667, ans=0.05
2024-09-23 13:34:22,377 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.100e+02 1.268e+02 1.359e+02 1.495e+02 2.696e+02, threshold=2.719e+02, percent-clipped=0.0
2024-09-23 13:34:22,660 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-23 13:34:46,196 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=269612.0, ans=0.125
2024-09-23 13:34:48,197 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.63 vs. limit=15.0
2024-09-23 13:34:56,046 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=6.75 vs. limit=15.0
2024-09-23 13:35:00,183 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=269658.6666666667, ans=0.125
2024-09-23 13:35:01,783 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=269658.6666666667, ans=0.125
2024-09-23 13:35:05,696 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=8.02 vs. limit=15.0
2024-09-23 13:35:17,026 INFO [train.py:1198] (3/4) Epoch 15, batch 3250, loss[loss=0.2411, ctc_loss=0.1619, cr_loss=0.396, over 17065.00 frames. ], tot_loss[loss=0.2284, ctc_loss=0.1546, cr_loss=0.3689, over 3350884.79 frames. ], batch size: 46, lr: 7.95e-03, grad_scale: 32.0
2024-09-23 13:35:17,293 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=269705.3333333333, ans=0.035
2024-09-23 13:35:19,559 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.74 vs. limit=12.0
2024-09-23 13:35:27,531 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.63 vs. limit=15.0
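Note the grad_scale field stepping from 16.0 to 32.0 between batch 3150 and batch 3200 above (and dropping back to 16.0 and 8.0 later in this section). That is ordinary dynamic loss scaling under fp16 AMP: the scale is doubled after a run of overflow-free steps and halved whenever inf/nan gradients are detected. A generic PyTorch sketch of the mechanism (the training loop here wires this up differently; requires a CUDA device):

```python
import torch
from torch.cuda.amp import GradScaler, autocast

model = torch.nn.Linear(80, 500).cuda()
optimizer = torch.optim.Adam(model.parameters(), lr=7.95e-3)
scaler = GradScaler(init_scale=16.0, growth_factor=2.0,
                    backoff_factor=0.5, growth_interval=2000)

def train_step(feats: torch.Tensor, targets: torch.Tensor) -> float:
    optimizer.zero_grad()
    with autocast(dtype=torch.float16):
        loss = torch.nn.functional.mse_loss(model(feats), targets)
    scaler.scale(loss).backward()  # backward pass on the scaled loss
    scaler.step(optimizer)         # unscales grads; skips the step on inf/nan
    scaler.update()                # doubles or halves the scale as needed
    return scaler.get_scale()      # the grad_scale value printed in the log
```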
2024-09-23 13:35:49,489 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=12.26 vs. limit=22.5
2024-09-23 13:35:57,249 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=269798.6666666667, ans=0.125
2024-09-23 13:36:03,576 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=269798.6666666667, ans=0.125
2024-09-23 13:36:13,123 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=269845.3333333333, ans=0.0
2024-09-23 13:36:27,159 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=269892.0, ans=0.125
2024-09-23 13:36:35,167 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=269892.0, ans=0.09899494936611666
2024-09-23 13:36:37,981 INFO [train.py:1198] (3/4) Epoch 15, batch 3300, loss[loss=0.2465, ctc_loss=0.1686, cr_loss=0.3895, over 16908.00 frames. ], tot_loss[loss=0.2289, ctc_loss=0.1551, cr_loss=0.3688, over 3341794.37 frames. ], batch size: 58, lr: 7.95e-03, grad_scale: 32.0
2024-09-23 13:36:38,251 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=269938.6666666667, ans=0.125
2024-09-23 13:36:38,843 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.71 vs. limit=22.5
2024-09-23 13:36:47,782 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=269938.6666666667, ans=0.0
2024-09-23 13:36:49,259 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=269938.6666666667, ans=0.0
2024-09-23 13:37:01,539 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.092e+02 1.277e+02 1.362e+02 1.536e+02 4.994e+02, threshold=2.723e+02, percent-clipped=1.0
2024-09-23 13:37:08,243 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=270032.0, ans=0.125
2024-09-23 13:37:20,341 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=270032.0, ans=0.125
2024-09-23 13:37:23,405 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=270078.6666666667, ans=0.0
2024-09-23 13:37:24,945 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=270078.6666666667, ans=0.0
2024-09-23 13:37:25,140 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=270078.6666666667, ans=0.0
2024-09-23 13:37:37,426 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=270078.6666666667, ans=0.125
2024-09-23 13:37:55,872 INFO [train.py:1198] (3/4) Epoch 15, batch 3350, loss[loss=0.2334, ctc_loss=0.1604, cr_loss=0.365, over 17298.00 frames. ], tot_loss[loss=0.2288, ctc_loss=0.155, cr_loss=0.3692, over 3351044.32 frames.
], batch size: 51, lr: 7.94e-03, grad_scale: 32.0 2024-09-23 13:37:57,582 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=270172.0, ans=0.2 2024-09-23 13:38:09,864 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=270218.6666666667, ans=0.125 2024-09-23 13:38:16,310 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=270218.6666666667, ans=0.0 2024-09-23 13:38:16,428 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=270218.6666666667, ans=0.0 2024-09-23 13:38:37,271 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=23.16 vs. limit=22.5 2024-09-23 13:39:00,141 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=270358.6666666667, ans=0.2 2024-09-23 13:39:12,819 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=270405.3333333333, ans=0.125 2024-09-23 13:39:14,023 INFO [train.py:1198] (3/4) Epoch 15, batch 3400, loss[loss=0.2227, ctc_loss=0.1522, cr_loss=0.3525, over 17314.00 frames. ], tot_loss[loss=0.227, ctc_loss=0.1535, cr_loss=0.3675, over 3360383.75 frames. ], batch size: 51, lr: 7.94e-03, grad_scale: 32.0 2024-09-23 13:39:15,917 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=270405.3333333333, ans=0.125 2024-09-23 13:39:36,556 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.54 vs. limit=15.0 2024-09-23 13:39:37,360 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.143e+02 1.273e+02 1.371e+02 1.521e+02 2.083e+02, threshold=2.741e+02, percent-clipped=0.0 2024-09-23 13:39:54,969 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=270498.6666666667, ans=0.125 2024-09-23 13:40:04,304 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=270545.3333333333, ans=0.125 2024-09-23 13:40:04,420 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=270545.3333333333, ans=0.2 2024-09-23 13:40:23,169 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=270592.0, ans=0.125 2024-09-23 13:40:26,275 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=270592.0, ans=0.05 2024-09-23 13:40:32,271 INFO [train.py:1198] (3/4) Epoch 15, batch 3450, loss[loss=0.249, ctc_loss=0.1671, cr_loss=0.4094, over 17345.00 frames. ], tot_loss[loss=0.2274, ctc_loss=0.1538, cr_loss=0.368, over 3362886.65 frames. 
], batch size: 48, lr: 7.94e-03, grad_scale: 32.0 2024-09-23 13:40:40,293 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=270638.6666666667, ans=0.05 2024-09-23 13:40:42,178 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=270638.6666666667, ans=0.125 2024-09-23 13:41:00,995 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=270685.3333333333, ans=0.07 2024-09-23 13:41:16,098 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=270732.0, ans=0.125 2024-09-23 13:41:32,255 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=270778.6666666667, ans=0.1 2024-09-23 13:41:32,935 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.57 vs. limit=6.0 2024-09-23 13:41:39,846 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=270825.3333333333, ans=0.1 2024-09-23 13:41:56,577 INFO [train.py:1198] (3/4) Epoch 15, batch 3500, loss[loss=0.1862, ctc_loss=0.1227, cr_loss=0.3174, over 16684.00 frames. ], tot_loss[loss=0.227, ctc_loss=0.1534, cr_loss=0.3676, over 3366733.20 frames. ], batch size: 37, lr: 7.93e-03, grad_scale: 32.0 2024-09-23 13:42:18,757 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=270918.6666666667, ans=0.0 2024-09-23 13:42:19,966 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.027e+02 1.268e+02 1.406e+02 1.575e+02 2.426e+02, threshold=2.811e+02, percent-clipped=0.0 2024-09-23 13:42:21,825 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=270918.6666666667, ans=0.125 2024-09-23 13:43:06,505 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=271058.6666666667, ans=0.125 2024-09-23 13:43:08,828 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=9.09 vs. limit=15.0 2024-09-23 13:43:15,426 INFO [train.py:1198] (3/4) Epoch 15, batch 3550, loss[loss=0.259, ctc_loss=0.1814, cr_loss=0.388, over 16065.00 frames. ], tot_loss[loss=0.2267, ctc_loss=0.1532, cr_loss=0.3676, over 3370255.18 frames. ], batch size: 74, lr: 7.93e-03, grad_scale: 32.0 2024-09-23 13:43:53,901 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.21 vs. 
limit=22.5
2024-09-23 13:44:01,125 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.const_attention_rate, batch_count=271245.3333333333, ans=0.025
2024-09-23 13:44:07,119 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=271245.3333333333, ans=0.125
2024-09-23 13:44:08,763 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=271245.3333333333, ans=0.125
2024-09-23 13:44:09,155 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=4.56 vs. limit=15.0
2024-09-23 13:44:33,801 INFO [train.py:1198] (3/4) Epoch 15, batch 3600, loss[loss=0.21, ctc_loss=0.1426, cr_loss=0.3371, over 16690.00 frames. ], tot_loss[loss=0.2257, ctc_loss=0.1523, cr_loss=0.3665, over 3369634.60 frames. ], batch size: 37, lr: 7.93e-03, grad_scale: 32.0
2024-09-23 13:44:35,554 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=271338.6666666667, ans=0.5
2024-09-23 13:44:57,246 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.121e+02 1.288e+02 1.401e+02 1.561e+02 2.194e+02, threshold=2.802e+02, percent-clipped=0.0
2024-09-23 13:45:18,797 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=3.72 vs. limit=12.0
2024-09-23 13:45:20,036 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=10.51 vs. limit=15.0
2024-09-23 13:45:22,525 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=271478.6666666667, ans=0.0
2024-09-23 13:45:22,867 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=7.40 vs. limit=12.0
2024-09-23 13:45:41,355 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=271525.3333333333, ans=0.125
2024-09-23 13:45:43,145 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=271525.3333333333, ans=0.07
2024-09-23 13:45:50,795 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=271525.3333333333, ans=0.0
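The lr field decays smoothly within the epoch (7.97e-03 down to 7.91e-03 across this stretch) and steps down at the epoch boundary further below (to 7.65e-03 at epoch 16, batch 0). This is consistent with an Eden-style schedule that discounts base_lr by both a batch-count and an epoch-count factor. A sketch of the assumed form; the exponents and example settings below are assumptions for illustration, not read out of this run's code:

```python
def eden_lr(base_lr: float, batch: int, epoch: float,
            lr_batches: float, lr_epochs: float) -> float:
    """Assumed Eden-style decay: each factor is ~1 early in training and
    decays as a -0.25 power once batch >> lr_batches (epoch >> lr_epochs)."""
    batch_factor = ((batch ** 2 + lr_batches ** 2) / lr_batches ** 2) ** -0.25
    epoch_factor = ((epoch ** 2 + lr_epochs ** 2) / lr_epochs ** 2) ** -0.25
    return base_lr * batch_factor * epoch_factor

# With assumed settings base_lr=0.045, lr_batches=7500, lr_epochs=3.5 and a
# global batch index around 55k, epoch 15 gives ~7.9e-03 and epoch 16 gives
# ~7.6e-03, in line with the lr values logged in this section.
print(eden_lr(0.045, 55_000, 15, 7_500, 3.5))
print(eden_lr(0.045, 55_500, 16, 7_500, 3.5))
```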
2024-09-23 13:45:53,865 INFO [train.py:1198] (3/4) Epoch 15, batch 3650, loss[loss=0.2352, ctc_loss=0.16, cr_loss=0.3757, over 17213.00 frames. ], tot_loss[loss=0.2273, ctc_loss=0.1536, cr_loss=0.3681, over 3366530.81 frames. ], batch size: 50, lr: 7.92e-03, grad_scale: 16.0
2024-09-23 13:46:03,494 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=271572.0, ans=0.0
2024-09-23 13:46:20,588 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.const_attention_rate, batch_count=271618.6666666667, ans=0.025
2024-09-23 13:46:25,206 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=271665.3333333333, ans=0.0
2024-09-23 13:46:34,954 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.59 vs. limit=15.0
2024-09-23 13:46:35,998 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=271665.3333333333, ans=0.0
2024-09-23 13:46:43,911 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=271712.0, ans=0.125
2024-09-23 13:47:00,639 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=271758.6666666667, ans=0.09899494936611666
2024-09-23 13:47:10,219 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=12.09 vs. limit=22.5
2024-09-23 13:47:12,694 INFO [train.py:1198] (3/4) Epoch 15, batch 3700, loss[loss=0.2305, ctc_loss=0.159, cr_loss=0.3577, over 16986.00 frames. ], tot_loss[loss=0.2274, ctc_loss=0.1538, cr_loss=0.3679, over 3368127.42 frames. ], batch size: 51, lr: 7.92e-03, grad_scale: 16.0
2024-09-23 13:47:20,784 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=271805.3333333333, ans=0.07
2024-09-23 13:47:37,539 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.112e+02 1.268e+02 1.357e+02 1.522e+02 2.745e+02, threshold=2.715e+02, percent-clipped=0.0
2024-09-23 13:47:57,032 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=271898.6666666667, ans=0.125
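The scaling.py:214 records throughout this log show module constants (dropout_p, the various *_skip_rate values, balancer probs, bypass scale_min, const_attention_rate, ...) that are not fixed floats but are re-evaluated from a schedule keyed on batch_count; ans= is the schedule's current value. A toy piecewise-linear schedule in the same spirit (illustrative only; the real ScheduledFloat carries more machinery):

```python
class PiecewiseLinear:
    """Sketch of a ScheduledFloat-style value: linearly interpolates between
    (batch_count, value) breakpoints and clamps outside the last one."""

    def __init__(self, *points):
        # points: (batch_count, value) pairs, e.g. (0.0, 0.3), (20000.0, 0.1)
        self.points = sorted(points)

    def __call__(self, batch_count: float) -> float:
        pts = self.points
        if batch_count <= pts[0][0]:
            return pts[0][1]
        for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
            if batch_count <= x1:
                t = (batch_count - x0) / (x1 - x0)
                return y0 + t * (y1 - y0)
        return pts[-1][1]

# e.g. a dropout that anneals from 0.3 to 0.1 over the first 20k batches
dropout_p = PiecewiseLinear((0.0, 0.3), (20000.0, 0.1))
print(dropout_p(271572.0))  # -> 0.1 once past the last breakpoint
```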
2024-09-23 13:48:31,512 INFO [train.py:1198] (3/4) Epoch 15, batch 3750, loss[loss=0.2844, ctc_loss=0.2033, cr_loss=0.4054, over 11441.00 frames. ], tot_loss[loss=0.2296, ctc_loss=0.1555, cr_loss=0.3705, over 3359958.98 frames. ], batch size: 124, lr: 7.92e-03, grad_scale: 16.0
2024-09-23 13:48:46,000 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=272085.3333333333, ans=0.125
2024-09-23 13:48:57,012 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=272085.3333333333, ans=0.125
2024-09-23 13:49:12,830 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=272132.0, ans=0.0
2024-09-23 13:49:17,584 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=272178.6666666667, ans=0.125
2024-09-23 13:49:17,685 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=272178.6666666667, ans=0.0
2024-09-23 13:49:19,210 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=272178.6666666667, ans=0.125
2024-09-23 13:49:44,440 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=272225.3333333333, ans=0.0
2024-09-23 13:49:48,210 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.14 vs. limit=22.5
2024-09-23 13:49:50,635 INFO [train.py:1198] (3/4) Epoch 15, batch 3800, loss[loss=0.1968, ctc_loss=0.1282, cr_loss=0.343, over 16930.00 frames. ], tot_loss[loss=0.2287, ctc_loss=0.1547, cr_loss=0.3697, over 3348273.85 frames. ], batch size: 42, lr: 7.91e-03, grad_scale: 16.0
2024-09-23 13:49:57,342 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=272272.0, ans=0.125
2024-09-23 13:50:06,881 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=272318.6666666667, ans=0.2
2024-09-23 13:50:11,790 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=272318.6666666667, ans=0.0
2024-09-23 13:50:14,993 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=272318.6666666667, ans=0.2
2024-09-23 13:50:16,162 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.102e+02 1.263e+02 1.390e+02 1.534e+02 1.887e+02, threshold=2.779e+02, percent-clipped=0.0
2024-09-23 13:50:25,934 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=272365.3333333333, ans=0.0
2024-09-23 13:50:30,624 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=272365.3333333333, ans=0.1
2024-09-23 13:50:34,061 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=272365.3333333333, ans=0.0
2024-09-23 13:50:34,669 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=27.18 vs. limit=22.5
2024-09-23 13:51:10,401 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=272505.3333333333, ans=0.05
2024-09-23 13:51:11,676 INFO [train.py:1198] (3/4) Epoch 15, batch 3850, loss[loss=0.2627, ctc_loss=0.1794, cr_loss=0.4165, over 14891.00 frames. ], tot_loss[loss=0.2307, ctc_loss=0.1564, cr_loss=0.3718, over 3309958.91 frames. ], batch size: 89, lr: 7.91e-03, grad_scale: 16.0
2024-09-23 13:51:21,753 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=272505.3333333333, ans=0.125
2024-09-23 13:52:08,406 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=272645.3333333333, ans=0.125
2024-09-23 13:53:14,135 INFO [train.py:1198] (3/4) Epoch 16, batch 0, loss[loss=0.2004, ctc_loss=0.1348, cr_loss=0.3282, over 17109.00 frames. ], tot_loss[loss=0.2004, ctc_loss=0.1348, cr_loss=0.3282, over 17109.00 frames. ], batch size: 43, lr: 7.65e-03, grad_scale: 32.0
2024-09-23 13:53:14,135 INFO [train.py:1221] (3/4) Computing validation loss
2024-09-23 13:53:30,198 INFO [train.py:1230] (3/4) Epoch 16, validation: loss=0.04222, ctc_loss=0.04222, cr_loss=7.738e-15, over 944034.00 frames.
2024-09-23 13:53:30,198 INFO [train.py:1231] (3/4) Maximum memory allocated so far is 21238MB
2024-09-23 13:53:33,154 INFO [scaling.py:1024] (3/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=7.35 vs. limit=8.0
2024-09-23 13:53:37,207 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=12.31 vs. limit=22.5
2024-09-23 13:53:40,056 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=272720.0, ans=0.0
2024-09-23 13:53:41,568 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=272720.0, ans=0.125
2024-09-23 13:53:44,856 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=272766.6666666667, ans=0.125
2024-09-23 13:54:02,002 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.157e+02 1.456e+02 1.611e+02 1.770e+02 2.340e+02, threshold=3.223e+02, percent-clipped=0.0
2024-09-23 13:54:03,108 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=15.86 vs. limit=22.5
2024-09-23 13:54:50,286 INFO [train.py:1198] (3/4) Epoch 16, batch 50, loss[loss=0.2158, ctc_loss=0.1409, cr_loss=0.3746, over 16712.00 frames. ], tot_loss[loss=0.2342, ctc_loss=0.1589, cr_loss=0.3763, over 754351.00 frames. ], batch size: 37, lr: 7.65e-03, grad_scale: 32.0
2024-09-23 13:55:33,243 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=273046.6666666667, ans=0.125
2024-09-23 13:56:06,963 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
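Two things are visible in the epoch-16 records above. First, the reported totals are consistent with the training loss being ctc_loss plus a small fixed multiple of cr_loss (e.g. at batch 0: 0.1348 + 0.2 x 0.3282 = 0.2004, the logged loss). Second, the validation cr_loss is ~1e-15, i.e. numerically zero: the consistency term compares the model's per-frame posteriors on two differently masked copies of each utterance, and with masking disabled at validation the two copies coincide, leaving only floating-point noise. A sketch of such an objective; this is an assumed form of a CR-CTC style loss (the real one may differ in reduction and stop-gradient details):

```python
import torch
import torch.nn.functional as F

def cr_ctc_loss(log_probs_a, log_probs_b, ctc_loss_a, ctc_loss_b,
                cr_loss_scale: float = 0.2):
    """Assumed form: average the CTC losses of two augmented views and add a
    scaled consistency term pulling the two posterior sequences together."""
    # symmetric KL between the two views' per-frame output distributions
    cr = 0.5 * (
        F.kl_div(log_probs_a, log_probs_b, log_target=True, reduction="batchmean")
        + F.kl_div(log_probs_b, log_probs_a, log_target=True, reduction="batchmean")
    )
    ctc = 0.5 * (ctc_loss_a + ctc_loss_b)
    return ctc + cr_loss_scale * cr, ctc, cr

# With identical views (as in validation, where no masking is applied) the
# consistency term collapses to ~0, matching cr_loss=7.738e-15 in the log:
lp = torch.randn(4, 100, 500).log_softmax(-1)
loss, ctc, cr = cr_ctc_loss(lp, lp, torch.tensor(0.04), torch.tensor(0.04))
print(cr)  # ~0
```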
2024-09-23 13:56:09,767 INFO [train.py:1198] (3/4) Epoch 16, batch 100, loss[loss=0.2505, ctc_loss=0.1702, cr_loss=0.4014, over 16612.00 frames. ], tot_loss[loss=0.2311, ctc_loss=0.1564, cr_loss=0.3732, over 1337123.52 frames. ], batch size: 66, lr: 7.65e-03, grad_scale: 32.0
2024-09-23 13:56:09,974 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=273186.6666666667, ans=0.125
2024-09-23 13:56:21,366 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=273186.6666666667, ans=0.125
2024-09-23 13:56:41,478 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=6.18 vs. limit=15.0
2024-09-23 13:56:51,568 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.086e+02 1.239e+02 1.320e+02 1.510e+02 1.777e+02, threshold=2.639e+02, percent-clipped=0.0
2024-09-23 13:57:13,823 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=273326.6666666667, ans=0.0
2024-09-23 13:57:17,010 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=273326.6666666667, ans=0.0
2024-09-23 13:57:37,021 INFO [scaling.py:1024] (3/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=7.32 vs. limit=8.0
2024-09-23 13:57:38,813 INFO [train.py:1198] (3/4) Epoch 16, batch 150, loss[loss=0.2243, ctc_loss=0.1491, cr_loss=0.3759, over 17098.00 frames. ], tot_loss[loss=0.2292, ctc_loss=0.1549, cr_loss=0.3714, over 1792789.13 frames. ], batch size: 49, lr: 7.64e-03, grad_scale: 32.0
2024-09-23 13:57:53,481 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=273466.6666666667, ans=0.0
2024-09-23 13:58:03,397 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=273466.6666666667, ans=0.125
2024-09-23 13:58:14,619 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=273513.3333333333, ans=0.0
2024-09-23 13:58:20,927 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=273513.3333333333, ans=0.125
2024-09-23 13:58:39,405 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=5.41 vs. limit=15.0
2024-09-23 13:58:45,518 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=15.73 vs. limit=22.5
2024-09-23 13:58:49,252 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=7.35 vs. limit=15.0
2024-09-23 13:58:53,132 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=273606.6666666667, ans=0.125
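The Whitening records compare a per-module metric against a limit. The metric measures how anisotropic the channel covariance of a module's output is: it is 1.0 when the covariance is a multiple of the identity ("white" features) and grows as variance concentrates in a few directions, and the auxiliary whitening loss only engages once the metric exceeds the limit. A plausible reconstruction of such a metric (assumed form, not the scaling.py source):

```python
import torch

def whitening_metric(x: torch.Tensor, num_groups: int = 1) -> torch.Tensor:
    """Assumed anisotropy measure of the channel covariance, computed per
    channel group: equals 1.0 when cov is a multiple of I, larger otherwise."""
    x = x.reshape(-1, x.shape[-1])                              # (frames, channels)
    x = x.reshape(x.shape[0], num_groups, -1).transpose(0, 1)   # (groups, frames, chans/group)
    cov = x.transpose(1, 2) @ x / x.shape[1]                    # per-group covariance
    eigs = torch.linalg.eigvalsh(cov)                           # real eigenvalues (cov is PSD)
    # mean of squared eigenvalues over the squared mean eigenvalue
    metric = (eigs ** 2).mean(dim=1) / eigs.mean(dim=1) ** 2
    return metric.mean()

x = torch.randn(1000, 384)                                  # well-conditioned features
print(whitening_metric(x))                                  # close to 1.0
print(whitening_metric(x * torch.linspace(0.1, 3.0, 384)))  # anisotropic -> larger
```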
2024-09-23 13:58:59,126 INFO [train.py:1198] (3/4) Epoch 16, batch 200, loss[loss=0.2743, ctc_loss=0.2009, cr_loss=0.3666, over 11700.00 frames. ], tot_loss[loss=0.2285, ctc_loss=0.1546, cr_loss=0.3696, over 2138679.60 frames. ], batch size: 123, lr: 7.64e-03, grad_scale: 32.0
2024-09-23 13:59:23,109 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=273700.0, ans=0.125
2024-09-23 13:59:28,055 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=273700.0, ans=0.125
2024-09-23 13:59:31,012 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.103e+02 1.259e+02 1.374e+02 1.473e+02 2.348e+02, threshold=2.748e+02, percent-clipped=0.0
2024-09-23 13:59:44,144 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.const_attention_rate, batch_count=273746.6666666667, ans=0.025
2024-09-23 13:59:45,782 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=273793.3333333333, ans=0.125
2024-09-23 14:00:03,385 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=273840.0, ans=0.1
2024-09-23 14:00:16,062 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=273840.0, ans=0.125
2024-09-23 14:00:18,770 INFO [train.py:1198] (3/4) Epoch 16, batch 250, loss[loss=0.2048, ctc_loss=0.1341, cr_loss=0.3536, over 17332.00 frames. ], tot_loss[loss=0.2266, ctc_loss=0.1532, cr_loss=0.3672, over 2400817.04 frames. ], batch size: 43, lr: 7.64e-03, grad_scale: 32.0
2024-09-23 14:00:20,725 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=273886.6666666667, ans=0.1
2024-09-23 14:00:34,859 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=273933.3333333333, ans=0.125
2024-09-23 14:01:11,549 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=274026.6666666667, ans=0.125
2024-09-23 14:01:24,336 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=274026.6666666667, ans=0.0
2024-09-23 14:01:46,159 INFO [train.py:1198] (3/4) Epoch 16, batch 300, loss[loss=0.1766, ctc_loss=0.1161, cr_loss=0.3025, over 17261.00 frames. ], tot_loss[loss=0.2276, ctc_loss=0.154, cr_loss=0.3679, over 2604720.62 frames. ], batch size: 42, lr: 7.63e-03, grad_scale: 32.0
2024-09-23 14:02:22,714 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.088e+02 1.271e+02 1.353e+02 1.525e+02 2.781e+02, threshold=2.705e+02, percent-clipped=1.0
2024-09-23 14:02:48,508 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=274260.0, ans=0.0
2024-09-23 14:03:01,364 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=274306.6666666667, ans=0.2
2024-09-23 14:03:08,655 INFO [train.py:1198] (3/4) Epoch 16, batch 350, loss[loss=0.2332, ctc_loss=0.1563, cr_loss=0.3846, over 17200.00 frames. ], tot_loss[loss=0.2264, ctc_loss=0.1532, cr_loss=0.3664, over 2775881.36 frames.
], batch size: 47, lr: 7.63e-03, grad_scale: 16.0 2024-09-23 14:03:18,442 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.min_abs, batch_count=274353.3333333333, ans=0.5 2024-09-23 14:03:23,276 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=274400.0, ans=0.09899494936611666 2024-09-23 14:03:31,360 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=274400.0, ans=0.125 2024-09-23 14:03:42,571 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-23 14:03:44,094 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=274446.6666666667, ans=0.0 2024-09-23 14:03:55,198 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=274493.3333333333, ans=0.1 2024-09-23 14:04:28,535 INFO [train.py:1198] (3/4) Epoch 16, batch 400, loss[loss=0.2505, ctc_loss=0.17, cr_loss=0.4028, over 16674.00 frames. ], tot_loss[loss=0.2278, ctc_loss=0.1541, cr_loss=0.3683, over 2908293.83 frames. ], batch size: 61, lr: 7.63e-03, grad_scale: 32.0 2024-09-23 14:04:56,603 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.21 vs. limit=15.0 2024-09-23 14:05:01,857 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.049e+02 1.271e+02 1.344e+02 1.500e+02 2.841e+02, threshold=2.689e+02, percent-clipped=1.0 2024-09-23 14:05:25,286 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=4.93 vs. limit=15.0 2024-09-23 14:05:47,971 INFO [train.py:1198] (3/4) Epoch 16, batch 450, loss[loss=0.2351, ctc_loss=0.1623, cr_loss=0.3643, over 17061.00 frames. ], tot_loss[loss=0.2271, ctc_loss=0.1537, cr_loss=0.367, over 3003928.40 frames. ], batch size: 46, lr: 7.62e-03, grad_scale: 32.0 2024-09-23 14:05:57,972 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=274820.0, ans=0.1 2024-09-23 14:06:25,565 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=274913.3333333333, ans=0.125 2024-09-23 14:06:39,854 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=274960.0, ans=0.125 2024-09-23 14:06:43,235 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=11.38 vs. limit=12.0 2024-09-23 14:07:15,705 INFO [train.py:1198] (3/4) Epoch 16, batch 500, loss[loss=0.2486, ctc_loss=0.1687, cr_loss=0.3994, over 16502.00 frames. ], tot_loss[loss=0.2281, ctc_loss=0.1543, cr_loss=0.369, over 3082255.35 frames. ], batch size: 66, lr: 7.62e-03, grad_scale: 16.0 2024-09-23 14:07:17,518 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=275053.3333333333, ans=0.0 2024-09-23 14:07:19,787 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.01 vs. 
limit=22.5 2024-09-23 14:07:20,831 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=275053.3333333333, ans=0.1 2024-09-23 14:07:33,477 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=275100.0, ans=0.1 2024-09-23 14:07:52,536 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.108e+02 1.254e+02 1.336e+02 1.469e+02 8.615e+02, threshold=2.672e+02, percent-clipped=1.0 2024-09-23 14:08:22,224 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=275240.0, ans=0.2 2024-09-23 14:08:26,862 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=275240.0, ans=0.125 2024-09-23 14:08:30,085 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=275240.0, ans=0.125 2024-09-23 14:08:35,095 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=275286.6666666667, ans=0.125 2024-09-23 14:08:36,361 INFO [train.py:1198] (3/4) Epoch 16, batch 550, loss[loss=0.2001, ctc_loss=0.1345, cr_loss=0.328, over 17205.00 frames. ], tot_loss[loss=0.2282, ctc_loss=0.1545, cr_loss=0.3686, over 3126083.72 frames. ], batch size: 41, lr: 7.62e-03, grad_scale: 8.0 2024-09-23 14:08:36,915 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=7.60 vs. limit=15.0 2024-09-23 14:08:47,514 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=275286.6666666667, ans=0.2 2024-09-23 14:08:58,919 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=275333.3333333333, ans=0.125 2024-09-23 14:09:00,622 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=275333.3333333333, ans=0.1 2024-09-23 14:09:05,979 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=3.18 vs. limit=12.0 2024-09-23 14:09:22,952 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=275426.6666666667, ans=0.125 2024-09-23 14:09:35,751 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=275426.6666666667, ans=0.0 2024-09-23 14:09:47,166 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.const_attention_rate, batch_count=275473.3333333333, ans=0.025 2024-09-23 14:09:54,959 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=275520.0, ans=0.1 2024-09-23 14:09:56,359 INFO [train.py:1198] (3/4) Epoch 16, batch 600, loss[loss=0.2276, ctc_loss=0.1546, cr_loss=0.365, over 17354.00 frames. ], tot_loss[loss=0.2265, ctc_loss=0.1532, cr_loss=0.3666, over 3189740.88 frames. 
], batch size: 48, lr: 7.61e-03, grad_scale: 8.0 2024-09-23 14:10:32,685 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.100e+02 1.280e+02 1.400e+02 1.549e+02 2.106e+02, threshold=2.799e+02, percent-clipped=0.0 2024-09-23 14:10:37,728 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.const_attention_rate, batch_count=275613.3333333333, ans=0.025 2024-09-23 14:10:42,403 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=275660.0, ans=0.0 2024-09-23 14:11:01,777 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=275706.6666666667, ans=0.0 2024-09-23 14:11:10,773 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=275706.6666666667, ans=0.0 2024-09-23 14:11:21,419 INFO [train.py:1198] (3/4) Epoch 16, batch 650, loss[loss=0.1912, ctc_loss=0.1276, cr_loss=0.3182, over 17093.00 frames. ], tot_loss[loss=0.2263, ctc_loss=0.153, cr_loss=0.3662, over 3222325.30 frames. ], batch size: 40, lr: 7.61e-03, grad_scale: 8.0 2024-09-23 14:11:31,395 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=275753.3333333333, ans=0.0 2024-09-23 14:12:02,609 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-23 14:12:28,307 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=275940.0, ans=0.0 2024-09-23 14:12:36,219 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-23 14:12:42,423 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=275986.6666666667, ans=0.2 2024-09-23 14:12:43,656 INFO [train.py:1198] (3/4) Epoch 16, batch 700, loss[loss=0.2429, ctc_loss=0.1631, cr_loss=0.399, over 16970.00 frames. ], tot_loss[loss=0.2253, ctc_loss=0.1522, cr_loss=0.3656, over 3256009.20 frames. ], batch size: 53, lr: 7.61e-03, grad_scale: 8.0 2024-09-23 14:13:20,907 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.058e+02 1.263e+02 1.367e+02 1.500e+02 2.228e+02, threshold=2.734e+02, percent-clipped=0.0 2024-09-23 14:13:22,905 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=276080.0, ans=0.125 2024-09-23 14:13:47,456 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=11.62 vs. limit=22.5 2024-09-23 14:14:03,948 INFO [train.py:1198] (3/4) Epoch 16, batch 750, loss[loss=0.2379, ctc_loss=0.1588, cr_loss=0.3955, over 17066.00 frames. ], tot_loss[loss=0.225, ctc_loss=0.152, cr_loss=0.3647, over 3278430.35 frames. ], batch size: 46, lr: 7.61e-03, grad_scale: 8.0 2024-09-23 14:14:08,056 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=7.16 vs. limit=15.0 2024-09-23 14:14:14,089 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=17.03 vs. 
limit=22.5 2024-09-23 14:14:18,377 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=276266.6666666667, ans=0.125 2024-09-23 14:15:00,322 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=276360.0, ans=0.125 2024-09-23 14:15:01,943 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=276360.0, ans=0.125 2024-09-23 14:15:23,831 INFO [train.py:1198] (3/4) Epoch 16, batch 800, loss[loss=0.2455, ctc_loss=0.1688, cr_loss=0.3835, over 15075.00 frames. ], tot_loss[loss=0.2248, ctc_loss=0.1519, cr_loss=0.3643, over 3294420.12 frames. ], batch size: 89, lr: 7.60e-03, grad_scale: 16.0 2024-09-23 14:15:28,949 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=276453.3333333333, ans=0.125 2024-09-23 14:15:57,663 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=276546.6666666667, ans=0.0 2024-09-23 14:16:00,467 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.161e+02 1.250e+02 1.343e+02 1.464e+02 3.040e+02, threshold=2.687e+02, percent-clipped=1.0 2024-09-23 14:16:37,246 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=276640.0, ans=0.1 2024-09-23 14:16:42,639 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=3.71 vs. limit=12.0 2024-09-23 14:16:51,133 INFO [train.py:1198] (3/4) Epoch 16, batch 850, loss[loss=0.2024, ctc_loss=0.1348, cr_loss=0.3381, over 17294.00 frames. ], tot_loss[loss=0.2257, ctc_loss=0.1526, cr_loss=0.3655, over 3312240.45 frames. ], batch size: 42, lr: 7.60e-03, grad_scale: 16.0 2024-09-23 14:16:53,160 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=276686.6666666667, ans=0.125 2024-09-23 14:17:37,758 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=276826.6666666667, ans=0.0 2024-09-23 14:17:40,253 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.05 vs. limit=10.0 2024-09-23 14:17:41,042 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=276826.6666666667, ans=0.125 2024-09-23 14:18:06,774 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.max_positive, batch_count=276873.3333333333, ans=0.95 2024-09-23 14:18:06,900 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=276873.3333333333, ans=0.125 2024-09-23 14:18:08,462 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=276873.3333333333, ans=0.125 2024-09-23 14:18:08,956 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=3.92 vs. 
limit=10.0 2024-09-23 14:18:11,457 INFO [train.py:1198] (3/4) Epoch 16, batch 900, loss[loss=0.2346, ctc_loss=0.1587, cr_loss=0.3796, over 16959.00 frames. ], tot_loss[loss=0.2269, ctc_loss=0.1533, cr_loss=0.3676, over 3327030.58 frames. ], batch size: 58, lr: 7.60e-03, grad_scale: 16.0 2024-09-23 14:18:12,187 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.33 vs. limit=15.0 2024-09-23 14:18:19,824 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=276920.0, ans=0.2 2024-09-23 14:18:47,947 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.062e+02 1.278e+02 1.412e+02 1.648e+02 4.971e+02, threshold=2.824e+02, percent-clipped=1.0 2024-09-23 14:18:49,754 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=277013.3333333333, ans=0.1 2024-09-23 14:18:56,180 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=277013.3333333333, ans=0.0 2024-09-23 14:19:30,873 INFO [train.py:1198] (3/4) Epoch 16, batch 950, loss[loss=0.2671, ctc_loss=0.1851, cr_loss=0.41, over 16995.00 frames. ], tot_loss[loss=0.2263, ctc_loss=0.153, cr_loss=0.3666, over 3334838.89 frames. ], batch size: 53, lr: 7.59e-03, grad_scale: 16.0 2024-09-23 14:20:06,120 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=277246.6666666667, ans=0.1 2024-09-23 14:20:19,160 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten.whitening_limit, batch_count=277293.3333333333, ans=15.0 2024-09-23 14:20:22,561 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=2.43 vs. limit=15.0 2024-09-23 14:20:41,035 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=277340.0, ans=0.1 2024-09-23 14:20:47,279 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=277340.0, ans=0.0 2024-09-23 14:20:50,274 INFO [train.py:1198] (3/4) Epoch 16, batch 1000, loss[loss=0.2593, ctc_loss=0.1746, cr_loss=0.4235, over 16480.00 frames. ], tot_loss[loss=0.2272, ctc_loss=0.1537, cr_loss=0.3675, over 3336996.26 frames. 
], batch size: 66, lr: 7.59e-03, grad_scale: 16.0 2024-09-23 14:20:50,538 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.min_positive, batch_count=277386.6666666667, ans=0.05 2024-09-23 14:21:12,734 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=277433.3333333333, ans=0.07 2024-09-23 14:21:22,302 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=277433.3333333333, ans=0.125 2024-09-23 14:21:25,494 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=277433.3333333333, ans=0.125 2024-09-23 14:21:30,476 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=277480.0, ans=0.0 2024-09-23 14:21:37,361 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.114e+02 1.289e+02 1.371e+02 1.510e+02 2.522e+02, threshold=2.742e+02, percent-clipped=0.0 2024-09-23 14:21:37,637 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=277480.0, ans=0.125 2024-09-23 14:22:00,205 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=6.19 vs. limit=15.0 2024-09-23 14:22:06,274 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=277573.3333333333, ans=0.0 2024-09-23 14:22:07,961 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.min_positive, batch_count=277573.3333333333, ans=0.025 2024-09-23 14:22:20,156 INFO [train.py:1198] (3/4) Epoch 16, batch 1050, loss[loss=0.2339, ctc_loss=0.1575, cr_loss=0.3818, over 17222.00 frames. ], tot_loss[loss=0.2289, ctc_loss=0.155, cr_loss=0.3696, over 3331562.37 frames. ], batch size: 50, lr: 7.59e-03, grad_scale: 16.0 2024-09-23 14:22:28,585 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.const_attention_rate, batch_count=277620.0, ans=0.025 2024-09-23 14:22:31,679 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=277620.0, ans=0.125 2024-09-23 14:22:36,266 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.min_positive, batch_count=277666.6666666667, ans=0.05 2024-09-23 14:22:36,378 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=277666.6666666667, ans=0.1 2024-09-23 14:23:02,547 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.52 vs. 
limit=6.0 2024-09-23 14:23:20,934 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=277760.0, ans=0.025 2024-09-23 14:23:33,678 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.min_positive, batch_count=277806.6666666667, ans=0.025 2024-09-23 14:23:33,845 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=277806.6666666667, ans=0.2 2024-09-23 14:23:37,162 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=277806.6666666667, ans=0.125 2024-09-23 14:23:39,993 INFO [train.py:1198] (3/4) Epoch 16, batch 1100, loss[loss=0.2533, ctc_loss=0.1727, cr_loss=0.4027, over 17029.00 frames. ], tot_loss[loss=0.2284, ctc_loss=0.1546, cr_loss=0.3693, over 3334213.07 frames. ], batch size: 52, lr: 7.58e-03, grad_scale: 16.0 2024-09-23 14:23:43,890 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.90 vs. limit=6.0 2024-09-23 14:23:53,130 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=277853.3333333333, ans=0.0 2024-09-23 14:23:55,257 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.99 vs. limit=22.5 2024-09-23 14:24:08,966 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=277900.0, ans=0.125 2024-09-23 14:24:16,322 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.065e+02 1.265e+02 1.345e+02 1.521e+02 2.827e+02, threshold=2.691e+02, percent-clipped=1.0 2024-09-23 14:24:19,876 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=277946.6666666667, ans=0.125 2024-09-23 14:24:24,889 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.35 vs. limit=15.0 2024-09-23 14:24:38,972 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=277993.3333333333, ans=0.125 2024-09-23 14:24:48,814 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-09-23 14:24:59,429 INFO [train.py:1198] (3/4) Epoch 16, batch 1150, loss[loss=0.2001, ctc_loss=0.1333, cr_loss=0.3338, over 16236.00 frames. ], tot_loss[loss=0.2269, ctc_loss=0.1532, cr_loss=0.3683, over 3343722.65 frames. ], batch size: 36, lr: 7.58e-03, grad_scale: 16.0 2024-09-23 14:25:30,277 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=278180.0, ans=0.125 2024-09-23 14:25:49,359 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=278226.6666666667, ans=0.125 2024-09-23 14:26:26,229 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=278320.0, ans=0.0 2024-09-23 14:26:27,422 INFO [train.py:1198] (3/4) Epoch 16, batch 1200, loss[loss=0.2278, ctc_loss=0.1528, cr_loss=0.3751, over 17163.00 frames. 
], tot_loss[loss=0.2268, ctc_loss=0.1532, cr_loss=0.3679, over 3339515.93 frames. ], batch size: 45, lr: 7.58e-03, grad_scale: 16.0 2024-09-23 14:26:28,262 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=4.05 vs. limit=10.0 2024-09-23 14:27:08,387 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.114e+02 1.263e+02 1.375e+02 1.495e+02 2.178e+02, threshold=2.750e+02, percent-clipped=0.0 2024-09-23 14:27:08,781 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=278413.3333333333, ans=0.125 2024-09-23 14:27:18,347 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=278460.0, ans=0.1 2024-09-23 14:27:38,978 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=278506.6666666667, ans=0.0 2024-09-23 14:27:49,957 INFO [train.py:1198] (3/4) Epoch 16, batch 1250, loss[loss=0.2004, ctc_loss=0.1312, cr_loss=0.346, over 16983.00 frames. ], tot_loss[loss=0.2283, ctc_loss=0.1545, cr_loss=0.3692, over 3317398.41 frames. ], batch size: 42, lr: 7.57e-03, grad_scale: 16.0 2024-09-23 14:27:50,338 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=278553.3333333333, ans=0.125 2024-09-23 14:27:59,759 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=278553.3333333333, ans=0.0 2024-09-23 14:28:07,839 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=278600.0, ans=0.2 2024-09-23 14:28:16,788 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.71 vs. limit=6.0 2024-09-23 14:28:33,363 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-23 14:28:59,229 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=278740.0, ans=0.125 2024-09-23 14:29:10,196 INFO [train.py:1198] (3/4) Epoch 16, batch 1300, loss[loss=0.2049, ctc_loss=0.1365, cr_loss=0.3417, over 17092.00 frames. ], tot_loss[loss=0.2265, ctc_loss=0.1531, cr_loss=0.3671, over 3330993.30 frames. ], batch size: 43, lr: 7.57e-03, grad_scale: 16.0 2024-09-23 14:29:38,840 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=278833.3333333333, ans=0.2 2024-09-23 14:29:48,133 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.053e+02 1.264e+02 1.359e+02 1.551e+02 2.304e+02, threshold=2.718e+02, percent-clipped=0.0 2024-09-23 14:30:29,688 INFO [train.py:1198] (3/4) Epoch 16, batch 1350, loss[loss=0.189, ctc_loss=0.1244, cr_loss=0.323, over 16273.00 frames. ], tot_loss[loss=0.2275, ctc_loss=0.1537, cr_loss=0.369, over 3342820.97 frames. ], batch size: 36, lr: 7.57e-03, grad_scale: 16.0 2024-09-23 14:30:52,526 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=4.09 vs. 
limit=10.0 2024-09-23 14:31:09,800 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.min_abs, batch_count=279113.3333333333, ans=0.5 2024-09-23 14:31:19,422 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=279113.3333333333, ans=0.1 2024-09-23 14:31:29,818 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=5.79 vs. limit=15.0 2024-09-23 14:31:41,693 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=279160.0, ans=0.125 2024-09-23 14:32:00,033 INFO [train.py:1198] (3/4) Epoch 16, batch 1400, loss[loss=0.2276, ctc_loss=0.1532, cr_loss=0.372, over 17251.00 frames. ], tot_loss[loss=0.2278, ctc_loss=0.1539, cr_loss=0.3693, over 3345743.08 frames. ], batch size: 44, lr: 7.56e-03, grad_scale: 16.0 2024-09-23 14:32:10,262 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=4.25 vs. limit=15.0 2024-09-23 14:32:19,421 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=279300.0, ans=0.2 2024-09-23 14:32:30,679 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=279346.6666666667, ans=0.125 2024-09-23 14:32:38,350 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.056e+02 1.268e+02 1.363e+02 1.512e+02 2.184e+02, threshold=2.726e+02, percent-clipped=0.0 2024-09-23 14:32:50,069 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-09-23 14:33:02,919 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=279440.0, ans=0.0 2024-09-23 14:33:10,823 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=279440.0, ans=0.125 2024-09-23 14:33:14,462 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=5.93 vs. limit=15.0 2024-09-23 14:33:19,185 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=16.79 vs. limit=22.5 2024-09-23 14:33:20,076 INFO [train.py:1198] (3/4) Epoch 16, batch 1450, loss[loss=0.221, ctc_loss=0.1481, cr_loss=0.3645, over 17359.00 frames. ], tot_loss[loss=0.2278, ctc_loss=0.1541, cr_loss=0.3684, over 3329531.13 frames. 
], batch size: 48, lr: 7.56e-03, grad_scale: 16.0 2024-09-23 14:33:20,322 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=279486.6666666667, ans=0.125 2024-09-23 14:33:26,774 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=279486.6666666667, ans=0.2 2024-09-23 14:34:02,101 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=279580.0, ans=0.125 2024-09-23 14:34:21,200 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=279626.6666666667, ans=0.07 2024-09-23 14:34:22,773 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=279673.3333333333, ans=0.125 2024-09-23 14:34:28,420 INFO [scaling.py:1024] (3/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=6.56 vs. limit=8.0 2024-09-23 14:34:39,816 INFO [train.py:1198] (3/4) Epoch 16, batch 1500, loss[loss=0.2457, ctc_loss=0.1674, cr_loss=0.3912, over 16913.00 frames. ], tot_loss[loss=0.2271, ctc_loss=0.1536, cr_loss=0.3673, over 3335917.28 frames. ], batch size: 58, lr: 7.56e-03, grad_scale: 16.0 2024-09-23 14:34:57,721 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=279766.6666666667, ans=0.0 2024-09-23 14:35:00,744 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=279766.6666666667, ans=0.015 2024-09-23 14:35:04,012 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=279766.6666666667, ans=0.1 2024-09-23 14:35:07,255 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=279766.6666666667, ans=0.125 2024-09-23 14:35:18,098 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.064e+02 1.287e+02 1.379e+02 1.521e+02 2.599e+02, threshold=2.758e+02, percent-clipped=0.0 2024-09-23 14:35:49,192 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=279906.6666666667, ans=0.2 2024-09-23 14:36:06,619 INFO [train.py:1198] (3/4) Epoch 16, batch 1550, loss[loss=0.2046, ctc_loss=0.1343, cr_loss=0.3515, over 17084.00 frames. ], tot_loss[loss=0.2262, ctc_loss=0.1529, cr_loss=0.3664, over 3345593.03 frames. ], batch size: 39, lr: 7.56e-03, grad_scale: 16.0 2024-09-23 14:36:34,147 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=280000.0, ans=0.125 2024-09-23 14:36:34,152 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=280000.0, ans=0.125 2024-09-23 14:36:41,916 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=280046.6666666667, ans=0.125 2024-09-23 14:37:08,953 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=280093.3333333333, ans=0.1 2024-09-23 14:37:30,560 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=12.87 vs. 
limit=22.5 2024-09-23 14:37:30,856 INFO [train.py:1198] (3/4) Epoch 16, batch 1600, loss[loss=0.2058, ctc_loss=0.1361, cr_loss=0.3481, over 17219.00 frames. ], tot_loss[loss=0.2263, ctc_loss=0.1529, cr_loss=0.367, over 3337477.08 frames. ], batch size: 47, lr: 7.55e-03, grad_scale: 32.0 2024-09-23 14:37:31,199 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=280186.6666666667, ans=0.0 2024-09-23 14:38:01,142 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=280280.0, ans=0.125 2024-09-23 14:38:10,324 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.068e+02 1.239e+02 1.329e+02 1.436e+02 2.606e+02, threshold=2.657e+02, percent-clipped=0.0 2024-09-23 14:38:10,720 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=280280.0, ans=0.125 2024-09-23 14:38:50,314 INFO [train.py:1198] (3/4) Epoch 16, batch 1650, loss[loss=0.2454, ctc_loss=0.1643, cr_loss=0.4051, over 17013.00 frames. ], tot_loss[loss=0.2264, ctc_loss=0.153, cr_loss=0.367, over 3337745.64 frames. ], batch size: 56, lr: 7.55e-03, grad_scale: 16.0 2024-09-23 14:39:09,577 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=280466.6666666667, ans=0.125 2024-09-23 14:39:21,411 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.40 vs. limit=22.5 2024-09-23 14:39:22,418 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.max_abs, batch_count=280513.3333333333, ans=10.0 2024-09-23 14:39:29,172 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.52 vs. limit=15.0 2024-09-23 14:39:47,937 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=280560.0, ans=0.0 2024-09-23 14:39:52,671 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=280606.6666666667, ans=0.0 2024-09-23 14:39:57,401 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=280606.6666666667, ans=0.1 2024-09-23 14:40:03,861 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=280606.6666666667, ans=0.125 2024-09-23 14:40:08,544 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=280653.3333333333, ans=0.2 2024-09-23 14:40:09,869 INFO [train.py:1198] (3/4) Epoch 16, batch 1700, loss[loss=0.2348, ctc_loss=0.1597, cr_loss=0.3757, over 17326.00 frames. ], tot_loss[loss=0.2271, ctc_loss=0.1533, cr_loss=0.3686, over 3347244.22 frames. 
], batch size: 51, lr: 7.55e-03, grad_scale: 16.0 2024-09-23 14:40:11,840 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=280653.3333333333, ans=0.2 2024-09-23 14:40:19,737 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=280653.3333333333, ans=0.125 2024-09-23 14:40:26,038 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=280700.0, ans=0.125 2024-09-23 14:40:45,043 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.42 vs. limit=6.0 2024-09-23 14:40:52,366 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.090e+02 1.241e+02 1.314e+02 1.415e+02 2.347e+02, threshold=2.628e+02, percent-clipped=0.0 2024-09-23 14:41:09,229 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.const_attention_rate, batch_count=280793.3333333333, ans=0.025 2024-09-23 14:41:10,771 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=280793.3333333333, ans=0.125 2024-09-23 14:41:15,557 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=280793.3333333333, ans=0.0 2024-09-23 14:41:25,469 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-09-23 14:41:40,569 INFO [train.py:1198] (3/4) Epoch 16, batch 1750, loss[loss=0.2392, ctc_loss=0.1636, cr_loss=0.378, over 16882.00 frames. ], tot_loss[loss=0.2257, ctc_loss=0.1524, cr_loss=0.3667, over 3346551.03 frames. ], batch size: 58, lr: 7.54e-03, grad_scale: 16.0 2024-09-23 14:41:42,519 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.max_positive, batch_count=280886.6666666667, ans=0.95 2024-09-23 14:41:52,063 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=280886.6666666667, ans=0.0 2024-09-23 14:42:01,862 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=280933.3333333333, ans=0.125 2024-09-23 14:42:01,874 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=280933.3333333333, ans=0.1 2024-09-23 14:42:17,213 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=8.72 vs. limit=10.0 2024-09-23 14:42:51,762 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=13.68 vs. limit=22.5 2024-09-23 14:43:00,734 INFO [train.py:1198] (3/4) Epoch 16, batch 1800, loss[loss=0.2367, ctc_loss=0.1605, cr_loss=0.3814, over 17304.00 frames. ], tot_loss[loss=0.226, ctc_loss=0.1527, cr_loss=0.3665, over 3349277.46 frames. ], batch size: 46, lr: 7.54e-03, grad_scale: 16.0 2024-09-23 14:43:16,840 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=281166.6666666667, ans=0.125 2024-09-23 14:43:20,294 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=11.76 vs. 
limit=15.0 2024-09-23 14:43:34,251 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=281213.3333333333, ans=0.125 2024-09-23 14:43:40,373 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.120e+02 1.265e+02 1.365e+02 1.521e+02 2.085e+02, threshold=2.729e+02, percent-clipped=0.0 2024-09-23 14:43:45,828 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=10.48 vs. limit=15.0 2024-09-23 14:44:04,425 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=281306.6666666667, ans=0.5 2024-09-23 14:44:05,067 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.62 vs. limit=15.0 2024-09-23 14:44:20,024 INFO [train.py:1198] (3/4) Epoch 16, batch 1850, loss[loss=0.2271, ctc_loss=0.1539, cr_loss=0.366, over 17310.00 frames. ], tot_loss[loss=0.2263, ctc_loss=0.1528, cr_loss=0.3673, over 3353810.82 frames. ], batch size: 49, lr: 7.54e-03, grad_scale: 16.0 2024-09-23 14:44:21,878 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=281353.3333333333, ans=0.125 2024-09-23 14:44:29,494 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=281353.3333333333, ans=0.0 2024-09-23 14:44:55,140 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=281446.6666666667, ans=0.125 2024-09-23 14:45:25,804 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=281540.0, ans=0.0 2024-09-23 14:45:42,171 INFO [train.py:1198] (3/4) Epoch 16, batch 1900, loss[loss=0.2567, ctc_loss=0.1737, cr_loss=0.4147, over 17013.00 frames. ], tot_loss[loss=0.2264, ctc_loss=0.1529, cr_loss=0.3673, over 3355801.50 frames. ], batch size: 51, lr: 7.53e-03, grad_scale: 16.0 2024-09-23 14:45:47,244 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=281586.6666666667, ans=0.125 2024-09-23 14:46:00,273 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten.whitening_limit, batch_count=281586.6666666667, ans=15.0 2024-09-23 14:46:07,871 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=281633.3333333333, ans=0.125 2024-09-23 14:46:27,057 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.113e+02 1.270e+02 1.346e+02 1.420e+02 2.030e+02, threshold=2.692e+02, percent-clipped=0.0 2024-09-23 14:46:27,348 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=281680.0, ans=0.125 2024-09-23 14:46:41,339 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=281726.6666666667, ans=0.0 2024-09-23 14:47:09,384 INFO [train.py:1198] (3/4) Epoch 16, batch 1950, loss[loss=0.2082, ctc_loss=0.1423, cr_loss=0.3294, over 17057.00 frames. ], tot_loss[loss=0.2265, ctc_loss=0.153, cr_loss=0.3673, over 3364615.32 frames. 
], batch size: 39, lr: 7.53e-03, grad_scale: 16.0 2024-09-23 14:47:11,367 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.const_attention_rate, batch_count=281820.0, ans=0.025 2024-09-23 14:47:40,321 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=281913.3333333333, ans=0.125 2024-09-23 14:47:59,765 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=281960.0, ans=0.125 2024-09-23 14:48:04,486 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=281960.0, ans=0.0 2024-09-23 14:48:17,073 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=282006.6666666667, ans=0.125 2024-09-23 14:48:29,657 INFO [train.py:1198] (3/4) Epoch 16, batch 2000, loss[loss=0.2061, ctc_loss=0.1348, cr_loss=0.3567, over 16953.00 frames. ], tot_loss[loss=0.2241, ctc_loss=0.1512, cr_loss=0.3646, over 3370436.57 frames. ], batch size: 42, lr: 7.53e-03, grad_scale: 32.0 2024-09-23 14:49:09,155 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.118e+02 1.302e+02 1.409e+02 1.610e+02 3.619e+02, threshold=2.818e+02, percent-clipped=1.0 2024-09-23 14:49:19,247 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=282193.3333333333, ans=0.125 2024-09-23 14:49:22,262 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=282193.3333333333, ans=0.125 2024-09-23 14:49:43,924 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=5.86 vs. limit=15.0 2024-09-23 14:49:47,979 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-23 14:49:49,171 INFO [train.py:1198] (3/4) Epoch 16, batch 2050, loss[loss=0.2346, ctc_loss=0.163, cr_loss=0.3577, over 17047.00 frames. ], tot_loss[loss=0.2247, ctc_loss=0.1516, cr_loss=0.3659, over 3372136.43 frames. ], batch size: 52, lr: 7.52e-03, grad_scale: 32.0 2024-09-23 14:49:57,479 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.min_positive, batch_count=282286.6666666667, ans=0.05 2024-09-23 14:50:04,222 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.80 vs. limit=15.0 2024-09-23 14:50:44,750 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=282426.6666666667, ans=0.0 2024-09-23 14:51:13,440 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-23 14:51:16,374 INFO [train.py:1198] (3/4) Epoch 16, batch 2100, loss[loss=0.2399, ctc_loss=0.1604, cr_loss=0.3975, over 17154.00 frames. ], tot_loss[loss=0.2243, ctc_loss=0.1511, cr_loss=0.3659, over 3374996.21 frames. ], batch size: 45, lr: 7.52e-03, grad_scale: 32.0 2024-09-23 14:51:21,731 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=11.78 vs. 
limit=12.0 2024-09-23 14:51:23,931 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=9.19 vs. limit=22.5 2024-09-23 14:51:29,647 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=282520.0, ans=0.125 2024-09-23 14:51:33,589 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=282566.6666666667, ans=0.0 2024-09-23 14:51:55,895 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.30 vs. limit=15.0 2024-09-23 14:51:58,360 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.067e+02 1.277e+02 1.347e+02 1.477e+02 3.259e+02, threshold=2.693e+02, percent-clipped=1.0 2024-09-23 14:52:09,656 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=282660.0, ans=0.0 2024-09-23 14:52:31,954 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer_ff3.min_abs, batch_count=282706.6666666667, ans=0.2 2024-09-23 14:52:38,058 INFO [train.py:1198] (3/4) Epoch 16, batch 2150, loss[loss=0.1939, ctc_loss=0.1261, cr_loss=0.3391, over 17028.00 frames. ], tot_loss[loss=0.2238, ctc_loss=0.1509, cr_loss=0.3645, over 3372768.35 frames. ], batch size: 44, lr: 7.52e-03, grad_scale: 32.0 2024-09-23 14:52:38,404 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=282753.3333333333, ans=0.125 2024-09-23 14:52:56,337 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=282800.0, ans=0.125 2024-09-23 14:53:01,037 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=282800.0, ans=0.125 2024-09-23 14:53:02,750 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.const_attention_rate, batch_count=282800.0, ans=0.025 2024-09-23 14:53:21,867 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=282846.6666666667, ans=0.125 2024-09-23 14:53:58,128 INFO [train.py:1198] (3/4) Epoch 16, batch 2200, loss[loss=0.2358, ctc_loss=0.1592, cr_loss=0.3829, over 17146.00 frames. ], tot_loss[loss=0.2234, ctc_loss=0.1505, cr_loss=0.3641, over 3362341.80 frames. ], batch size: 48, lr: 7.52e-03, grad_scale: 32.0 2024-09-23 14:54:13,926 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=5.90 vs. 
limit=15.0 2024-09-23 14:54:17,909 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=283033.3333333333, ans=0.125 2024-09-23 14:54:35,539 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=283080.0, ans=0.2 2024-09-23 14:54:38,173 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.101e+02 1.245e+02 1.360e+02 1.489e+02 2.276e+02, threshold=2.720e+02, percent-clipped=0.0 2024-09-23 14:54:49,714 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=283126.6666666667, ans=0.125 2024-09-23 14:54:54,499 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=283126.6666666667, ans=0.1 2024-09-23 14:55:18,514 INFO [train.py:1198] (3/4) Epoch 16, batch 2250, loss[loss=0.1789, ctc_loss=0.1209, cr_loss=0.2903, over 17102.00 frames. ], tot_loss[loss=0.2227, ctc_loss=0.1501, cr_loss=0.363, over 3360642.38 frames. ], batch size: 40, lr: 7.51e-03, grad_scale: 32.0 2024-09-23 14:55:36,322 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=283266.6666666667, ans=0.125 2024-09-23 14:56:09,848 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=5.36 vs. limit=15.0 2024-09-23 14:56:15,559 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=283360.0, ans=0.125 2024-09-23 14:56:47,896 INFO [train.py:1198] (3/4) Epoch 16, batch 2300, loss[loss=0.2285, ctc_loss=0.1557, cr_loss=0.3639, over 15979.00 frames. ], tot_loss[loss=0.2231, ctc_loss=0.1504, cr_loss=0.3636, over 3365296.61 frames. ], batch size: 74, lr: 7.51e-03, grad_scale: 32.0 2024-09-23 14:57:26,580 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=283546.6666666667, ans=0.125 2024-09-23 14:57:27,773 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.135e+02 1.282e+02 1.381e+02 1.559e+02 2.607e+02, threshold=2.763e+02, percent-clipped=0.0 2024-09-23 14:57:45,556 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=283593.3333333333, ans=0.125 2024-09-23 14:58:07,693 INFO [train.py:1198] (3/4) Epoch 16, batch 2350, loss[loss=0.2309, ctc_loss=0.1608, cr_loss=0.3505, over 16889.00 frames. ], tot_loss[loss=0.2246, ctc_loss=0.1515, cr_loss=0.3657, over 3362675.11 frames. ], batch size: 58, lr: 7.51e-03, grad_scale: 32.0 2024-09-23 14:58:27,244 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=283733.3333333333, ans=0.125 2024-09-23 14:58:43,488 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=283780.0, ans=0.1 2024-09-23 14:59:23,220 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=283873.3333333333, ans=0.0 2024-09-23 14:59:27,617 INFO [train.py:1198] (3/4) Epoch 16, batch 2400, loss[loss=0.2456, ctc_loss=0.1681, cr_loss=0.3874, over 17090.00 frames. ], tot_loss[loss=0.2244, ctc_loss=0.1512, cr_loss=0.3659, over 3363707.62 frames. 
], batch size: 49, lr: 7.50e-03, grad_scale: 32.0 2024-09-23 14:59:32,731 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=283920.0, ans=0.125 2024-09-23 14:59:37,584 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=283920.0, ans=0.125 2024-09-23 14:59:55,764 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=14.24 vs. limit=22.5 2024-09-23 15:00:07,588 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.106e+02 1.274e+02 1.411e+02 1.575e+02 2.045e+02, threshold=2.822e+02, percent-clipped=0.0 2024-09-23 15:00:11,850 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=8.53 vs. limit=22.5 2024-09-23 15:00:14,390 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=284060.0, ans=0.1 2024-09-23 15:00:20,460 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=284060.0, ans=0.0 2024-09-23 15:00:22,078 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer_ff3.min_abs, batch_count=284060.0, ans=0.2 2024-09-23 15:00:26,923 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=284060.0, ans=0.0 2024-09-23 15:00:46,088 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=284106.6666666667, ans=0.0 2024-09-23 15:00:54,751 INFO [train.py:1198] (3/4) Epoch 16, batch 2450, loss[loss=0.1953, ctc_loss=0.1293, cr_loss=0.3297, over 17068.00 frames. ], tot_loss[loss=0.2248, ctc_loss=0.1515, cr_loss=0.3665, over 3358239.20 frames. ], batch size: 43, lr: 7.50e-03, grad_scale: 32.0 2024-09-23 15:01:02,940 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=284153.3333333333, ans=0.125 2024-09-23 15:01:09,313 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=284200.0, ans=0.1 2024-09-23 15:01:21,960 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.const_attention_rate, batch_count=284200.0, ans=0.025 2024-09-23 15:01:53,333 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=284293.3333333333, ans=0.0 2024-09-23 15:02:17,033 INFO [train.py:1198] (3/4) Epoch 16, batch 2500, loss[loss=0.2445, ctc_loss=0.168, cr_loss=0.3821, over 17194.00 frames. ], tot_loss[loss=0.2256, ctc_loss=0.1522, cr_loss=0.367, over 3351585.44 frames. 
], batch size: 55, lr: 7.50e-03, grad_scale: 32.0 2024-09-23 15:02:33,133 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=284433.3333333333, ans=0.125 2024-09-23 15:02:56,745 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.046e+02 1.256e+02 1.352e+02 1.487e+02 2.472e+02, threshold=2.703e+02, percent-clipped=0.0 2024-09-23 15:03:09,822 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=284526.6666666667, ans=0.0 2024-09-23 15:03:13,235 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=6.25 vs. limit=15.0 2024-09-23 15:03:36,421 INFO [train.py:1198] (3/4) Epoch 16, batch 2550, loss[loss=0.2038, ctc_loss=0.1331, cr_loss=0.3534, over 17073.00 frames. ], tot_loss[loss=0.2257, ctc_loss=0.1522, cr_loss=0.3676, over 3358385.11 frames. ], batch size: 46, lr: 7.49e-03, grad_scale: 32.0 2024-09-23 15:03:41,444 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=284620.0, ans=0.125 2024-09-23 15:04:15,683 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=284713.3333333333, ans=0.2 2024-09-23 15:04:56,921 INFO [train.py:1198] (3/4) Epoch 16, batch 2600, loss[loss=0.2179, ctc_loss=0.1428, cr_loss=0.3757, over 16732.00 frames. ], tot_loss[loss=0.2256, ctc_loss=0.1521, cr_loss=0.3679, over 3366981.09 frames. ], batch size: 37, lr: 7.49e-03, grad_scale: 32.0 2024-09-23 15:05:02,064 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=284853.3333333333, ans=0.1 2024-09-23 15:05:24,454 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=284900.0, ans=0.125 2024-09-23 15:05:26,154 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=284900.0, ans=0.0 2024-09-23 15:05:41,680 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.034e+02 1.252e+02 1.364e+02 1.589e+02 2.408e+02, threshold=2.728e+02, percent-clipped=0.0 2024-09-23 15:06:00,341 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=284993.3333333333, ans=0.125 2024-09-23 15:06:05,287 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=284993.3333333333, ans=0.0 2024-09-23 15:06:21,356 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=285040.0, ans=0.125 2024-09-23 15:06:24,112 INFO [train.py:1198] (3/4) Epoch 16, batch 2650, loss[loss=0.2107, ctc_loss=0.14, cr_loss=0.3537, over 17088.00 frames. ], tot_loss[loss=0.2234, ctc_loss=0.1504, cr_loss=0.3652, over 3372775.25 frames. ], batch size: 43, lr: 7.49e-03, grad_scale: 32.0 2024-09-23 15:07:02,654 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=12.57 vs. 
limit=22.5 2024-09-23 15:07:05,581 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=285180.0, ans=0.04949747468305833 2024-09-23 15:07:08,948 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=10.91 vs. limit=12.0 2024-09-23 15:07:10,359 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=285180.0, ans=0.125 2024-09-23 15:07:39,120 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=285273.3333333333, ans=0.1 2024-09-23 15:07:42,369 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=285273.3333333333, ans=0.0 2024-09-23 15:07:44,082 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=285273.3333333333, ans=0.0 2024-09-23 15:07:46,887 INFO [train.py:1198] (3/4) Epoch 16, batch 2700, loss[loss=0.2403, ctc_loss=0.1634, cr_loss=0.3844, over 16466.00 frames. ], tot_loss[loss=0.2258, ctc_loss=0.1523, cr_loss=0.3676, over 3346562.83 frames. ], batch size: 66, lr: 7.48e-03, grad_scale: 32.0 2024-09-23 15:08:00,053 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.96 vs. limit=15.0 2024-09-23 15:08:18,975 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.min_abs, batch_count=285413.3333333333, ans=0.5 2024-09-23 15:08:26,735 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.106e+02 1.250e+02 1.328e+02 1.437e+02 2.275e+02, threshold=2.655e+02, percent-clipped=0.0 2024-09-23 15:08:27,147 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=285413.3333333333, ans=0.1 2024-09-23 15:08:43,422 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=11.89 vs. limit=22.5 2024-09-23 15:08:51,108 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=285506.6666666667, ans=0.07 2024-09-23 15:09:06,363 INFO [train.py:1198] (3/4) Epoch 16, batch 2750, loss[loss=0.2626, ctc_loss=0.1909, cr_loss=0.3587, over 11859.00 frames. ], tot_loss[loss=0.2245, ctc_loss=0.1514, cr_loss=0.3652, over 3340360.43 frames. ], batch size: 123, lr: 7.48e-03, grad_scale: 32.0 2024-09-23 15:09:32,301 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=285600.0, ans=0.125 2024-09-23 15:09:41,721 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=285646.6666666667, ans=0.2 2024-09-23 15:09:43,540 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=16.99 vs. limit=22.5 2024-09-23 15:10:18,261 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=285740.0, ans=0.1 2024-09-23 15:10:26,022 INFO [train.py:1198] (3/4) Epoch 16, batch 2800, loss[loss=0.2654, ctc_loss=0.1848, cr_loss=0.403, over 17016.00 frames. 
], tot_loss[loss=0.2246, ctc_loss=0.1516, cr_loss=0.3649, over 3334035.19 frames. ], batch size: 53, lr: 7.48e-03, grad_scale: 32.0 2024-09-23 15:10:35,468 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=285786.6666666667, ans=0.0 2024-09-23 15:10:58,846 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=285833.3333333333, ans=0.125 2024-09-23 15:11:11,236 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.075e+02 1.268e+02 1.378e+02 1.547e+02 2.101e+02, threshold=2.756e+02, percent-clipped=0.0 2024-09-23 15:11:36,469 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=285973.3333333333, ans=0.125 2024-09-23 15:11:42,149 INFO [scaling.py:1024] (3/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=7.15 vs. limit=8.0 2024-09-23 15:11:45,885 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=285973.3333333333, ans=0.2 2024-09-23 15:11:53,455 INFO [train.py:1198] (3/4) Epoch 16, batch 2850, loss[loss=0.2388, ctc_loss=0.1608, cr_loss=0.3901, over 17295.00 frames. ], tot_loss[loss=0.2249, ctc_loss=0.1518, cr_loss=0.3654, over 3334811.09 frames. ], batch size: 49, lr: 7.48e-03, grad_scale: 32.0 2024-09-23 15:11:53,803 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=286020.0, ans=0.125 2024-09-23 15:12:00,089 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-23 15:12:28,480 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.48 vs. limit=15.0 2024-09-23 15:12:31,535 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=286113.3333333333, ans=0.125 2024-09-23 15:12:34,789 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=286113.3333333333, ans=0.125 2024-09-23 15:13:00,534 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=286206.6666666667, ans=0.1 2024-09-23 15:13:08,516 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=286206.6666666667, ans=0.1 2024-09-23 15:13:08,644 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=286206.6666666667, ans=0.125 2024-09-23 15:13:13,162 INFO [train.py:1198] (3/4) Epoch 16, batch 2900, loss[loss=0.2145, ctc_loss=0.1434, cr_loss=0.3552, over 17060.00 frames. ], tot_loss[loss=0.2257, ctc_loss=0.1524, cr_loss=0.3667, over 3332210.14 frames. ], batch size: 46, lr: 7.47e-03, grad_scale: 32.0 2024-09-23 15:13:26,378 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-23 15:13:28,453 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=5.18 vs. 
limit=15.0 2024-09-23 15:13:29,456 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=286300.0, ans=0.125 2024-09-23 15:13:34,393 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=286300.0, ans=0.0 2024-09-23 15:13:39,671 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=5.15 vs. limit=15.0 2024-09-23 15:13:53,319 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.106e+02 1.305e+02 1.434e+02 1.582e+02 2.466e+02, threshold=2.868e+02, percent-clipped=0.0 2024-09-23 15:13:58,667 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.const_attention_rate, batch_count=286346.6666666667, ans=0.025 2024-09-23 15:14:33,435 INFO [train.py:1198] (3/4) Epoch 16, batch 2950, loss[loss=0.2249, ctc_loss=0.1523, cr_loss=0.363, over 17022.00 frames. ], tot_loss[loss=0.2258, ctc_loss=0.1525, cr_loss=0.3665, over 3336739.90 frames. ], batch size: 51, lr: 7.47e-03, grad_scale: 32.0 2024-09-23 15:14:40,860 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=9.53 vs. limit=15.0 2024-09-23 15:14:46,527 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=286486.6666666667, ans=0.125 2024-09-23 15:15:20,012 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=286580.0, ans=0.5 2024-09-23 15:15:32,134 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=286626.6666666667, ans=0.0 2024-09-23 15:15:35,268 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=286626.6666666667, ans=0.1 2024-09-23 15:15:52,605 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=286673.3333333333, ans=0.125 2024-09-23 15:15:58,478 INFO [train.py:1198] (3/4) Epoch 16, batch 3000, loss[loss=0.2067, ctc_loss=0.1419, cr_loss=0.3243, over 17211.00 frames. ], tot_loss[loss=0.2252, ctc_loss=0.152, cr_loss=0.366, over 3346436.59 frames. ], batch size: 47, lr: 7.47e-03, grad_scale: 32.0 2024-09-23 15:15:58,479 INFO [train.py:1221] (3/4) Computing validation loss 2024-09-23 15:16:09,327 INFO [zipformer.py:1858] (3/4) name=encoder.encoders.5.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([2.4746, 4.7706, 4.4580, 4.7463], device='cuda:3') 2024-09-23 15:16:14,044 INFO [train.py:1230] (3/4) Epoch 16, validation: loss=0.04215, ctc_loss=0.04215, cr_loss=7.551e-15, over 944034.00 frames. 2024-09-23 15:16:14,045 INFO [train.py:1231] (3/4) Maximum memory allocated so far is 21238MB 2024-09-23 15:16:34,607 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=286766.6666666667, ans=0.125 2024-09-23 15:16:53,330 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.096e+02 1.253e+02 1.384e+02 1.509e+02 2.501e+02, threshold=2.769e+02, percent-clipped=0.0 2024-09-23 15:16:58,865 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=9.71 vs. 
limit=15.0 2024-09-23 15:17:10,747 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=286860.0, ans=0.0 2024-09-23 15:17:31,988 INFO [train.py:1198] (3/4) Epoch 16, batch 3050, loss[loss=0.2156, ctc_loss=0.1429, cr_loss=0.3634, over 17066.00 frames. ], tot_loss[loss=0.2247, ctc_loss=0.1516, cr_loss=0.3655, over 3356068.21 frames. ], batch size: 46, lr: 7.46e-03, grad_scale: 32.0 2024-09-23 15:17:38,476 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.const_attention_rate, batch_count=286953.3333333333, ans=0.025 2024-09-23 15:17:52,891 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=287000.0, ans=0.125 2024-09-23 15:18:04,124 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.34 vs. limit=15.0 2024-09-23 15:18:09,999 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=287046.6666666667, ans=0.125 2024-09-23 15:18:50,320 INFO [train.py:1198] (3/4) Epoch 16, batch 3100, loss[loss=0.2449, ctc_loss=0.1657, cr_loss=0.3959, over 17039.00 frames. ], tot_loss[loss=0.2267, ctc_loss=0.1531, cr_loss=0.3678, over 3342954.43 frames. ], batch size: 56, lr: 7.46e-03, grad_scale: 32.0 2024-09-23 15:19:03,078 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=287186.6666666667, ans=0.125 2024-09-23 15:19:29,118 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.115e+02 1.248e+02 1.355e+02 1.474e+02 3.116e+02, threshold=2.710e+02, percent-clipped=1.0 2024-09-23 15:19:35,590 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=287326.6666666667, ans=0.2 2024-09-23 15:19:37,172 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=287326.6666666667, ans=0.125 2024-09-23 15:19:50,108 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=287326.6666666667, ans=0.125 2024-09-23 15:20:08,481 INFO [train.py:1198] (3/4) Epoch 16, batch 3150, loss[loss=0.2354, ctc_loss=0.1611, cr_loss=0.3713, over 16897.00 frames. ], tot_loss[loss=0.2251, ctc_loss=0.152, cr_loss=0.3653, over 3337122.57 frames. ], batch size: 58, lr: 7.46e-03, grad_scale: 16.0 2024-09-23 15:20:18,017 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=287420.0, ans=0.0 2024-09-23 15:20:38,343 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=287513.3333333333, ans=0.125 2024-09-23 15:20:47,728 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=287513.3333333333, ans=0.0 2024-09-23 15:21:26,319 INFO [train.py:1198] (3/4) Epoch 16, batch 3200, loss[loss=0.2759, ctc_loss=0.2028, cr_loss=0.3656, over 12263.00 frames. ], tot_loss[loss=0.2268, ctc_loss=0.1534, cr_loss=0.3672, over 3327699.94 frames. 
], batch size: 123, lr: 7.45e-03, grad_scale: 32.0 2024-09-23 15:21:34,220 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-23 15:21:57,789 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=287746.6666666667, ans=0.2 2024-09-23 15:21:57,984 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=13.17 vs. limit=15.0 2024-09-23 15:22:06,510 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.090e+02 1.274e+02 1.402e+02 1.535e+02 1.896e+02, threshold=2.804e+02, percent-clipped=0.0 2024-09-23 15:22:07,199 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=12.64 vs. limit=22.5 2024-09-23 15:22:21,221 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=6.63 vs. limit=15.0 2024-09-23 15:22:43,941 INFO [train.py:1198] (3/4) Epoch 16, batch 3250, loss[loss=0.2231, ctc_loss=0.1495, cr_loss=0.3681, over 17159.00 frames. ], tot_loss[loss=0.2267, ctc_loss=0.1532, cr_loss=0.3675, over 3338763.98 frames. ], batch size: 48, lr: 7.45e-03, grad_scale: 32.0 2024-09-23 15:22:52,062 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=287886.6666666667, ans=0.0 2024-09-23 15:23:12,837 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=5.07 vs. limit=15.0 2024-09-23 15:23:21,765 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=287980.0, ans=0.1 2024-09-23 15:23:33,921 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=288026.6666666667, ans=0.125 2024-09-23 15:23:35,527 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=288026.6666666667, ans=0.125 2024-09-23 15:23:40,271 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=288026.6666666667, ans=0.0 2024-09-23 15:23:41,856 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=288026.6666666667, ans=0.07 2024-09-23 15:24:01,925 INFO [train.py:1198] (3/4) Epoch 16, batch 3300, loss[loss=0.2761, ctc_loss=0.1919, cr_loss=0.421, over 14865.00 frames. ], tot_loss[loss=0.2266, ctc_loss=0.1531, cr_loss=0.3676, over 3328421.19 frames. 
], batch size: 89, lr: 7.45e-03, grad_scale: 32.0 2024-09-23 15:24:19,206 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=288166.6666666667, ans=0.0 2024-09-23 15:24:36,357 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=288213.3333333333, ans=0.09899494936611666 2024-09-23 15:24:46,431 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.114e+02 1.267e+02 1.362e+02 1.524e+02 2.882e+02, threshold=2.723e+02, percent-clipped=1.0 2024-09-23 15:25:13,647 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=288306.6666666667, ans=0.2 2024-09-23 15:25:18,227 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=288306.6666666667, ans=0.125 2024-09-23 15:25:24,525 INFO [train.py:1198] (3/4) Epoch 16, batch 3350, loss[loss=0.2653, ctc_loss=0.1869, cr_loss=0.392, over 16154.00 frames. ], tot_loss[loss=0.2263, ctc_loss=0.1529, cr_loss=0.367, over 3337337.86 frames. ], batch size: 74, lr: 7.45e-03, grad_scale: 32.0 2024-09-23 15:25:26,224 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=288353.3333333333, ans=0.125 2024-09-23 15:25:47,467 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=288400.0, ans=0.1 2024-09-23 15:26:25,200 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.const_attention_rate, batch_count=288493.3333333333, ans=0.025 2024-09-23 15:26:45,094 INFO [train.py:1198] (3/4) Epoch 16, batch 3400, loss[loss=0.1918, ctc_loss=0.1279, cr_loss=0.3197, over 17034.00 frames. ], tot_loss[loss=0.2271, ctc_loss=0.1534, cr_loss=0.3681, over 3336922.51 frames. ], batch size: 39, lr: 7.44e-03, grad_scale: 32.0 2024-09-23 15:27:00,163 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=8.74 vs. limit=15.0 2024-09-23 15:27:02,000 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=19.74 vs. limit=22.5 2024-09-23 15:27:09,054 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=288633.3333333333, ans=0.0 2024-09-23 15:27:24,988 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=288680.0, ans=0.1 2024-09-23 15:27:27,929 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.080e+02 1.234e+02 1.362e+02 1.557e+02 3.833e+02, threshold=2.724e+02, percent-clipped=1.0 2024-09-23 15:27:42,466 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-23 15:27:43,881 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=288726.6666666667, ans=0.125 2024-09-23 15:28:01,367 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=288773.3333333333, ans=0.0 2024-09-23 15:28:05,749 INFO [train.py:1198] (3/4) Epoch 16, batch 3450, loss[loss=0.2227, ctc_loss=0.1515, cr_loss=0.3559, over 17025.00 frames. 
], tot_loss[loss=0.2283, ctc_loss=0.1544, cr_loss=0.3696, over 3331887.75 frames. ], batch size: 44, lr: 7.44e-03, grad_scale: 32.0 2024-09-23 15:28:32,487 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=288866.6666666667, ans=0.04949747468305833 2024-09-23 15:28:58,946 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=288960.0, ans=0.95 2024-09-23 15:29:09,733 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.const_attention_rate, batch_count=289006.6666666667, ans=0.025 2024-09-23 15:29:23,287 INFO [train.py:1198] (3/4) Epoch 16, batch 3500, loss[loss=0.2354, ctc_loss=0.1615, cr_loss=0.3695, over 17303.00 frames. ], tot_loss[loss=0.2273, ctc_loss=0.1537, cr_loss=0.3682, over 3343792.43 frames. ], batch size: 51, lr: 7.44e-03, grad_scale: 32.0 2024-09-23 15:29:39,040 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=289100.0, ans=0.125 2024-09-23 15:29:44,452 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=6.63 vs. limit=15.0 2024-09-23 15:29:46,581 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=289100.0, ans=0.0 2024-09-23 15:29:48,341 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=289100.0, ans=0.2 2024-09-23 15:29:54,648 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=289146.6666666667, ans=0.0 2024-09-23 15:29:56,341 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.const_attention_rate, batch_count=289146.6666666667, ans=0.025 2024-09-23 15:29:57,780 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=289146.6666666667, ans=0.035 2024-09-23 15:30:03,921 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.100e+02 1.282e+02 1.368e+02 1.525e+02 3.473e+02, threshold=2.736e+02, percent-clipped=1.0 2024-09-23 15:30:09,029 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=289193.3333333333, ans=0.125 2024-09-23 15:30:16,242 INFO [scaling.py:1024] (3/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=6.58 vs. limit=8.0 2024-09-23 15:30:29,267 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=289240.0, ans=0.0 2024-09-23 15:30:29,375 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=289240.0, ans=0.0 2024-09-23 15:30:32,467 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=289240.0, ans=0.0 2024-09-23 15:30:34,389 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=6.27 vs. limit=15.0 2024-09-23 15:30:35,633 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-23 15:30:41,434 INFO [train.py:1198] (3/4) Epoch 16, batch 3550, loss[loss=0.2649, ctc_loss=0.1834, cr_loss=0.4073, over 17235.00 frames. 
], tot_loss[loss=0.2286, ctc_loss=0.1548, cr_loss=0.3694, over 3338016.42 frames. ], batch size: 55, lr: 7.43e-03, grad_scale: 32.0 2024-09-23 15:31:39,961 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=289426.6666666667, ans=0.1 2024-09-23 15:31:50,981 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=289473.3333333333, ans=0.0 2024-09-23 15:31:54,102 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=289473.3333333333, ans=0.125 2024-09-23 15:32:00,002 INFO [train.py:1198] (3/4) Epoch 16, batch 3600, loss[loss=0.2238, ctc_loss=0.1478, cr_loss=0.3801, over 17017.00 frames. ], tot_loss[loss=0.2274, ctc_loss=0.1537, cr_loss=0.3688, over 3347138.58 frames. ], batch size: 44, lr: 7.43e-03, grad_scale: 32.0 2024-09-23 15:32:18,821 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=289566.6666666667, ans=0.0 2024-09-23 15:32:39,152 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=289613.3333333333, ans=0.2 2024-09-23 15:32:40,499 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.033e+02 1.232e+02 1.305e+02 1.381e+02 1.906e+02, threshold=2.610e+02, percent-clipped=0.0 2024-09-23 15:32:44,722 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.77 vs. limit=22.5 2024-09-23 15:32:57,160 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2.whitening_limit, batch_count=289660.0, ans=15.0 2024-09-23 15:33:18,232 INFO [train.py:1198] (3/4) Epoch 16, batch 3650, loss[loss=0.2402, ctc_loss=0.1632, cr_loss=0.3853, over 17003.00 frames. ], tot_loss[loss=0.227, ctc_loss=0.1534, cr_loss=0.3681, over 3344722.00 frames. ], batch size: 53, lr: 7.43e-03, grad_scale: 16.0 2024-09-23 15:33:23,175 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=289753.3333333333, ans=10.0 2024-09-23 15:33:34,100 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=289800.0, ans=0.1 2024-09-23 15:33:47,762 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=13.47 vs. limit=22.5 2024-09-23 15:34:02,732 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=289846.6666666667, ans=0.125 2024-09-23 15:34:05,871 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=289893.3333333333, ans=0.0 2024-09-23 15:34:34,982 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=289940.0, ans=0.0 2024-09-23 15:34:40,768 INFO [train.py:1198] (3/4) Epoch 16, batch 3700, loss[loss=0.1971, ctc_loss=0.1321, cr_loss=0.325, over 16966.00 frames. ], tot_loss[loss=0.2269, ctc_loss=0.1531, cr_loss=0.3691, over 3355557.30 frames. 
], batch size: 42, lr: 7.43e-03, grad_scale: 16.0 2024-09-23 15:34:50,410 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=289986.6666666667, ans=0.0 2024-09-23 15:34:59,775 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.const_attention_rate, batch_count=290033.3333333333, ans=0.025 2024-09-23 15:35:13,208 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=290080.0, ans=0.0 2024-09-23 15:35:22,441 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.const_attention_rate, batch_count=290080.0, ans=0.025 2024-09-23 15:35:23,810 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.127e+02 1.296e+02 1.389e+02 1.546e+02 2.430e+02, threshold=2.778e+02, percent-clipped=0.0 2024-09-23 15:35:30,556 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=290126.6666666667, ans=0.2 2024-09-23 15:35:44,521 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=290173.3333333333, ans=0.125 2024-09-23 15:35:56,900 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=290173.3333333333, ans=0.125 2024-09-23 15:35:59,679 INFO [train.py:1198] (3/4) Epoch 16, batch 3750, loss[loss=0.2128, ctc_loss=0.1434, cr_loss=0.3474, over 17156.00 frames. ], tot_loss[loss=0.2263, ctc_loss=0.1528, cr_loss=0.3675, over 3350328.99 frames. ], batch size: 45, lr: 7.42e-03, grad_scale: 16.0 2024-09-23 15:36:57,397 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=290360.0, ans=0.125 2024-09-23 15:37:03,603 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=290406.6666666667, ans=0.0 2024-09-23 15:37:18,712 INFO [train.py:1198] (3/4) Epoch 16, batch 3800, loss[loss=0.2629, ctc_loss=0.1798, cr_loss=0.4154, over 15251.00 frames. ], tot_loss[loss=0.2278, ctc_loss=0.1541, cr_loss=0.3684, over 3335930.30 frames. ], batch size: 89, lr: 7.42e-03, grad_scale: 16.0 2024-09-23 15:38:00,416 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.074e+02 1.261e+02 1.370e+02 1.506e+02 3.479e+02, threshold=2.739e+02, percent-clipped=1.0 2024-09-23 15:38:06,964 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=290593.3333333333, ans=0.2 2024-09-23 15:38:25,136 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=7.23 vs. limit=15.0 2024-09-23 15:38:36,620 INFO [train.py:1198] (3/4) Epoch 16, batch 3850, loss[loss=0.2476, ctc_loss=0.1672, cr_loss=0.4015, over 17327.00 frames. ], tot_loss[loss=0.2297, ctc_loss=0.1558, cr_loss=0.3694, over 3278974.94 frames. ], batch size: 51, lr: 7.42e-03, grad_scale: 16.0 2024-09-23 15:38:58,891 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=5.79 vs. limit=15.0 2024-09-23 15:39:18,729 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.92 vs. 
limit=15.0 2024-09-23 15:39:33,859 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=290826.6666666667, ans=0.0 2024-09-23 15:39:34,543 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=8.57 vs. limit=15.0 2024-09-23 15:39:39,002 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=10.74 vs. limit=12.0 2024-09-23 15:39:43,138 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=290873.3333333333, ans=0.125 2024-09-23 15:40:38,388 INFO [train.py:1198] (3/4) Epoch 17, batch 0, loss[loss=0.2525, ctc_loss=0.1723, cr_loss=0.4012, over 17044.00 frames. ], tot_loss[loss=0.2525, ctc_loss=0.1723, cr_loss=0.4012, over 17044.00 frames. ], batch size: 52, lr: 7.19e-03, grad_scale: 32.0 2024-09-23 15:40:38,389 INFO [train.py:1221] (3/4) Computing validation loss 2024-09-23 15:40:45,465 INFO [zipformer.py:1858] (3/4) name=encoder.encoders.3.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([2.3479, 3.4523, 2.9557, 3.3139, 2.6450, 3.2968, 3.2575, 3.2220], device='cuda:3') 2024-09-23 15:40:53,766 INFO [train.py:1230] (3/4) Epoch 17, validation: loss=0.04104, ctc_loss=0.04104, cr_loss=7.589e-15, over 944034.00 frames. 2024-09-23 15:40:53,767 INFO [train.py:1231] (3/4) Maximum memory allocated so far is 21238MB 2024-09-23 15:41:06,923 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=290901.3333333333, ans=0.0 2024-09-23 15:41:11,786 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=290948.0, ans=0.125 2024-09-23 15:41:43,313 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=291041.3333333333, ans=0.0 2024-09-23 15:41:43,779 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=14.14 vs. limit=22.5 2024-09-23 15:41:46,228 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.183e+02 1.415e+02 1.552e+02 1.651e+02 2.695e+02, threshold=3.103e+02, percent-clipped=0.0 2024-09-23 15:42:18,134 INFO [train.py:1198] (3/4) Epoch 17, batch 50, loss[loss=0.2241, ctc_loss=0.1517, cr_loss=0.3616, over 16761.00 frames. ], tot_loss[loss=0.2319, ctc_loss=0.1569, cr_loss=0.3746, over 748207.89 frames. ], batch size: 61, lr: 7.19e-03, grad_scale: 16.0 2024-09-23 15:43:06,135 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=291274.6666666667, ans=0.035 2024-09-23 15:43:30,185 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=291321.3333333333, ans=0.0 2024-09-23 15:43:39,805 INFO [train.py:1198] (3/4) Epoch 17, batch 100, loss[loss=0.1986, ctc_loss=0.1325, cr_loss=0.3303, over 17206.00 frames. ], tot_loss[loss=0.2291, ctc_loss=0.1545, cr_loss=0.3729, over 1337572.83 frames. 
], batch size: 41, lr: 7.18e-03, grad_scale: 16.0 2024-09-23 15:44:20,146 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.const_attention_rate, batch_count=291461.3333333333, ans=0.025 2024-09-23 15:44:31,115 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.077e+02 1.243e+02 1.327e+02 1.449e+02 2.389e+02, threshold=2.654e+02, percent-clipped=0.0 2024-09-23 15:44:50,834 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=291554.6666666667, ans=0.125 2024-09-23 15:44:59,996 INFO [train.py:1198] (3/4) Epoch 17, batch 150, loss[loss=0.2051, ctc_loss=0.1371, cr_loss=0.3396, over 17305.00 frames. ], tot_loss[loss=0.228, ctc_loss=0.1538, cr_loss=0.3709, over 1782412.78 frames. ], batch size: 46, lr: 7.18e-03, grad_scale: 16.0 2024-09-23 15:45:08,218 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=291601.3333333333, ans=0.0 2024-09-23 15:45:11,452 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=291601.3333333333, ans=0.0 2024-09-23 15:45:11,744 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=14.65 vs. limit=15.0 2024-09-23 15:45:13,626 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=16.33 vs. limit=22.5 2024-09-23 15:45:21,151 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=291648.0, ans=0.1 2024-09-23 15:45:27,495 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=291648.0, ans=0.1 2024-09-23 15:45:43,136 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=291694.6666666667, ans=0.0 2024-09-23 15:45:43,318 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=291694.6666666667, ans=0.2 2024-09-23 15:45:54,559 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=291741.3333333333, ans=0.2 2024-09-23 15:46:00,702 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=291741.3333333333, ans=0.1 2024-09-23 15:46:25,718 INFO [train.py:1198] (3/4) Epoch 17, batch 200, loss[loss=0.2188, ctc_loss=0.1461, cr_loss=0.3635, over 17034.00 frames. ], tot_loss[loss=0.2277, ctc_loss=0.1537, cr_loss=0.3701, over 2126376.20 frames. ], batch size: 56, lr: 7.18e-03, grad_scale: 16.0 2024-09-23 15:46:34,039 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=291834.6666666667, ans=0.125 2024-09-23 15:47:00,068 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=6.12 vs. limit=15.0 2024-09-23 15:47:18,310 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.110e+02 1.306e+02 1.386e+02 1.614e+02 2.877e+02, threshold=2.773e+02, percent-clipped=1.0 2024-09-23 15:47:25,461 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.84 vs. 
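limit=6.0

The WARNING [optim.py:487] lines report the quartiles (min, 25%, median, 75%, max) of recent gradient norms next to the active clipping threshold, and the threshold tracks twice the median (e.g. 2.0 * 1.327e+02 = 2.654e+02 above), i.e. Clipping_scale times the second quartile. A rough sketch of such quartile-based clipping, assuming hypothetical names and a simple norm history standing in for whatever optim.py actually maintains:

import torch
from collections import deque

class QuartileClipper:
    # Hypothetical bookkeeping; the real optimizer implementation may differ.
    def __init__(self, clipping_scale: float = 2.0, history: int = 1024):
        self.clipping_scale = clipping_scale
        self.norms = deque(maxlen=history)  # recent global grad norms

    def clip_(self, params) -> float:
        grads = [p.grad for p in params if p.grad is not None]
        norm = torch.norm(torch.stack([g.norm() for g in grads])).item()
        self.norms.append(norm)
        q = torch.quantile(torch.tensor(list(self.norms)),
                           torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0]))
        threshold = self.clipping_scale * q[2].item()  # 2.0 * median, as in the log
        if norm > threshold:  # rescale gradients down to the threshold
            for g in grads:
                g.mul_(threshold / norm)
        return threshold
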
2024-09-23 15:47:31,545 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.min_positive, batch_count=292021.3333333333, ans=0.05 2024-09-23 15:47:33,009 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=292021.3333333333, ans=0.125 2024-09-23 15:47:49,027 INFO [train.py:1198] (3/4) Epoch 17, batch 250, loss[loss=0.2087, ctc_loss=0.1405, cr_loss=0.341, over 17244.00 frames. ], tot_loss[loss=0.2254, ctc_loss=0.1519, cr_loss=0.3674, over 2410022.78 frames. ], batch size: 50, lr: 7.18e-03, grad_scale: 16.0 2024-09-23 15:47:55,710 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=292068.0, ans=0.0 2024-09-23 15:48:29,700 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=292161.3333333333, ans=0.2 2024-09-23 15:48:43,653 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=292208.0, ans=0.0 2024-09-23 15:49:07,562 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=292301.3333333333, ans=0.125 2024-09-23 15:49:08,866 INFO [train.py:1198] (3/4) Epoch 17, batch 300, loss[loss=0.2511, ctc_loss=0.1727, cr_loss=0.3918, over 16907.00 frames. ], tot_loss[loss=0.2252, ctc_loss=0.1518, cr_loss=0.367, over 2611277.76 frames. ], batch size: 58, lr: 7.17e-03, grad_scale: 16.0 2024-09-23 15:49:57,655 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=292441.3333333333, ans=0.125 2024-09-23 15:50:00,587 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.112e+02 1.265e+02 1.346e+02 1.452e+02 2.269e+02, threshold=2.693e+02, percent-clipped=0.0 2024-09-23 15:50:05,989 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=292441.3333333333, ans=0.1 2024-09-23 15:50:17,171 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=292488.0, ans=0.125 2024-09-23 15:50:17,245 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=292488.0, ans=0.125 2024-09-23 15:50:32,874 INFO [train.py:1198] (3/4) Epoch 17, batch 350, loss[loss=0.1848, ctc_loss=0.1193, cr_loss=0.3276, over 16766.00 frames. ], tot_loss[loss=0.2239, ctc_loss=0.1507, cr_loss=0.3662, over 2771589.42 frames. ], batch size: 37, lr: 7.17e-03, grad_scale: 16.0 2024-09-23 15:50:42,534 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=292534.6666666667, ans=0.125 2024-09-23 15:50:52,052 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=292581.3333333333, ans=0.1 2024-09-23 15:51:11,204 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=5.93 vs.
limit=15.0 2024-09-23 15:51:30,016 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=292674.6666666667, ans=0.125 2024-09-23 15:51:44,399 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=292721.3333333333, ans=0.0 2024-09-23 15:51:57,139 INFO [train.py:1198] (3/4) Epoch 17, batch 400, loss[loss=0.2386, ctc_loss=0.159, cr_loss=0.3978, over 17207.00 frames. ], tot_loss[loss=0.2226, ctc_loss=0.1497, cr_loss=0.3648, over 2905279.52 frames. ], batch size: 55, lr: 7.17e-03, grad_scale: 32.0 2024-09-23 15:52:09,220 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.37 vs. limit=15.0 2024-09-23 15:52:29,197 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=292861.3333333333, ans=0.2 2024-09-23 15:52:32,196 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=292861.3333333333, ans=0.125 2024-09-23 15:52:37,165 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=292861.3333333333, ans=0.1 2024-09-23 15:52:49,241 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.83 vs. limit=6.0 2024-09-23 15:52:50,104 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.097e+02 1.274e+02 1.356e+02 1.494e+02 2.535e+02, threshold=2.713e+02, percent-clipped=0.0 2024-09-23 15:52:50,376 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=292908.0, ans=10.0 2024-09-23 15:53:07,043 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=15.44 vs. limit=22.5 2024-09-23 15:53:18,849 INFO [train.py:1198] (3/4) Epoch 17, batch 450, loss[loss=0.2149, ctc_loss=0.1455, cr_loss=0.3471, over 17221.00 frames. ], tot_loss[loss=0.2243, ctc_loss=0.151, cr_loss=0.3669, over 3009599.23 frames. ], batch size: 47, lr: 7.16e-03, grad_scale: 32.0 2024-09-23 15:53:35,621 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=11.69 vs. limit=15.0 2024-09-23 15:53:39,988 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.const_attention_rate, batch_count=293048.0, ans=0.025 2024-09-23 15:53:48,965 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=3.17 vs. limit=15.0 2024-09-23 15:54:39,068 INFO [train.py:1198] (3/4) Epoch 17, batch 500, loss[loss=0.2233, ctc_loss=0.1485, cr_loss=0.3738, over 17059.00 frames. ], tot_loss[loss=0.2244, ctc_loss=0.151, cr_loss=0.3674, over 3090594.37 frames. 
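], batch size: 46, lr: 7.16e-03, grad_scale: 32.0

The scaling.py:214 entries that dominate this log print ScheduledFloat values: module hyperparameters (skip rates, dropout probabilities, balancer bounds) defined as piecewise-linear functions of the global batch count and logged as ans= at the current batch_count. A sketch of that idea, assuming a simple list of (batch_count, value) breakpoints; the real scaling.py class may differ in detail:

class ScheduledFloat:
    # Sketch: a float-valued hyperparameter scheduled piecewise-linearly
    # in the global batch count; the log's `ans=` is its current value.
    def __init__(self, *points):
        self.points = sorted(points)          # (batch_count, value) pairs
        self.batch_count = 0.0

    def __float__(self) -> float:
        p = self.points
        if self.batch_count <= p[0][0]:
            return float(p[0][1])
        if self.batch_count >= p[-1][0]:
            return float(p[-1][1])
        for (x0, y0), (x1, y1) in zip(p, p[1:]):
            if x0 <= self.batch_count <= x1:
                t = (self.batch_count - x0) / (x1 - x0)
                return float(y0 + t * (y1 - y0))

# A skip rate that decays from 0.2 to 0.0 over the first 4000 batches
# (breakpoints invented for illustration) has long since reached 0.0 at
# batch_count ~290k, matching the many `ans=0.0` skip-rate entries above.
conv_skip_rate = ScheduledFloat((0.0, 0.2), (4000.0, 0.0))
conv_skip_rate.batch_count = 290080.0
assert float(conv_skip_rate) == 0.0
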
2024-09-23 15:55:11,617 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=293328.0, ans=0.0 2024-09-23 15:55:13,171 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=293328.0, ans=0.2 2024-09-23 15:55:32,746 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.054e+02 1.258e+02 1.318e+02 1.426e+02 2.177e+02, threshold=2.636e+02, percent-clipped=0.0 2024-09-23 15:55:45,820 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=293421.3333333333, ans=0.0 2024-09-23 15:55:52,320 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-09-23 15:55:58,296 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=15.43 vs. limit=22.5 2024-09-23 15:56:03,962 INFO [train.py:1198] (3/4) Epoch 17, batch 550, loss[loss=0.2348, ctc_loss=0.1565, cr_loss=0.3915, over 17038.00 frames. ], tot_loss[loss=0.2247, ctc_loss=0.1511, cr_loss=0.3681, over 3154165.61 frames. ], batch size: 56, lr: 7.16e-03, grad_scale: 32.0 2024-09-23 15:56:07,428 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=293468.0, ans=0.1 2024-09-23 15:56:28,220 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.max_abs, batch_count=293514.6666666667, ans=10.0 2024-09-23 15:56:53,679 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=293608.0, ans=0.0 2024-09-23 15:56:56,919 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=293608.0, ans=0.0 2024-09-23 15:57:01,970 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=293608.0, ans=0.125 2024-09-23 15:57:27,139 INFO [train.py:1198] (3/4) Epoch 17, batch 600, loss[loss=0.1886, ctc_loss=0.1235, cr_loss=0.3252, over 17100.00 frames. ], tot_loss[loss=0.223, ctc_loss=0.1499, cr_loss=0.3654, over 3206332.47 frames. ], batch size: 40, lr: 7.16e-03, grad_scale: 32.0 2024-09-23 15:57:58,670 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=293748.0, ans=0.1 2024-09-23 15:58:20,483 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.130e+02 1.285e+02 1.384e+02 1.494e+02 2.358e+02, threshold=2.768e+02, percent-clipped=0.0 2024-09-23 15:58:49,414 INFO [train.py:1198] (3/4) Epoch 17, batch 650, loss[loss=0.1746, ctc_loss=0.115, cr_loss=0.2979, over 17041.00 frames. ], tot_loss[loss=0.2255, ctc_loss=0.1519, cr_loss=0.3682, over 3227103.31 frames. ], batch size: 39, lr: 7.15e-03, grad_scale: 32.0 2024-09-23 15:59:10,401 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=293981.3333333333, ans=0.07 2024-09-23 15:59:57,475 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=294121.3333333333, ans=0.0 2024-09-23 16:00:09,860 INFO [train.py:1198] (3/4) Epoch 17, batch 700, loss[loss=0.2105, ctc_loss=0.1413, cr_loss=0.346, over 17036.00 frames.
], tot_loss[loss=0.2244, ctc_loss=0.1511, cr_loss=0.3666, over 3256993.15 frames. ], batch size: 44, lr: 7.15e-03, grad_scale: 32.0 2024-09-23 16:00:20,140 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=13.38 vs. limit=22.5 2024-09-23 16:00:20,313 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=4.65 vs. limit=15.0 2024-09-23 16:00:59,544 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=9.56 vs. limit=15.0 2024-09-23 16:01:06,301 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.108e+02 1.243e+02 1.328e+02 1.463e+02 2.107e+02, threshold=2.656e+02, percent-clipped=0.0 2024-09-23 16:01:14,706 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=294308.0, ans=0.1 2024-09-23 16:01:34,889 INFO [train.py:1198] (3/4) Epoch 17, batch 750, loss[loss=0.1684, ctc_loss=0.1102, cr_loss=0.2911, over 17281.00 frames. ], tot_loss[loss=0.224, ctc_loss=0.1508, cr_loss=0.3657, over 3281291.84 frames. ], batch size: 42, lr: 7.15e-03, grad_scale: 32.0 2024-09-23 16:01:38,524 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=294401.3333333333, ans=0.1 2024-09-23 16:01:49,318 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=294401.3333333333, ans=0.125 2024-09-23 16:02:00,518 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer_ff2.min_abs, batch_count=294448.0, ans=0.1 2024-09-23 16:02:08,474 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=294494.6666666667, ans=0.07 2024-09-23 16:02:19,628 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=294494.6666666667, ans=0.0 2024-09-23 16:02:22,885 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=294494.6666666667, ans=0.0 2024-09-23 16:02:30,628 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=294541.3333333333, ans=0.125 2024-09-23 16:02:33,964 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-23 16:02:42,963 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=294588.0, ans=0.0 2024-09-23 16:02:44,689 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=294588.0, ans=0.0 2024-09-23 16:02:47,831 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=294588.0, ans=0.95 2024-09-23 16:03:00,186 INFO [train.py:1198] (3/4) Epoch 17, batch 800, loss[loss=0.2335, ctc_loss=0.1577, cr_loss=0.3792, over 17013.00 frames. ], tot_loss[loss=0.2234, ctc_loss=0.1505, cr_loss=0.3643, over 3285108.66 frames. 
], batch size: 44, lr: 7.14e-03, grad_scale: 32.0 2024-09-23 16:03:03,721 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=294634.6666666667, ans=0.125 2024-09-23 16:03:29,163 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=294681.3333333333, ans=0.0 2024-09-23 16:03:51,161 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.056e+02 1.255e+02 1.363e+02 1.453e+02 2.011e+02, threshold=2.725e+02, percent-clipped=0.0 2024-09-23 16:03:54,581 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=294774.6666666667, ans=0.125 2024-09-23 16:04:15,200 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=294821.3333333333, ans=0.125 2024-09-23 16:04:18,380 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=294868.0, ans=0.125 2024-09-23 16:04:19,629 INFO [train.py:1198] (3/4) Epoch 17, batch 850, loss[loss=0.2675, ctc_loss=0.1902, cr_loss=0.3865, over 11635.00 frames. ], tot_loss[loss=0.2239, ctc_loss=0.1509, cr_loss=0.3648, over 3295810.06 frames. ], batch size: 125, lr: 7.14e-03, grad_scale: 32.0 2024-09-23 16:04:50,514 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.const_attention_rate, batch_count=294961.3333333333, ans=0.025 2024-09-23 16:04:53,618 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=294961.3333333333, ans=0.125 2024-09-23 16:04:55,207 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=294961.3333333333, ans=0.125 2024-09-23 16:05:41,832 INFO [train.py:1198] (3/4) Epoch 17, batch 900, loss[loss=0.2066, ctc_loss=0.1374, cr_loss=0.3463, over 17031.00 frames. ], tot_loss[loss=0.2239, ctc_loss=0.1509, cr_loss=0.3651, over 3315599.99 frames. ], batch size: 39, lr: 7.14e-03, grad_scale: 32.0 2024-09-23 16:05:52,901 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=295101.3333333333, ans=0.2 2024-09-23 16:05:59,115 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=295148.0, ans=0.125 2024-09-23 16:06:33,592 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=9.00 vs. limit=10.0 2024-09-23 16:06:35,668 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.082e+02 1.252e+02 1.345e+02 1.506e+02 2.061e+02, threshold=2.689e+02, percent-clipped=0.0 2024-09-23 16:07:05,609 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=295334.6666666667, ans=0.125 2024-09-23 16:07:07,054 INFO [train.py:1198] (3/4) Epoch 17, batch 950, loss[loss=0.2548, ctc_loss=0.1725, cr_loss=0.4113, over 17152.00 frames. ], tot_loss[loss=0.2222, ctc_loss=0.1495, cr_loss=0.3636, over 3331851.86 frames. 
], batch size: 48, lr: 7.14e-03, grad_scale: 32.0 2024-09-23 16:07:13,799 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=295334.6666666667, ans=0.125 2024-09-23 16:07:39,126 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=295428.0, ans=0.07 2024-09-23 16:07:42,350 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=295428.0, ans=0.125 2024-09-23 16:08:06,246 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.88 vs. limit=15.0 2024-09-23 16:08:14,277 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.63 vs. limit=15.0 2024-09-23 16:08:29,165 INFO [train.py:1198] (3/4) Epoch 17, batch 1000, loss[loss=0.2287, ctc_loss=0.1539, cr_loss=0.3737, over 17167.00 frames. ], tot_loss[loss=0.2228, ctc_loss=0.1499, cr_loss=0.3647, over 3346668.87 frames. ], batch size: 45, lr: 7.13e-03, grad_scale: 32.0 2024-09-23 16:08:37,338 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=295568.0, ans=0.0 2024-09-23 16:09:10,463 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=295661.3333333333, ans=0.0 2024-09-23 16:09:14,184 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=4.77 vs. limit=15.0 2024-09-23 16:09:15,919 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=5.08 vs. limit=15.0 2024-09-23 16:09:19,594 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.128e+02 1.278e+02 1.364e+02 1.541e+02 2.196e+02, threshold=2.728e+02, percent-clipped=0.0 2024-09-23 16:09:47,781 INFO [train.py:1198] (3/4) Epoch 17, batch 1050, loss[loss=0.1989, ctc_loss=0.1322, cr_loss=0.3336, over 17108.00 frames. ], tot_loss[loss=0.222, ctc_loss=0.1493, cr_loss=0.3636, over 3354539.95 frames. ], batch size: 40, lr: 7.13e-03, grad_scale: 32.0 2024-09-23 16:09:57,762 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=295801.3333333333, ans=0.95 2024-09-23 16:10:33,073 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=6.44 vs. limit=15.0 2024-09-23 16:11:12,826 INFO [train.py:1198] (3/4) Epoch 17, batch 1100, loss[loss=0.1971, ctc_loss=0.1314, cr_loss=0.3285, over 17118.00 frames. ], tot_loss[loss=0.2213, ctc_loss=0.1487, cr_loss=0.3631, over 3365520.64 frames. 
], batch size: 49, lr: 7.13e-03, grad_scale: 32.0 2024-09-23 16:11:29,125 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=296081.3333333333, ans=0.125 2024-09-23 16:11:36,336 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=296081.3333333333, ans=0.1 2024-09-23 16:11:41,222 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=296081.3333333333, ans=0.1 2024-09-23 16:11:46,033 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=296128.0, ans=0.1 2024-09-23 16:11:48,130 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.43 vs. limit=6.0 2024-09-23 16:12:06,483 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.092e+02 1.273e+02 1.407e+02 1.625e+02 2.289e+02, threshold=2.813e+02, percent-clipped=0.0 2024-09-23 16:12:32,366 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=296221.3333333333, ans=0.125 2024-09-23 16:12:35,108 INFO [train.py:1198] (3/4) Epoch 17, batch 1150, loss[loss=0.25, ctc_loss=0.1713, cr_loss=0.3933, over 16401.00 frames. ], tot_loss[loss=0.2218, ctc_loss=0.1491, cr_loss=0.3638, over 3370378.29 frames. ], batch size: 66, lr: 7.13e-03, grad_scale: 32.0 2024-09-23 16:12:35,390 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=296268.0, ans=0.0 2024-09-23 16:12:54,120 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=296314.6666666667, ans=0.125 2024-09-23 16:12:55,692 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=296314.6666666667, ans=0.0 2024-09-23 16:12:58,886 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=296314.6666666667, ans=0.0 2024-09-23 16:13:06,848 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=296314.6666666667, ans=0.125 2024-09-23 16:13:43,887 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=11.39 vs. limit=15.0 2024-09-23 16:13:45,206 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=296454.6666666667, ans=0.0 2024-09-23 16:13:57,591 INFO [train.py:1198] (3/4) Epoch 17, batch 1200, loss[loss=0.1951, ctc_loss=0.1298, cr_loss=0.3267, over 16957.00 frames. ], tot_loss[loss=0.224, ctc_loss=0.1507, cr_loss=0.3666, over 3363528.84 frames. 
], batch size: 42, lr: 7.12e-03, grad_scale: 32.0 2024-09-23 16:14:13,593 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=296548.0, ans=0.0 2024-09-23 16:14:49,696 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.095e+02 1.250e+02 1.332e+02 1.452e+02 2.634e+02, threshold=2.664e+02, percent-clipped=0.0 2024-09-23 16:14:59,714 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=296688.0, ans=0.09899494936611666 2024-09-23 16:15:02,961 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=296688.0, ans=0.0 2024-09-23 16:15:19,280 INFO [train.py:1198] (3/4) Epoch 17, batch 1250, loss[loss=0.244, ctc_loss=0.1634, cr_loss=0.4029, over 16622.00 frames. ], tot_loss[loss=0.2239, ctc_loss=0.1506, cr_loss=0.3665, over 3353814.96 frames. ], batch size: 66, lr: 7.12e-03, grad_scale: 32.0 2024-09-23 16:15:51,336 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.12 vs. limit=6.0 2024-09-23 16:16:03,917 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=296828.0, ans=0.0 2024-09-23 16:16:03,930 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=296828.0, ans=0.125 2024-09-23 16:16:08,793 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=296874.6666666667, ans=0.1 2024-09-23 16:16:35,314 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=296921.3333333333, ans=0.0 2024-09-23 16:16:36,901 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=296921.3333333333, ans=0.125 2024-09-23 16:16:41,570 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=296921.3333333333, ans=0.125 2024-09-23 16:16:44,492 INFO [train.py:1198] (3/4) Epoch 17, batch 1300, loss[loss=0.2375, ctc_loss=0.1613, cr_loss=0.381, over 17037.00 frames. ], tot_loss[loss=0.2247, ctc_loss=0.1513, cr_loss=0.3669, over 3346494.17 frames. ], batch size: 52, lr: 7.12e-03, grad_scale: 32.0 2024-09-23 16:16:51,013 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=296968.0, ans=0.125 2024-09-23 16:17:03,834 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=297014.6666666667, ans=0.0 2024-09-23 16:17:06,166 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=5.32 vs. 
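limit=15.0

The scaling.py:1024 Whitening lines compare a measured activation-anisotropy metric against a per-module limit (15.0, 22.5, 12.0, 6.0 above) and are printed when the metric approaches or exceeds that limit. One plausible covariance-based formulation is sketched below, under the assumption that the metric is d * sum(eig^2) / (sum(eig))^2 over the eigenvalues of the feature covariance, which equals 1.0 for perfectly white features and grows as energy concentrates in a few directions; the exact formula in scaling.py may differ:

import torch

def whitening_metric(x: torch.Tensor, num_groups: int = 1) -> float:
    # x: (num_frames, num_channels), split into channel groups as in the log
    n, _ = x.shape
    x = x - x.mean(dim=0)
    metrics = []
    for g in x.chunk(num_groups, dim=1):
        cov = g.t() @ g / n                       # per-group covariance
        d = cov.shape[0]
        tr = torch.diagonal(cov).sum()            # sum of eigenvalues
        tr_sq = torch.diagonal(cov @ cov).sum()   # sum of squared eigenvalues
        metrics.append((d * tr_sq / tr.clamp(min=1e-20) ** 2).item())
    return sum(metrics) / len(metrics)

print(whitening_metric(torch.randn(1000, 256)))   # ~1.0 for near-white input;
# logged values like metric=14.92 vs. limit=15.0 indicate strongly correlated channels
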
2024-09-23 16:17:07,131 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-09-23 16:17:23,155 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=297061.3333333333, ans=0.0 2024-09-23 16:17:26,222 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=297061.3333333333, ans=0.0 2024-09-23 16:17:36,808 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.103e+02 1.242e+02 1.322e+02 1.445e+02 3.373e+02, threshold=2.644e+02, percent-clipped=1.0 2024-09-23 16:17:58,929 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.const_attention_rate, batch_count=297154.6666666667, ans=0.025 2024-09-23 16:17:59,020 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=297154.6666666667, ans=0.1 2024-09-23 16:18:02,068 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=297154.6666666667, ans=0.125 2024-09-23 16:18:02,198 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=297154.6666666667, ans=0.0 2024-09-23 16:18:06,588 INFO [train.py:1198] (3/4) Epoch 17, batch 1350, loss[loss=0.2204, ctc_loss=0.1493, cr_loss=0.3553, over 17300.00 frames. ], tot_loss[loss=0.2242, ctc_loss=0.1508, cr_loss=0.3667, over 3354599.36 frames. ], batch size: 46, lr: 7.11e-03, grad_scale: 32.0 2024-09-23 16:18:51,495 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=297294.6666666667, ans=0.125 2024-09-23 16:19:20,011 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=297388.0, ans=0.5 2024-09-23 16:19:26,089 INFO [train.py:1198] (3/4) Epoch 17, batch 1400, loss[loss=0.2016, ctc_loss=0.1352, cr_loss=0.3319, over 17209.00 frames. ], tot_loss[loss=0.223, ctc_loss=0.1499, cr_loss=0.3652, over 3355110.58 frames. ], batch size: 50, lr: 7.11e-03, grad_scale: 32.0 2024-09-23 16:19:43,972 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-23 16:19:45,533 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=297481.3333333333, ans=0.125 2024-09-23 16:20:04,655 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=297528.0, ans=0.2 2024-09-23 16:20:20,766 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.091e+02 1.246e+02 1.345e+02 1.562e+02 2.130e+02, threshold=2.691e+02, percent-clipped=0.0 2024-09-23 16:20:45,851 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=297621.3333333333, ans=0.125 2024-09-23 16:20:50,355 INFO [train.py:1198] (3/4) Epoch 17, batch 1450, loss[loss=0.2188, ctc_loss=0.1457, cr_loss=0.3653, over 17023.00 frames. ], tot_loss[loss=0.2246, ctc_loss=0.1512, cr_loss=0.3669, over 3343524.25 frames.
], batch size: 51, lr: 7.11e-03, grad_scale: 32.0 2024-09-23 16:21:41,155 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=297808.0, ans=0.125 2024-09-23 16:21:41,156 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=297808.0, ans=0.0 2024-09-23 16:21:47,611 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=297808.0, ans=0.1 2024-09-23 16:21:55,475 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=297854.6666666667, ans=0.125 2024-09-23 16:22:00,076 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=297854.6666666667, ans=0.125 2024-09-23 16:22:12,532 INFO [train.py:1198] (3/4) Epoch 17, batch 1500, loss[loss=0.2161, ctc_loss=0.1433, cr_loss=0.3643, over 17006.00 frames. ], tot_loss[loss=0.2253, ctc_loss=0.1518, cr_loss=0.3678, over 3340308.22 frames. ], batch size: 44, lr: 7.11e-03, grad_scale: 32.0 2024-09-23 16:22:13,194 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=3.49 vs. limit=15.0 2024-09-23 16:22:35,023 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-23 16:22:47,098 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=297994.6666666667, ans=0.0 2024-09-23 16:22:47,638 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=4.88 vs. limit=10.0 2024-09-23 16:22:54,920 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=297994.6666666667, ans=0.1 2024-09-23 16:22:56,419 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=297994.6666666667, ans=0.1 2024-09-23 16:23:07,241 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.004e+02 1.245e+02 1.341e+02 1.437e+02 3.249e+02, threshold=2.682e+02, percent-clipped=1.0 2024-09-23 16:23:09,061 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=298041.3333333333, ans=0.125 2024-09-23 16:23:09,075 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=298041.3333333333, ans=0.125 2024-09-23 16:23:34,453 INFO [train.py:1198] (3/4) Epoch 17, batch 1550, loss[loss=0.2261, ctc_loss=0.1526, cr_loss=0.3675, over 17028.00 frames. ], tot_loss[loss=0.2255, ctc_loss=0.1519, cr_loss=0.3681, over 3350689.67 frames. ], batch size: 51, lr: 7.10e-03, grad_scale: 32.0 2024-09-23 16:23:41,812 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=25.59 vs. 
limit=22.5 2024-09-23 16:23:45,977 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=298134.6666666667, ans=0.125 2024-09-23 16:24:27,774 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=7.75 vs. limit=15.0 2024-09-23 16:24:30,660 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=298274.6666666667, ans=0.0 2024-09-23 16:24:37,501 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=9.88 vs. limit=15.0 2024-09-23 16:24:54,221 INFO [train.py:1198] (3/4) Epoch 17, batch 1600, loss[loss=0.2395, ctc_loss=0.1645, cr_loss=0.3751, over 17043.00 frames. ], tot_loss[loss=0.2247, ctc_loss=0.1513, cr_loss=0.3668, over 3361465.76 frames. ], batch size: 52, lr: 7.10e-03, grad_scale: 32.0 2024-09-23 16:25:51,724 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.003e+02 1.268e+02 1.375e+02 1.538e+02 2.240e+02, threshold=2.750e+02, percent-clipped=0.0 2024-09-23 16:26:03,426 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=298554.6666666667, ans=0.1 2024-09-23 16:26:08,071 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=298554.6666666667, ans=0.125 2024-09-23 16:26:18,871 INFO [train.py:1198] (3/4) Epoch 17, batch 1650, loss[loss=0.2849, ctc_loss=0.2017, cr_loss=0.4159, over 11717.00 frames. ], tot_loss[loss=0.2256, ctc_loss=0.152, cr_loss=0.3678, over 3348339.86 frames. ], batch size: 123, lr: 7.10e-03, grad_scale: 32.0 2024-09-23 16:26:19,132 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=298601.3333333333, ans=0.125 2024-09-23 16:26:40,900 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=298648.0, ans=0.125 2024-09-23 16:27:06,957 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=298694.6666666667, ans=0.125 2024-09-23 16:27:19,045 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=12.75 vs. limit=15.0 2024-09-23 16:27:45,763 INFO [train.py:1198] (3/4) Epoch 17, batch 1700, loss[loss=0.2299, ctc_loss=0.1523, cr_loss=0.3882, over 17304.00 frames. ], tot_loss[loss=0.2258, ctc_loss=0.1522, cr_loss=0.3679, over 3345319.01 frames. ], batch size: 46, lr: 7.09e-03, grad_scale: 32.0 2024-09-23 16:27:55,976 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer_ff2.min_abs, batch_count=298834.6666666667, ans=0.1 2024-09-23 16:28:07,205 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=6.03 vs. 
limit=12.0 2024-09-23 16:28:38,520 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.087e+02 1.237e+02 1.323e+02 1.443e+02 1.876e+02, threshold=2.646e+02, percent-clipped=0.0 2024-09-23 16:28:51,458 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=299021.3333333333, ans=0.2 2024-09-23 16:29:05,469 INFO [train.py:1198] (3/4) Epoch 17, batch 1750, loss[loss=0.2811, ctc_loss=0.2002, cr_loss=0.4045, over 12015.00 frames. ], tot_loss[loss=0.2254, ctc_loss=0.1518, cr_loss=0.3679, over 3349163.89 frames. ], batch size: 124, lr: 7.09e-03, grad_scale: 32.0 2024-09-23 16:29:17,088 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=4.58 vs. limit=15.0 2024-09-23 16:29:23,120 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=299114.6666666667, ans=0.1 2024-09-23 16:29:24,780 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=299114.6666666667, ans=0.125 2024-09-23 16:29:47,672 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=10.56 vs. limit=15.0 2024-09-23 16:30:13,761 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=299254.6666666667, ans=0.0 2024-09-23 16:30:27,808 INFO [train.py:1198] (3/4) Epoch 17, batch 1800, loss[loss=0.2038, ctc_loss=0.1361, cr_loss=0.3384, over 17098.00 frames. ], tot_loss[loss=0.2251, ctc_loss=0.1516, cr_loss=0.3673, over 3356515.86 frames. ], batch size: 49, lr: 7.09e-03, grad_scale: 32.0 2024-09-23 16:30:51,683 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.97 vs. limit=15.0 2024-09-23 16:30:55,064 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=7.58 vs. limit=15.0 2024-09-23 16:31:07,489 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=299394.6666666667, ans=0.125 2024-09-23 16:31:15,352 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=299394.6666666667, ans=0.125 2024-09-23 16:31:18,579 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.max_abs, batch_count=299441.3333333333, ans=10.0 2024-09-23 16:31:23,056 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.041e+02 1.259e+02 1.337e+02 1.488e+02 2.205e+02, threshold=2.673e+02, percent-clipped=0.0 2024-09-23 16:31:44,929 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=299488.0, ans=0.0 2024-09-23 16:31:52,512 INFO [train.py:1198] (3/4) Epoch 17, batch 1850, loss[loss=0.2314, ctc_loss=0.1567, cr_loss=0.3734, over 17041.00 frames. ], tot_loss[loss=0.2247, ctc_loss=0.1512, cr_loss=0.3674, over 3362245.29 frames. ], batch size: 56, lr: 7.09e-03, grad_scale: 32.0 2024-09-23 16:31:58,141 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=11.62 vs. 
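limit=15.0

The grad_scale field in the per-batch loss lines (flipping between 32.0 and 16.0 in this stretch) is the dynamic loss scale of mixed-precision training: it is halved when scaled gradients overflow and grown back after a run of stable steps. A minimal sketch of that loop using torch.cuda.amp, with illustrative hyperparameters and a hypothetical model/batch interface:

import torch

# init_scale and growth_interval are illustrative, not the run's settings.
scaler = torch.cuda.amp.GradScaler(init_scale=32.0, growth_interval=2000)

def train_step(model, optimizer, batch):
    optimizer.zero_grad()
    with torch.cuda.amp.autocast():
        loss = model(batch)            # forward runs in fp16 under autocast
    scaler.scale(loss).backward()      # backward on the scaled loss
    scaler.step(optimizer)             # unscales; skips the step on inf/nan
    scaler.update()                    # halves the scale on overflow (32 -> 16),
                                       # grows it again after stable steps
    return loss.detach(), scaler.get_scale()
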
2024-09-23 16:32:09,060 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=10.60 vs. limit=15.0 2024-09-23 16:32:25,008 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=299628.0, ans=0.1 2024-09-23 16:32:44,999 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=299674.6666666667, ans=0.125 2024-09-23 16:33:05,695 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=299721.3333333333, ans=0.125 2024-09-23 16:33:14,619 INFO [train.py:1198] (3/4) Epoch 17, batch 1900, loss[loss=0.2159, ctc_loss=0.1443, cr_loss=0.3583, over 16968.00 frames. ], tot_loss[loss=0.224, ctc_loss=0.1506, cr_loss=0.3666, over 3371045.07 frames. ], batch size: 42, lr: 7.08e-03, grad_scale: 32.0 2024-09-23 16:33:14,974 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=299768.0, ans=0.0 2024-09-23 16:33:18,031 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=299768.0, ans=0.0 2024-09-23 16:34:02,313 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=299908.0, ans=0.125 2024-09-23 16:34:06,714 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.101e+02 1.256e+02 1.309e+02 1.429e+02 1.873e+02, threshold=2.618e+02, percent-clipped=0.0 2024-09-23 16:34:16,445 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=299954.6666666667, ans=0.2 2024-09-23 16:34:20,083 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=5.20 vs. limit=15.0 2024-09-23 16:34:24,424 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=299954.6666666667, ans=0.2 2024-09-23 16:34:33,603 INFO [train.py:1198] (3/4) Epoch 17, batch 1950, loss[loss=0.2549, ctc_loss=0.1764, cr_loss=0.3923, over 17266.00 frames. ], tot_loss[loss=0.2232, ctc_loss=0.1502, cr_loss=0.3653, over 3365619.82 frames. ], batch size: 55, lr: 7.08e-03, grad_scale: 16.0 2024-09-23 16:34:43,295 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=300001.3333333333, ans=0.125 2024-09-23 16:35:00,970 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=300048.0, ans=0.125 2024-09-23 16:35:58,738 INFO [train.py:1198] (3/4) Epoch 17, batch 2000, loss[loss=0.2392, ctc_loss=0.1586, cr_loss=0.4032, over 17117.00 frames. ], tot_loss[loss=0.2226, ctc_loss=0.1497, cr_loss=0.3644, over 3364499.68 frames.
], batch size: 49, lr: 7.08e-03, grad_scale: 32.0 2024-09-23 16:36:00,791 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=300234.6666666667, ans=0.125 2024-09-23 16:36:05,495 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=300234.6666666667, ans=0.125 2024-09-23 16:36:28,991 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=300281.3333333333, ans=0.125 2024-09-23 16:36:30,563 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer_ff3.min_abs, batch_count=300281.3333333333, ans=0.2 2024-09-23 16:36:33,787 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=300328.0, ans=0.125 2024-09-23 16:36:38,956 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.32 vs. limit=15.0 2024-09-23 16:36:55,630 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.077e+02 1.280e+02 1.363e+02 1.514e+02 2.601e+02, threshold=2.726e+02, percent-clipped=0.0 2024-09-23 16:36:59,220 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-09-23 16:37:15,344 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=10.70 vs. limit=22.5 2024-09-23 16:37:21,174 INFO [train.py:1198] (3/4) Epoch 17, batch 2050, loss[loss=0.2388, ctc_loss=0.1623, cr_loss=0.3822, over 17203.00 frames. ], tot_loss[loss=0.2229, ctc_loss=0.15, cr_loss=0.3647, over 3357725.31 frames. ], batch size: 47, lr: 7.08e-03, grad_scale: 32.0 2024-09-23 16:37:47,706 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=300514.6666666667, ans=0.07 2024-09-23 16:37:57,100 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=300561.3333333333, ans=0.0 2024-09-23 16:38:29,302 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=14.30 vs. limit=22.5 2024-09-23 16:38:30,388 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=300654.6666666667, ans=0.1 2024-09-23 16:38:33,702 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=300654.6666666667, ans=0.2 2024-09-23 16:38:38,518 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=300654.6666666667, ans=0.125 2024-09-23 16:38:43,058 INFO [train.py:1198] (3/4) Epoch 17, batch 2100, loss[loss=0.2364, ctc_loss=0.1548, cr_loss=0.4079, over 16999.00 frames. ], tot_loss[loss=0.2225, ctc_loss=0.1497, cr_loss=0.3643, over 3359224.43 frames. ], batch size: 53, lr: 7.07e-03, grad_scale: 32.0 2024-09-23 16:38:45,056 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=300701.3333333333, ans=0.0 2024-09-23 16:38:45,920 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=4.89 vs. 
limit=12.0 2024-09-23 16:38:56,691 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.19 vs. limit=15.0 2024-09-23 16:39:34,558 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=300841.3333333333, ans=0.1 2024-09-23 16:39:37,336 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.012e+02 1.282e+02 1.378e+02 1.629e+02 2.500e+02, threshold=2.755e+02, percent-clipped=0.0 2024-09-23 16:40:05,320 INFO [train.py:1198] (3/4) Epoch 17, batch 2150, loss[loss=0.1993, ctc_loss=0.1309, cr_loss=0.3421, over 17089.00 frames. ], tot_loss[loss=0.2226, ctc_loss=0.1496, cr_loss=0.3652, over 3361115.95 frames. ], batch size: 43, lr: 7.07e-03, grad_scale: 32.0 2024-09-23 16:40:10,614 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=300934.6666666667, ans=0.125 2024-09-23 16:40:27,471 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=300981.3333333333, ans=0.125 2024-09-23 16:40:56,465 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.95 vs. limit=15.0 2024-09-23 16:40:57,436 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=301074.6666666667, ans=0.2 2024-09-23 16:41:25,164 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=301121.3333333333, ans=0.125 2024-09-23 16:41:28,319 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=301168.0, ans=0.125 2024-09-23 16:41:29,569 INFO [train.py:1198] (3/4) Epoch 17, batch 2200, loss[loss=0.2246, ctc_loss=0.1492, cr_loss=0.3771, over 17001.00 frames. ], tot_loss[loss=0.2216, ctc_loss=0.1488, cr_loss=0.364, over 3360884.53 frames. ], batch size: 44, lr: 7.07e-03, grad_scale: 32.0 2024-09-23 16:41:36,057 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=301168.0, ans=0.125 2024-09-23 16:41:36,079 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=301168.0, ans=0.5 2024-09-23 16:41:50,363 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=301214.6666666667, ans=0.2 2024-09-23 16:42:23,223 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.050e+02 1.227e+02 1.315e+02 1.424e+02 2.310e+02, threshold=2.629e+02, percent-clipped=0.0 2024-09-23 16:42:26,848 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=301308.0, ans=0.125 2024-09-23 16:42:46,987 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=301354.6666666667, ans=0.125 2024-09-23 16:42:51,586 INFO [train.py:1198] (3/4) Epoch 17, batch 2250, loss[loss=0.1955, ctc_loss=0.1335, cr_loss=0.3098, over 16961.00 frames. ], tot_loss[loss=0.2217, ctc_loss=0.149, cr_loss=0.3635, over 3354416.23 frames. 
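], batch size: 42, lr: 7.06e-03, grad_scale: 32.0

The tot_loss[... over N frames] summaries plateau around 3.35M frames and carry fractional frame counts (3354416.23 just above), which is consistent with frame-weighted sums that are decayed every batch rather than accumulated from the start of the epoch. A sketch of such bookkeeping; the decay constant here is invented:

class RunningLoss:
    # Decayed, frame-weighted bookkeeping. At steady state the frame total
    # approaches avg_frames_per_batch / (1 - decay), which is why the logged
    # totals plateau instead of growing over the whole epoch.
    def __init__(self, decay: float = 0.999):
        self.decay = decay
        self.loss_sum = 0.0
        self.frames = 0.0   # decayed count; ends up fractional, as in the log

    def update(self, loss_per_frame: float, num_frames: float) -> None:
        self.loss_sum = self.decay * self.loss_sum + loss_per_frame * num_frames
        self.frames = self.decay * self.frames + num_frames

    @property
    def value(self) -> float:
        # the `tot_loss[loss=...]` figure
        return self.loss_sum / max(self.frames, 1.0)
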
2024-09-23 16:42:55,092 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=301401.3333333333, ans=0.1 2024-09-23 16:43:01,320 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=301401.3333333333, ans=0.1 2024-09-23 16:43:09,341 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=301448.0, ans=0.1 2024-09-23 16:43:17,362 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=301448.0, ans=0.125 2024-09-23 16:43:20,417 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.const_attention_rate, batch_count=301448.0, ans=0.025 2024-09-23 16:43:22,025 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=301494.6666666667, ans=0.125 2024-09-23 16:43:36,288 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=301494.6666666667, ans=0.0 2024-09-23 16:44:11,217 INFO [train.py:1198] (3/4) Epoch 17, batch 2300, loss[loss=0.2238, ctc_loss=0.1506, cr_loss=0.3662, over 17164.00 frames. ], tot_loss[loss=0.2227, ctc_loss=0.1497, cr_loss=0.3649, over 3350127.32 frames. ], batch size: 45, lr: 7.06e-03, grad_scale: 32.0 2024-09-23 16:44:43,358 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=301728.0, ans=0.5 2024-09-23 16:44:48,126 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=301728.0, ans=0.125 2024-09-23 16:45:01,963 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=301774.6666666667, ans=0.0 2024-09-23 16:45:08,138 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.084e+02 1.278e+02 1.376e+02 1.551e+02 2.468e+02, threshold=2.752e+02, percent-clipped=0.0 2024-09-23 16:45:13,324 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=301774.6666666667, ans=0.0 2024-09-23 16:45:35,977 INFO [train.py:1198] (3/4) Epoch 17, batch 2350, loss[loss=0.2358, ctc_loss=0.1616, cr_loss=0.3711, over 17011.00 frames. ], tot_loss[loss=0.2228, ctc_loss=0.1497, cr_loss=0.3654, over 3354285.52 frames. ], batch size: 44, lr: 7.06e-03, grad_scale: 32.0 2024-09-23 16:45:42,564 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=301868.0, ans=0.125 2024-09-23 16:45:44,286 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=301868.0, ans=0.125 2024-09-23 16:46:43,824 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=302054.6666666667, ans=0.125 2024-09-23 16:46:57,673 INFO [train.py:1198] (3/4) Epoch 17, batch 2400, loss[loss=0.2704, ctc_loss=0.1889, cr_loss=0.4076, over 15358.00 frames. ], tot_loss[loss=0.2244, ctc_loss=0.151, cr_loss=0.3674, over 3354166.51 frames.
], batch size: 89, lr: 7.06e-03, grad_scale: 32.0 2024-09-23 16:47:17,391 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=17.30 vs. limit=22.5 2024-09-23 16:47:30,211 INFO [scaling.py:1024] (3/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=7.27 vs. limit=8.0 2024-09-23 16:47:53,229 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.const_attention_rate, batch_count=302241.3333333333, ans=0.025 2024-09-23 16:47:54,481 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.150e+02 1.300e+02 1.437e+02 1.590e+02 2.245e+02, threshold=2.874e+02, percent-clipped=0.0 2024-09-23 16:47:54,907 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=302241.3333333333, ans=0.125 2024-09-23 16:48:05,952 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=302288.0, ans=0.0 2024-09-23 16:48:19,882 INFO [train.py:1198] (3/4) Epoch 17, batch 2450, loss[loss=0.2024, ctc_loss=0.1341, cr_loss=0.3414, over 17150.00 frames. ], tot_loss[loss=0.2239, ctc_loss=0.1506, cr_loss=0.3665, over 3356197.46 frames. ], batch size: 45, lr: 7.05e-03, grad_scale: 32.0 2024-09-23 16:48:45,326 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.24 vs. limit=6.0 2024-09-23 16:49:07,037 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=11.95 vs. limit=12.0 2024-09-23 16:49:19,200 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=302474.6666666667, ans=0.1 2024-09-23 16:49:19,345 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=302474.6666666667, ans=0.1 2024-09-23 16:49:30,650 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=302521.3333333333, ans=0.1 2024-09-23 16:49:36,158 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=10.88 vs. limit=15.0 2024-09-23 16:49:39,968 INFO [train.py:1198] (3/4) Epoch 17, batch 2500, loss[loss=0.2455, ctc_loss=0.1672, cr_loss=0.3916, over 16663.00 frames. ], tot_loss[loss=0.2241, ctc_loss=0.1507, cr_loss=0.3669, over 3363487.16 frames. ], batch size: 61, lr: 7.05e-03, grad_scale: 32.0 2024-09-23 16:49:40,314 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=302568.0, ans=0.125 2024-09-23 16:50:08,488 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=302614.6666666667, ans=0.125 2024-09-23 16:50:19,132 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=302661.3333333333, ans=0.09899494936611666 2024-09-23 16:50:29,130 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.30 vs. 
limit=6.0 2024-09-23 16:50:36,839 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=302708.0, ans=0.125 2024-09-23 16:50:39,595 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.051e+02 1.277e+02 1.416e+02 1.598e+02 3.065e+02, threshold=2.832e+02, percent-clipped=1.0 2024-09-23 16:50:43,195 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=302708.0, ans=0.125 2024-09-23 16:50:55,908 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=302754.6666666667, ans=0.0 2024-09-23 16:51:04,708 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=302754.6666666667, ans=0.125 2024-09-23 16:51:07,613 INFO [train.py:1198] (3/4) Epoch 17, batch 2550, loss[loss=0.1897, ctc_loss=0.1247, cr_loss=0.3247, over 16772.00 frames. ], tot_loss[loss=0.2246, ctc_loss=0.1512, cr_loss=0.3673, over 3364880.14 frames. ], batch size: 37, lr: 7.05e-03, grad_scale: 32.0 2024-09-23 16:51:43,269 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=302894.6666666667, ans=0.04949747468305833 2024-09-23 16:52:09,895 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=302988.0, ans=0.025 2024-09-23 16:52:13,493 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=3.17 vs. limit=15.0 2024-09-23 16:52:29,692 INFO [train.py:1198] (3/4) Epoch 17, batch 2600, loss[loss=0.2159, ctc_loss=0.1443, cr_loss=0.3583, over 17259.00 frames. ], tot_loss[loss=0.2243, ctc_loss=0.1509, cr_loss=0.3671, over 3367049.75 frames. ], batch size: 44, lr: 7.05e-03, grad_scale: 32.0 2024-09-23 16:52:52,084 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=303081.3333333333, ans=0.125 2024-09-23 16:53:23,708 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.065e+02 1.309e+02 1.432e+02 1.509e+02 2.078e+02, threshold=2.863e+02, percent-clipped=0.0 2024-09-23 16:53:38,438 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=303221.3333333333, ans=0.125 2024-09-23 16:53:44,787 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=303221.3333333333, ans=0.1 2024-09-23 16:53:48,006 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.min_positive, batch_count=303268.0, ans=0.05 2024-09-23 16:53:49,178 INFO [train.py:1198] (3/4) Epoch 17, batch 2650, loss[loss=0.2287, ctc_loss=0.1564, cr_loss=0.3615, over 17156.00 frames. ], tot_loss[loss=0.2246, ctc_loss=0.151, cr_loss=0.3679, over 3364648.82 frames. 
], batch size: 48, lr: 7.04e-03, grad_scale: 32.0 2024-09-23 16:53:49,460 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=303268.0, ans=0.125 2024-09-23 16:54:24,562 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=303361.3333333333, ans=0.0 2024-09-23 16:54:54,290 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=303454.6666666667, ans=0.125 2024-09-23 16:54:59,285 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=10.20 vs. limit=22.5 2024-09-23 16:55:09,526 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.35 vs. limit=15.0 2024-09-23 16:55:13,735 INFO [train.py:1198] (3/4) Epoch 17, batch 2700, loss[loss=0.2404, ctc_loss=0.1608, cr_loss=0.398, over 17186.00 frames. ], tot_loss[loss=0.2248, ctc_loss=0.1513, cr_loss=0.3676, over 3353513.29 frames. ], batch size: 45, lr: 7.04e-03, grad_scale: 32.0 2024-09-23 16:55:27,116 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=2.29 vs. limit=15.0 2024-09-23 16:55:36,217 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=303548.0, ans=0.1 2024-09-23 16:55:42,546 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=303548.0, ans=0.125 2024-09-23 16:56:10,278 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.142e+02 1.299e+02 1.395e+02 1.578e+02 3.213e+02, threshold=2.790e+02, percent-clipped=1.0 2024-09-23 16:56:13,667 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=303641.3333333333, ans=0.035 2024-09-23 16:56:20,108 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=303688.0, ans=0.125 2024-09-23 16:56:35,878 INFO [train.py:1198] (3/4) Epoch 17, batch 2750, loss[loss=0.2301, ctc_loss=0.155, cr_loss=0.3755, over 17008.00 frames. ], tot_loss[loss=0.2246, ctc_loss=0.1511, cr_loss=0.3674, over 3355713.66 frames. ], batch size: 51, lr: 7.04e-03, grad_scale: 32.0 2024-09-23 16:56:50,425 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=303781.3333333333, ans=0.0 2024-09-23 16:57:00,401 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=303781.3333333333, ans=0.0 2024-09-23 16:57:10,473 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=6.12 vs. limit=12.0 2024-09-23 16:57:47,879 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=303921.3333333333, ans=0.1 2024-09-23 16:57:55,491 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=303921.3333333333, ans=0.2 2024-09-23 16:57:58,358 INFO [train.py:1198] (3/4) Epoch 17, batch 2800, loss[loss=0.2166, ctc_loss=0.1428, cr_loss=0.3689, over 17014.00 frames. 
], tot_loss[loss=0.2241, ctc_loss=0.1506, cr_loss=0.3673, over 3362584.22 frames. ], batch size: 51, lr: 7.04e-03, grad_scale: 32.0 2024-09-23 16:58:05,114 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.86 vs. limit=15.0 2024-09-23 16:58:50,948 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=304108.0, ans=0.125 2024-09-23 16:58:53,781 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.119e+02 1.311e+02 1.385e+02 1.466e+02 2.534e+02, threshold=2.770e+02, percent-clipped=0.0 2024-09-23 16:58:54,151 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=304108.0, ans=0.0 2024-09-23 16:59:02,605 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=6.09 vs. limit=15.0 2024-09-23 16:59:05,270 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=304154.6666666667, ans=0.2 2024-09-23 16:59:17,883 INFO [train.py:1198] (3/4) Epoch 17, batch 2850, loss[loss=0.2602, ctc_loss=0.1794, cr_loss=0.4041, over 15846.00 frames. ], tot_loss[loss=0.2232, ctc_loss=0.1499, cr_loss=0.3667, over 3369809.13 frames. ], batch size: 74, lr: 7.03e-03, grad_scale: 16.0 2024-09-23 16:59:22,029 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.39 vs. limit=15.0 2024-09-23 16:59:29,426 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.59 vs. limit=10.0 2024-09-23 16:59:46,303 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=304248.0, ans=0.2 2024-09-23 17:00:22,397 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=304341.3333333333, ans=0.0 2024-09-23 17:00:27,250 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=304388.0, ans=0.0 2024-09-23 17:00:31,130 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=7.58 vs. limit=15.0 2024-09-23 17:00:42,928 INFO [train.py:1198] (3/4) Epoch 17, batch 2900, loss[loss=0.2435, ctc_loss=0.1653, cr_loss=0.3911, over 16861.00 frames. ], tot_loss[loss=0.2223, ctc_loss=0.1492, cr_loss=0.3654, over 3375892.91 frames. 
], batch size: 58, lr: 7.03e-03, grad_scale: 16.0 2024-09-23 17:00:52,717 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=304434.6666666667, ans=0.125 2024-09-23 17:01:01,856 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-23 17:01:04,837 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=304481.3333333333, ans=0.2 2024-09-23 17:01:09,557 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=304481.3333333333, ans=0.025 2024-09-23 17:01:14,556 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=304481.3333333333, ans=0.1 2024-09-23 17:01:15,973 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=304528.0, ans=0.1 2024-09-23 17:01:28,996 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=304528.0, ans=0.0 2024-09-23 17:01:38,700 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=304574.6666666667, ans=0.0 2024-09-23 17:01:42,987 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.082e+02 1.263e+02 1.350e+02 1.442e+02 2.620e+02, threshold=2.699e+02, percent-clipped=0.0 2024-09-23 17:01:59,388 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=304621.3333333333, ans=0.125 2024-09-23 17:02:00,930 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=304621.3333333333, ans=0.1 2024-09-23 17:02:02,406 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=304621.3333333333, ans=0.125 2024-09-23 17:02:05,521 INFO [train.py:1198] (3/4) Epoch 17, batch 2950, loss[loss=0.1713, ctc_loss=0.1121, cr_loss=0.2962, over 17191.00 frames. ], tot_loss[loss=0.223, ctc_loss=0.1498, cr_loss=0.3657, over 3367169.70 frames. ], batch size: 41, lr: 7.03e-03, grad_scale: 8.0 2024-09-23 17:02:07,322 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=304668.0, ans=0.0 2024-09-23 17:02:23,106 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=304714.6666666667, ans=0.1 2024-09-23 17:02:24,748 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=304714.6666666667, ans=0.2 2024-09-23 17:02:58,770 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=304808.0, ans=0.125 2024-09-23 17:03:11,171 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=304854.6666666667, ans=0.125 2024-09-23 17:03:26,628 INFO [train.py:1198] (3/4) Epoch 17, batch 3000, loss[loss=0.2233, ctc_loss=0.1495, cr_loss=0.3694, over 17129.00 frames. ], tot_loss[loss=0.2229, ctc_loss=0.1497, cr_loss=0.3657, over 3373547.79 frames. 
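], batch size: 48, lr: 7.02e-03, grad_scale: 8.0

Note grad_scale falling from 32.0 to 16.0 and then 8.0 between batches 2800 and 2950 above, and recovering to 16.0 later in the epoch. With fp16 AMP this is the usual dynamic loss-scale backoff: the scale is halved whenever scaled gradients overflow, and regrown after a stretch of overflow-free steps. A generic sketch of that loop, using standard torch.cuda.amp rather than icefall's own scaler wrapper:

```python
import torch

# Hedged sketch of the AMP loss-scaling loop behind the logged `grad_scale`
# values. Generic torch.cuda.amp usage, not icefall's exact training loop;
# `model(batch)` is assumed to return a scalar loss.
def train_step(model, batch, optimizer, scaler: torch.cuda.amp.GradScaler):
    optimizer.zero_grad()
    with torch.cuda.amp.autocast(dtype=torch.float16):
        loss = model(batch)
    scaler.scale(loss).backward()  # backprop on the scaled loss
    scaler.step(optimizer)         # skips the update if grads hit inf/nan
    scaler.update()                # halve scale on overflow, else maybe grow
    return scaler.get_scale()      # the value reported as `grad_scale`
```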
2024-09-23 17:03:26,628 INFO [train.py:1221] (3/4) Computing validation loss 2024-09-23 17:03:37,858 INFO [zipformer.py:1858] (3/4) name=encoder.encoders.0.layers.1.self_attn_weights, attn_weights_entropy = tensor([5.8172, 4.8392, 5.5745, 5.2735], device='cuda:3') 2024-09-23 17:03:42,424 INFO [train.py:1230] (3/4) Epoch 17, validation: loss=0.0409, ctc_loss=0.0409, cr_loss=7.678e-15, over 944034.00 frames. 2024-09-23 17:03:42,425 INFO [train.py:1231] (3/4) Maximum memory allocated so far is 21238MB 2024-09-23 17:03:45,690 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.const_attention_rate, batch_count=304901.3333333333, ans=0.025 2024-09-23 17:03:47,297 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=304901.3333333333, ans=0.0 2024-09-23 17:04:12,907 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=22.70 vs. limit=22.5 2024-09-23 17:04:13,691 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=304994.6666666667, ans=0.0 2024-09-23 17:04:38,119 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.104e+02 1.276e+02 1.374e+02 1.513e+02 2.906e+02, threshold=2.749e+02, percent-clipped=1.0 2024-09-23 17:04:49,283 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=305088.0, ans=0.2 2024-09-23 17:04:59,731 INFO [train.py:1198] (3/4) Epoch 17, batch 3050, loss[loss=0.238, ctc_loss=0.1606, cr_loss=0.3871, over 17073.00 frames. ], tot_loss[loss=0.2243, ctc_loss=0.1509, cr_loss=0.3669, over 3356970.20 frames. ], batch size: 56, lr: 7.02e-03, grad_scale: 8.0 2024-09-23 17:05:04,504 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=305134.6666666667, ans=0.125 2024-09-23 17:05:12,462 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=305134.6666666667, ans=0.125 2024-09-23 17:05:28,362 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=305181.3333333333, ans=0.125 2024-09-23 17:05:39,359 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=305228.0, ans=0.035 2024-09-23 17:05:44,149 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=305228.0, ans=0.0 2024-09-23 17:06:06,870 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=305321.3333333333, ans=0.2 2024-09-23 17:06:15,388 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=14.59 vs. limit=15.0 2024-09-23 17:06:20,563 INFO [train.py:1198] (3/4) Epoch 17, batch 3100, loss[loss=0.2235, ctc_loss=0.1502, cr_loss=0.3664, over 17344.00 frames. ], tot_loss[loss=0.2242, ctc_loss=0.1507, cr_loss=0.3674, over 3358004.96 frames. ], batch size: 48, lr: 7.02e-03, grad_scale: 8.0 2024-09-23 17:06:37,153 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=5.27 vs.
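limit=15.0

In the Epoch 17 validation entry above, cr_loss is numerically zero (7.678e-15) while ctc_loss carries the entire 0.0409. That is the expected behaviour of a consistency-regularization term when augmentation is disabled for validation: the two branches being compared see identical inputs, so any symmetric divergence between their posteriors vanishes up to float error. A toy illustration, under the assumption that the term is a symmetric KL between frame-level CTC posteriors of the two branches (icefall's exact formulation may differ):

```python
import torch
import torch.nn.functional as F

# Hedged toy illustration: a symmetric KL consistency term between the CTC
# posteriors of two branches vanishes when both branches see the same input,
# matching the ~1e-15 validation cr_loss above. This is an assumed form of
# the consistency loss, not code taken from icefall.
def consistency_loss(log_p1: torch.Tensor, log_p2: torch.Tensor) -> torch.Tensor:
    """Symmetric KL between two (T, V) frame-level log-posteriors."""
    kl12 = F.kl_div(log_p1, log_p2, log_target=True, reduction="batchmean")
    kl21 = F.kl_div(log_p2, log_p1, log_target=True, reduction="batchmean")
    return 0.5 * (kl12 + kl21)

log_p = torch.randn(100, 500).log_softmax(dim=-1)  # one un-augmented branch
print(consistency_loss(log_p, log_p.clone()))      # 0, as in the validation log
```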
2024-09-23 17:06:57,678 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=305461.3333333333, ans=0.125 2024-09-23 17:07:02,334 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=305461.3333333333, ans=0.1 2024-09-23 17:07:07,809 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=3.83 vs. limit=12.0 2024-09-23 17:07:11,699 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=305508.0, ans=0.125 2024-09-23 17:07:16,164 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=305508.0, ans=0.125 2024-09-23 17:07:19,081 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.124e+02 1.257e+02 1.347e+02 1.447e+02 2.070e+02, threshold=2.694e+02, percent-clipped=0.0 2024-09-23 17:07:20,959 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=305508.0, ans=0.0 2024-09-23 17:07:29,152 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.35 vs. limit=6.0 2024-09-23 17:07:31,937 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=305554.6666666667, ans=0.1 2024-09-23 17:07:41,116 INFO [train.py:1198] (3/4) Epoch 17, batch 3150, loss[loss=0.241, ctc_loss=0.1607, cr_loss=0.4018, over 17200.00 frames. ], tot_loss[loss=0.2248, ctc_loss=0.1512, cr_loss=0.3681, over 3351517.35 frames. ], batch size: 55, lr: 7.02e-03, grad_scale: 8.0 2024-09-23 17:07:53,030 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=5.87 vs. limit=15.0 2024-09-23 17:07:57,000 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=305648.0, ans=0.125 2024-09-23 17:08:10,834 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=305694.6666666667, ans=0.1 2024-09-23 17:08:30,943 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=305741.3333333333, ans=0.125 2024-09-23 17:09:00,535 INFO [train.py:1198] (3/4) Epoch 17, batch 3200, loss[loss=0.2237, ctc_loss=0.1488, cr_loss=0.3744, over 17005.00 frames. ], tot_loss[loss=0.2233, ctc_loss=0.15, cr_loss=0.3664, over 3360963.35 frames. ], batch size: 56, lr: 7.01e-03, grad_scale: 16.0 2024-09-23 17:09:56,502 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.090e+02 1.224e+02 1.300e+02 1.391e+02 2.057e+02, threshold=2.599e+02, percent-clipped=0.0 2024-09-23 17:10:10,648 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=306021.3333333333, ans=0.125 2024-09-23 17:10:18,314 INFO [train.py:1198] (3/4) Epoch 17, batch 3250, loss[loss=0.1846, ctc_loss=0.1225, cr_loss=0.3106, over 16957.00 frames. ], tot_loss[loss=0.2224, ctc_loss=0.1494, cr_loss=0.3648, over 3361768.89 frames.
], batch size: 42, lr: 7.01e-03, grad_scale: 16.0 2024-09-23 17:10:34,119 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=306114.6666666667, ans=0.1 2024-09-23 17:10:40,790 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=306114.6666666667, ans=0.1 2024-09-23 17:10:54,764 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=306161.3333333333, ans=0.1 2024-09-23 17:11:07,875 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=3.96 vs. limit=10.0 2024-09-23 17:11:13,654 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=306208.0, ans=0.125 2024-09-23 17:11:36,837 INFO [train.py:1198] (3/4) Epoch 17, batch 3300, loss[loss=0.2409, ctc_loss=0.1613, cr_loss=0.3979, over 17024.00 frames. ], tot_loss[loss=0.2217, ctc_loss=0.1489, cr_loss=0.3639, over 3352041.03 frames. ], batch size: 51, lr: 7.01e-03, grad_scale: 16.0 2024-09-23 17:11:45,648 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=306301.3333333333, ans=0.0 2024-09-23 17:12:05,743 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=306348.0, ans=0.2 2024-09-23 17:12:32,032 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=306441.3333333333, ans=0.025 2024-09-23 17:12:34,842 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.074e+02 1.271e+02 1.388e+02 1.557e+02 2.598e+02, threshold=2.776e+02, percent-clipped=0.0 2024-09-23 17:12:51,500 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.26 vs. limit=15.0 2024-09-23 17:12:56,559 INFO [train.py:1198] (3/4) Epoch 17, batch 3350, loss[loss=0.2294, ctc_loss=0.1503, cr_loss=0.3954, over 17262.00 frames. ], tot_loss[loss=0.2223, ctc_loss=0.1494, cr_loss=0.3646, over 3350679.14 frames. ], batch size: 44, lr: 7.01e-03, grad_scale: 16.0 2024-09-23 17:13:01,498 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=306534.6666666667, ans=0.125 2024-09-23 17:13:35,964 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=306628.0, ans=0.0 2024-09-23 17:13:41,004 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=4.66 vs. limit=10.0 2024-09-23 17:14:03,078 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.52 vs. limit=12.0 2024-09-23 17:14:14,501 INFO [train.py:1198] (3/4) Epoch 17, batch 3400, loss[loss=0.2387, ctc_loss=0.163, cr_loss=0.3783, over 17033.00 frames. ], tot_loss[loss=0.2222, ctc_loss=0.1494, cr_loss=0.3643, over 3357408.72 frames. 
], batch size: 56, lr: 7.00e-03, grad_scale: 16.0 2024-09-23 17:14:23,644 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=306768.0, ans=0.1 2024-09-23 17:14:50,142 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=306861.3333333333, ans=0.125 2024-09-23 17:14:56,180 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=306861.3333333333, ans=0.125 2024-09-23 17:15:09,980 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.095e+02 1.230e+02 1.305e+02 1.432e+02 2.019e+02, threshold=2.610e+02, percent-clipped=0.0 2024-09-23 17:15:11,766 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=306908.0, ans=0.125 2024-09-23 17:15:13,454 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=306908.0, ans=0.125 2024-09-23 17:15:29,088 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=306954.6666666667, ans=0.0 2024-09-23 17:15:32,032 INFO [train.py:1198] (3/4) Epoch 17, batch 3450, loss[loss=0.2019, ctc_loss=0.1338, cr_loss=0.3404, over 17022.00 frames. ], tot_loss[loss=0.2214, ctc_loss=0.1487, cr_loss=0.3632, over 3357819.77 frames. ], batch size: 56, lr: 7.00e-03, grad_scale: 16.0 2024-09-23 17:15:34,611 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=6.73 vs. limit=15.0 2024-09-23 17:15:43,259 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=307001.3333333333, ans=0.125 2024-09-23 17:15:46,472 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=307048.0, ans=0.125 2024-09-23 17:16:18,044 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=4.88 vs. limit=12.0 2024-09-23 17:16:33,775 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=307141.3333333333, ans=0.0 2024-09-23 17:16:44,642 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=307188.0, ans=0.125 2024-09-23 17:16:49,141 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=307188.0, ans=0.0 2024-09-23 17:16:53,122 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=3.89 vs. limit=10.0 2024-09-23 17:16:53,996 INFO [train.py:1198] (3/4) Epoch 17, batch 3500, loss[loss=0.2265, ctc_loss=0.153, cr_loss=0.3673, over 17032.00 frames. ], tot_loss[loss=0.2218, ctc_loss=0.1491, cr_loss=0.3635, over 3345257.13 frames. 
], batch size: 56, lr: 7.00e-03, grad_scale: 16.0 2024-09-23 17:17:11,571 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=307281.3333333333, ans=0.125 2024-09-23 17:17:28,551 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=307328.0, ans=0.125 2024-09-23 17:17:34,883 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.const_attention_rate, batch_count=307328.0, ans=0.025 2024-09-23 17:17:36,850 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=4.80 vs. limit=15.0 2024-09-23 17:17:44,438 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=307374.6666666667, ans=0.2 2024-09-23 17:17:50,325 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.079e+02 1.294e+02 1.368e+02 1.478e+02 3.708e+02, threshold=2.737e+02, percent-clipped=1.0 2024-09-23 17:18:04,430 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=307421.3333333333, ans=0.1 2024-09-23 17:18:12,055 INFO [train.py:1198] (3/4) Epoch 17, batch 3550, loss[loss=0.2174, ctc_loss=0.1432, cr_loss=0.3706, over 17126.00 frames. ], tot_loss[loss=0.2232, ctc_loss=0.1501, cr_loss=0.3652, over 3342726.51 frames. ], batch size: 40, lr: 7.00e-03, grad_scale: 16.0 2024-09-23 17:18:21,676 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=307468.0, ans=0.2 2024-09-23 17:18:21,720 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=307468.0, ans=0.1 2024-09-23 17:18:42,425 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=307514.6666666667, ans=0.125 2024-09-23 17:19:00,763 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=307608.0, ans=0.1 2024-09-23 17:19:04,642 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=7.22 vs. limit=15.0 2024-09-23 17:19:31,531 INFO [train.py:1198] (3/4) Epoch 17, batch 3600, loss[loss=0.2674, ctc_loss=0.1833, cr_loss=0.4209, over 16501.00 frames. ], tot_loss[loss=0.2225, ctc_loss=0.1497, cr_loss=0.3643, over 3343028.89 frames. ], batch size: 66, lr: 6.99e-03, grad_scale: 16.0 2024-09-23 17:19:36,618 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=7.72 vs. limit=15.0 2024-09-23 17:19:47,321 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=307748.0, ans=0.125 2024-09-23 17:19:48,899 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=307748.0, ans=0.125 2024-09-23 17:19:57,325 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=4.69 vs. 
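limit=15.0

The Whitening: lines, like the conv_module2.whiten entry just above (metric=4.69 vs. limit=15.0), track how far a module's activations are from having a white, identity-like covariance; the module only pushes back, via a gradient penalty during training, once the metric crosses its scheduled limit. One simplified reading of such a metric is the mean squared eigenvalue of the feature covariance over the squared mean eigenvalue, which equals 1.0 exactly when the covariance is a multiple of the identity. The sketch below follows that spirit; it is not the verbatim scaling.py implementation:

```python
import torch

# Hedged sketch of a whitening metric in the spirit of the `Whitening:` log
# entries: always >= 1.0, equal to 1.0 iff the covariance is a multiple of
# the identity, and growing as channels become correlated or unevenly scaled.
# Simplified relative to the icefall scaling.py implementation.
def whitening_metric(x: torch.Tensor) -> torch.Tensor:
    """x: (num_frames, num_channels) activations of one module/group."""
    x = x - x.mean(dim=0)                # center the features
    cov = (x.T @ x) / x.shape[0]         # (C, C) covariance estimate
    eigs = torch.linalg.eigvalsh(cov)    # eigenvalues of the covariance
    return (eigs ** 2).mean() / (eigs.mean() ** 2 + 1e-20)

x = torch.randn(2000, 256)               # roughly white features
print(whitening_metric(x))               # close to 1; log flags e.g. 17.49 vs. limit=22.5
```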
2024-09-23 17:20:07,578 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=307794.6666666667, ans=0.125 2024-09-23 17:20:09,144 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=307794.6666666667, ans=0.125 2024-09-23 17:20:12,428 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=307794.6666666667, ans=0.05 2024-09-23 17:20:29,251 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.066e+02 1.229e+02 1.313e+02 1.436e+02 1.870e+02, threshold=2.625e+02, percent-clipped=0.0 2024-09-23 17:20:39,362 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=11.32 vs. limit=22.5 2024-09-23 17:20:43,815 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=10.54 vs. limit=15.0 2024-09-23 17:20:49,541 INFO [train.py:1198] (3/4) Epoch 17, batch 3650, loss[loss=0.2117, ctc_loss=0.1403, cr_loss=0.357, over 16937.00 frames. ], tot_loss[loss=0.2234, ctc_loss=0.1503, cr_loss=0.3651, over 3341235.15 frames. ], batch size: 42, lr: 6.99e-03, grad_scale: 16.0 2024-09-23 17:20:49,876 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=307934.6666666667, ans=0.1 2024-09-23 17:20:51,368 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=307934.6666666667, ans=0.1 2024-09-23 17:20:57,516 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=307934.6666666667, ans=0.0 2024-09-23 17:21:12,005 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=307981.3333333333, ans=0.2 2024-09-23 17:21:12,022 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.const_attention_rate, batch_count=307981.3333333333, ans=0.025 2024-09-23 17:21:19,032 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.20 vs. limit=22.5 2024-09-23 17:21:44,713 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=17.49 vs.
limit=22.5 2024-09-23 17:21:47,652 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=308074.6666666667, ans=0.1 2024-09-23 17:21:55,475 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=308121.3333333333, ans=0.1 2024-09-23 17:21:55,487 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=308121.3333333333, ans=0.125 2024-09-23 17:22:03,293 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=308121.3333333333, ans=0.125 2024-09-23 17:22:07,900 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=308121.3333333333, ans=0.09899494936611666 2024-09-23 17:22:10,731 INFO [train.py:1198] (3/4) Epoch 17, batch 3700, loss[loss=0.2368, ctc_loss=0.1585, cr_loss=0.3915, over 17008.00 frames. ], tot_loss[loss=0.2231, ctc_loss=0.1501, cr_loss=0.365, over 3345264.56 frames. ], batch size: 56, lr: 6.99e-03, grad_scale: 16.0 2024-09-23 17:22:26,644 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=308214.6666666667, ans=0.125 2024-09-23 17:22:44,543 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.04 vs. limit=15.0 2024-09-23 17:22:48,627 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=308261.3333333333, ans=0.125 2024-09-23 17:22:53,934 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.86 vs. limit=22.5 2024-09-23 17:23:01,200 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=308308.0, ans=0.125 2024-09-23 17:23:08,759 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.143e+02 1.332e+02 1.441e+02 1.620e+02 2.318e+02, threshold=2.882e+02, percent-clipped=0.0 2024-09-23 17:23:17,073 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=6.51 vs. limit=12.0 2024-09-23 17:23:28,819 INFO [train.py:1198] (3/4) Epoch 17, batch 3750, loss[loss=0.2314, ctc_loss=0.1557, cr_loss=0.3783, over 17293.00 frames. ], tot_loss[loss=0.2231, ctc_loss=0.1502, cr_loss=0.3648, over 3348445.72 frames. ], batch size: 51, lr: 6.99e-03, grad_scale: 16.0 2024-09-23 17:23:29,118 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=308401.3333333333, ans=0.1 2024-09-23 17:23:46,521 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=308448.0, ans=0.0 2024-09-23 17:23:55,788 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.const_attention_rate, batch_count=308448.0, ans=0.025 2024-09-23 17:24:06,916 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=308494.6666666667, ans=0.125 2024-09-23 17:24:15,672 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=4.08 vs. 
limit=15.0 2024-09-23 17:24:22,628 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=308541.3333333333, ans=0.1 2024-09-23 17:24:24,233 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=308541.3333333333, ans=0.0 2024-09-23 17:24:25,613 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=308541.3333333333, ans=0.0 2024-09-23 17:24:30,806 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=2.56 vs. limit=15.0 2024-09-23 17:24:39,936 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=308588.0, ans=0.125 2024-09-23 17:24:47,307 INFO [train.py:1198] (3/4) Epoch 17, batch 3800, loss[loss=0.2746, ctc_loss=0.1887, cr_loss=0.4293, over 15195.00 frames. ], tot_loss[loss=0.2257, ctc_loss=0.1522, cr_loss=0.3674, over 3345442.31 frames. ], batch size: 89, lr: 6.98e-03, grad_scale: 16.0 2024-09-23 17:24:52,821 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=16.68 vs. limit=22.5 2024-09-23 17:24:56,154 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=9.04 vs. limit=15.0 2024-09-23 17:25:00,253 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=308634.6666666667, ans=0.1 2024-09-23 17:25:01,786 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=308681.3333333333, ans=0.125 2024-09-23 17:25:18,446 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=6.51 vs. limit=15.0 2024-09-23 17:25:25,485 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=308728.0, ans=0.125 2024-09-23 17:25:39,206 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=3.36 vs. limit=15.0 2024-09-23 17:25:43,293 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten.whitening_limit, batch_count=308774.6666666667, ans=22.5 2024-09-23 17:25:45,753 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.085e+02 1.280e+02 1.358e+02 1.516e+02 2.710e+02, threshold=2.715e+02, percent-clipped=0.0 2024-09-23 17:26:05,960 INFO [train.py:1198] (3/4) Epoch 17, batch 3850, loss[loss=0.1981, ctc_loss=0.1351, cr_loss=0.3146, over 16382.00 frames. ], tot_loss[loss=0.2255, ctc_loss=0.1521, cr_loss=0.3671, over 3323090.14 frames. 
], batch size: 36, lr: 6.98e-03, grad_scale: 16.0 2024-09-23 17:26:21,900 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=308914.6666666667, ans=0.2 2024-09-23 17:26:40,107 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=308961.3333333333, ans=0.1 2024-09-23 17:27:08,739 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=309054.6666666667, ans=0.0 2024-09-23 17:27:09,497 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=7.28 vs. limit=15.0 2024-09-23 17:28:06,806 INFO [train.py:1198] (3/4) Epoch 18, batch 0, loss[loss=0.2596, ctc_loss=0.175, cr_loss=0.423, over 17028.00 frames. ], tot_loss[loss=0.2596, ctc_loss=0.175, cr_loss=0.423, over 17028.00 frames. ], batch size: 52, lr: 6.78e-03, grad_scale: 32.0 2024-09-23 17:28:06,807 INFO [train.py:1221] (3/4) Computing validation loss 2024-09-23 17:28:21,931 INFO [train.py:1230] (3/4) Epoch 18, validation: loss=0.03994, ctc_loss=0.03994, cr_loss=8.27e-15, over 944034.00 frames. 2024-09-23 17:28:21,931 INFO [train.py:1231] (3/4) Maximum memory allocated so far is 21238MB 2024-09-23 17:28:27,006 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=309082.6666666667, ans=0.125 2024-09-23 17:28:49,261 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=309129.3333333333, ans=0.125 2024-09-23 17:29:30,231 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.072e+02 1.283e+02 1.487e+02 1.642e+02 2.774e+02, threshold=2.974e+02, percent-clipped=1.0 2024-09-23 17:29:30,886 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=6.80 vs. limit=15.0 2024-09-23 17:29:44,551 INFO [train.py:1198] (3/4) Epoch 18, batch 50, loss[loss=0.2047, ctc_loss=0.1379, cr_loss=0.3336, over 17067.00 frames. ], tot_loss[loss=0.218, ctc_loss=0.1468, cr_loss=0.3561, over 744710.20 frames. ], batch size: 46, lr: 6.78e-03, grad_scale: 32.0 2024-09-23 17:29:51,072 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=309316.0, ans=0.0 2024-09-23 17:29:57,534 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=309316.0, ans=0.125 2024-09-23 17:31:06,897 INFO [train.py:1198] (3/4) Epoch 18, batch 100, loss[loss=0.1989, ctc_loss=0.1287, cr_loss=0.3512, over 17280.00 frames. ], tot_loss[loss=0.2209, ctc_loss=0.1483, cr_loss=0.363, over 1330270.56 frames. ], batch size: 42, lr: 6.77e-03, grad_scale: 16.0 2024-09-23 17:31:07,698 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=4.54 vs. limit=10.0 2024-09-23 17:31:13,639 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=309549.3333333333, ans=0.0 2024-09-23 17:31:17,293 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.65 vs. 
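limit=15.0

The WARNING [optim.py:487] entries above report running statistics of the overall gradient norm: five quantiles (reading as min / 25% / median / 75% / max) plus a clipping threshold. With Clipping_scale=2.0 the threshold consistently sits at twice the logged median (2 x 1.487e+02 = 2.974e+02 in the Epoch 18, batch 0 entry above), and percent-clipped is the share of recent updates that exceeded it. A sketch under that reading, with illustrative names rather than icefall's optim.py internals:

```python
import torch

# Hedged sketch of median-based gradient clipping consistent with the WARNING
# lines: threshold = Clipping_scale * running median of recent gradient norms.
# The quantile bookkeeping is simplified relative to icefall's optim.py.
def clip_like_log(params, recent_grad_norms, clipping_scale: float = 2.0):
    norms = torch.tensor(recent_grad_norms)
    quartiles = torch.quantile(norms, torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0]))
    threshold = clipping_scale * quartiles[2]   # 2.0 x median, as logged
    total_norm = torch.nn.utils.clip_grad_norm_(params, max_norm=float(threshold))
    was_clipped = total_norm > threshold        # feeds the percent-clipped stat
    return quartiles, threshold, was_clipped
```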
2024-09-23 17:31:50,471 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=5.97 vs. limit=15.0 2024-09-23 17:31:51,865 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=309642.6666666667, ans=0.2 2024-09-23 17:32:13,774 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.062e+02 1.252e+02 1.337e+02 1.407e+02 3.310e+02, threshold=2.674e+02, percent-clipped=1.0 2024-09-23 17:32:14,113 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=309736.0, ans=0.0 2024-09-23 17:32:28,258 INFO [train.py:1198] (3/4) Epoch 18, batch 150, loss[loss=0.2084, ctc_loss=0.138, cr_loss=0.3521, over 17035.00 frames. ], tot_loss[loss=0.2217, ctc_loss=0.1491, cr_loss=0.3628, over 1780999.03 frames. ], batch size: 52, lr: 6.77e-03, grad_scale: 16.0 2024-09-23 17:32:30,259 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=309782.6666666667, ans=0.2 2024-09-23 17:32:31,748 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=309782.6666666667, ans=0.125 2024-09-23 17:32:39,808 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=309782.6666666667, ans=0.04949747468305833 2024-09-23 17:32:46,084 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=309829.3333333333, ans=0.2 2024-09-23 17:33:39,872 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=309969.3333333333, ans=0.1 2024-09-23 17:33:44,534 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=309969.3333333333, ans=0.125 2024-09-23 17:33:50,779 INFO [train.py:1198] (3/4) Epoch 18, batch 200, loss[loss=0.2162, ctc_loss=0.1431, cr_loss=0.3656, over 17016.00 frames. ], tot_loss[loss=0.2219, ctc_loss=0.1493, cr_loss=0.363, over 2123329.90 frames. ], batch size: 56, lr: 6.77e-03, grad_scale: 16.0 2024-09-23 17:34:24,645 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=4.18 vs. limit=12.0 2024-09-23 17:34:35,677 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=310109.3333333333, ans=0.0 2024-09-23 17:34:36,189 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=2.38 vs. limit=15.0 2024-09-23 17:34:56,223 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=310202.6666666667, ans=0.125 2024-09-23 17:35:00,727 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.088e+02 1.233e+02 1.340e+02 1.521e+02 2.141e+02, threshold=2.680e+02, percent-clipped=0.0 2024-09-23 17:35:01,135 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-23 17:35:13,461 INFO [train.py:1198] (3/4) Epoch 18, batch 250, loss[loss=0.2036, ctc_loss=0.1332, cr_loss=0.3519, over 17306.00 frames.
], tot_loss[loss=0.2217, ctc_loss=0.149, cr_loss=0.3633, over 2396332.90 frames. ], batch size: 46, lr: 6.77e-03, grad_scale: 16.0 2024-09-23 17:36:18,865 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=310436.0, ans=0.125 2024-09-23 17:36:36,381 INFO [train.py:1198] (3/4) Epoch 18, batch 300, loss[loss=0.2031, ctc_loss=0.1326, cr_loss=0.3527, over 17039.00 frames. ], tot_loss[loss=0.2215, ctc_loss=0.1489, cr_loss=0.3627, over 2597290.40 frames. ], batch size: 44, lr: 6.76e-03, grad_scale: 16.0 2024-09-23 17:37:20,244 INFO [scaling.py:1024] (3/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=7.59 vs. limit=8.0 2024-09-23 17:37:25,789 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=310622.6666666667, ans=0.0 2024-09-23 17:37:28,931 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=310622.6666666667, ans=0.125 2024-09-23 17:37:46,270 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.058e+02 1.225e+02 1.352e+02 1.557e+02 2.949e+02, threshold=2.705e+02, percent-clipped=1.0 2024-09-23 17:37:59,976 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=310716.0, ans=0.125 2024-09-23 17:38:01,323 INFO [train.py:1198] (3/4) Epoch 18, batch 350, loss[loss=0.2204, ctc_loss=0.1487, cr_loss=0.3586, over 17052.00 frames. ], tot_loss[loss=0.2226, ctc_loss=0.1498, cr_loss=0.3644, over 2765450.99 frames. ], batch size: 46, lr: 6.76e-03, grad_scale: 16.0 2024-09-23 17:38:40,522 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=310809.3333333333, ans=0.1 2024-09-23 17:38:53,159 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-23 17:38:57,137 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.18 vs. limit=10.0 2024-09-23 17:38:57,880 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=310856.0, ans=0.07 2024-09-23 17:39:19,877 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-23 17:39:24,260 INFO [train.py:1198] (3/4) Epoch 18, batch 400, loss[loss=0.1946, ctc_loss=0.131, cr_loss=0.3181, over 17196.00 frames. ], tot_loss[loss=0.2228, ctc_loss=0.1499, cr_loss=0.3645, over 2888004.00 frames. ], batch size: 41, lr: 6.76e-03, grad_scale: 32.0 2024-09-23 17:39:24,671 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=310949.3333333333, ans=0.2 2024-09-23 17:39:38,963 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=310996.0, ans=0.125 2024-09-23 17:39:43,635 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=310996.0, ans=0.0 2024-09-23 17:39:56,965 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=9.10 vs. 
limit=15.0 2024-09-23 17:40:10,702 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=311089.3333333333, ans=0.125 2024-09-23 17:40:17,176 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=311089.3333333333, ans=0.125 2024-09-23 17:40:23,556 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=311089.3333333333, ans=0.2 2024-09-23 17:40:29,975 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.const_attention_rate, batch_count=311136.0, ans=0.025 2024-09-23 17:40:31,100 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.104e+02 1.269e+02 1.393e+02 1.575e+02 2.470e+02, threshold=2.786e+02, percent-clipped=0.0 2024-09-23 17:40:43,673 INFO [train.py:1198] (3/4) Epoch 18, batch 450, loss[loss=0.1821, ctc_loss=0.1188, cr_loss=0.3166, over 17107.00 frames. ], tot_loss[loss=0.2219, ctc_loss=0.1491, cr_loss=0.3638, over 3000491.29 frames. ], batch size: 40, lr: 6.76e-03, grad_scale: 32.0 2024-09-23 17:41:08,705 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=311229.3333333333, ans=0.125 2024-09-23 17:41:40,510 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=311322.6666666667, ans=0.125 2024-09-23 17:41:52,972 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=311369.3333333333, ans=0.0 2024-09-23 17:42:05,316 INFO [train.py:1198] (3/4) Epoch 18, batch 500, loss[loss=0.2266, ctc_loss=0.1501, cr_loss=0.3824, over 17034.00 frames. ], tot_loss[loss=0.2219, ctc_loss=0.149, cr_loss=0.3647, over 3091975.20 frames. ], batch size: 44, lr: 6.75e-03, grad_scale: 32.0 2024-09-23 17:42:22,792 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=311462.6666666667, ans=0.125 2024-09-23 17:42:30,735 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=311462.6666666667, ans=0.0 2024-09-23 17:43:17,644 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.086e+02 1.265e+02 1.370e+02 1.572e+02 2.414e+02, threshold=2.740e+02, percent-clipped=0.0 2024-09-23 17:43:22,632 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=311602.6666666667, ans=0.2 2024-09-23 17:43:30,321 INFO [train.py:1198] (3/4) Epoch 18, batch 550, loss[loss=0.2179, ctc_loss=0.1428, cr_loss=0.3756, over 17241.00 frames. ], tot_loss[loss=0.222, ctc_loss=0.149, cr_loss=0.3651, over 3151307.03 frames. 
], batch size: 44, lr: 6.75e-03, grad_scale: 32.0 2024-09-23 17:43:38,499 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=311649.3333333333, ans=0.0 2024-09-23 17:43:41,665 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=311649.3333333333, ans=0.0 2024-09-23 17:43:41,670 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=311649.3333333333, ans=0.125 2024-09-23 17:43:46,374 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=311696.0, ans=0.125 2024-09-23 17:44:01,051 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=311742.6666666667, ans=0.0 2024-09-23 17:44:01,149 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=311742.6666666667, ans=0.125 2024-09-23 17:44:24,721 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=311789.3333333333, ans=0.125 2024-09-23 17:44:42,638 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=12.11 vs. limit=22.5 2024-09-23 17:44:53,371 INFO [train.py:1198] (3/4) Epoch 18, batch 600, loss[loss=0.2428, ctc_loss=0.1623, cr_loss=0.4022, over 16704.00 frames. ], tot_loss[loss=0.2225, ctc_loss=0.1494, cr_loss=0.3658, over 3190266.09 frames. ], batch size: 61, lr: 6.75e-03, grad_scale: 32.0 2024-09-23 17:44:53,725 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=311882.6666666667, ans=0.2 2024-09-23 17:46:04,639 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.142e+02 1.282e+02 1.384e+02 1.538e+02 2.458e+02, threshold=2.768e+02, percent-clipped=0.0 2024-09-23 17:46:06,620 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=312069.3333333333, ans=0.2 2024-09-23 17:46:16,007 INFO [train.py:1198] (3/4) Epoch 18, batch 650, loss[loss=0.2041, ctc_loss=0.1351, cr_loss=0.3452, over 16923.00 frames. ], tot_loss[loss=0.2229, ctc_loss=0.1497, cr_loss=0.3661, over 3217250.37 frames. ], batch size: 42, lr: 6.75e-03, grad_scale: 16.0 2024-09-23 17:46:48,445 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=312209.3333333333, ans=0.07 2024-09-23 17:47:05,703 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=312256.0, ans=0.125 2024-09-23 17:47:07,230 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=312256.0, ans=0.125 2024-09-23 17:47:38,727 INFO [train.py:1198] (3/4) Epoch 18, batch 700, loss[loss=0.227, ctc_loss=0.1525, cr_loss=0.3724, over 17123.00 frames. ], tot_loss[loss=0.2233, ctc_loss=0.15, cr_loss=0.3663, over 3246224.58 frames. 
], batch size: 40, lr: 6.74e-03, grad_scale: 16.0 2024-09-23 17:48:32,922 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=312489.3333333333, ans=0.125 2024-09-23 17:48:49,978 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.087e+02 1.252e+02 1.369e+02 1.593e+02 2.409e+02, threshold=2.738e+02, percent-clipped=0.0 2024-09-23 17:48:50,435 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=312536.0, ans=0.1 2024-09-23 17:49:00,934 INFO [train.py:1198] (3/4) Epoch 18, batch 750, loss[loss=0.2235, ctc_loss=0.1473, cr_loss=0.381, over 17233.00 frames. ], tot_loss[loss=0.2233, ctc_loss=0.15, cr_loss=0.3663, over 3266576.95 frames. ], batch size: 44, lr: 6.74e-03, grad_scale: 16.0 2024-09-23 17:49:01,178 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=312582.6666666667, ans=0.2 2024-09-23 17:49:09,180 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=312582.6666666667, ans=0.2 2024-09-23 17:49:24,273 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=312629.3333333333, ans=0.125 2024-09-23 17:49:37,290 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=312676.0, ans=0.0 2024-09-23 17:50:06,145 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=312769.3333333333, ans=0.05 2024-09-23 17:50:07,854 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=312769.3333333333, ans=0.125 2024-09-23 17:50:23,308 INFO [train.py:1198] (3/4) Epoch 18, batch 800, loss[loss=0.2269, ctc_loss=0.151, cr_loss=0.3798, over 17223.00 frames. ], tot_loss[loss=0.2233, ctc_loss=0.15, cr_loss=0.3666, over 3283027.10 frames. ], batch size: 47, lr: 6.74e-03, grad_scale: 32.0 2024-09-23 17:50:28,262 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=312816.0, ans=0.1 2024-09-23 17:50:41,284 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=312862.6666666667, ans=0.1 2024-09-23 17:51:21,917 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=312956.0, ans=0.2 2024-09-23 17:51:30,159 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=313002.6666666667, ans=0.2 2024-09-23 17:51:34,556 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.074e+02 1.268e+02 1.368e+02 1.483e+02 2.318e+02, threshold=2.737e+02, percent-clipped=0.0 2024-09-23 17:51:39,700 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=313002.6666666667, ans=0.2 2024-09-23 17:51:44,300 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=313049.3333333333, ans=0.05 2024-09-23 17:51:45,657 INFO [train.py:1198] (3/4) Epoch 18, batch 850, loss[loss=0.219, ctc_loss=0.1432, cr_loss=0.3791, over 17156.00 frames. 
], tot_loss[loss=0.2222, ctc_loss=0.1491, cr_loss=0.3652, over 3301773.42 frames. ], batch size: 45, lr: 6.74e-03, grad_scale: 32.0 2024-09-23 17:52:07,495 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=313096.0, ans=0.0 2024-09-23 17:52:38,736 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=5.24 vs. limit=12.0 2024-09-23 17:52:48,185 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=313189.3333333333, ans=0.125 2024-09-23 17:52:54,656 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=313236.0, ans=0.2 2024-09-23 17:52:54,658 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=313236.0, ans=0.1 2024-09-23 17:53:04,674 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=17.04 vs. limit=22.5 2024-09-23 17:53:10,237 INFO [train.py:1198] (3/4) Epoch 18, batch 900, loss[loss=0.2123, ctc_loss=0.1435, cr_loss=0.3442, over 17158.00 frames. ], tot_loss[loss=0.2218, ctc_loss=0.1489, cr_loss=0.3643, over 3314400.45 frames. ], batch size: 48, lr: 6.73e-03, grad_scale: 32.0 2024-09-23 17:53:37,547 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=313329.3333333333, ans=0.125 2024-09-23 17:54:07,857 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=313422.6666666667, ans=0.1 2024-09-23 17:54:21,420 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.072e+02 1.286e+02 1.405e+02 1.600e+02 2.669e+02, threshold=2.810e+02, percent-clipped=0.0 2024-09-23 17:54:28,296 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=313469.3333333333, ans=0.0 2024-09-23 17:54:32,764 INFO [train.py:1198] (3/4) Epoch 18, batch 950, loss[loss=0.2211, ctc_loss=0.1489, cr_loss=0.3613, over 17303.00 frames. ], tot_loss[loss=0.2213, ctc_loss=0.1486, cr_loss=0.3635, over 3317586.96 frames. ], batch size: 46, lr: 6.73e-03, grad_scale: 32.0 2024-09-23 17:54:52,571 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=8.00 vs. limit=12.0 2024-09-23 17:54:57,113 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=313562.6666666667, ans=0.1 2024-09-23 17:54:58,700 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=313562.6666666667, ans=0.0 2024-09-23 17:55:07,444 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=10.78 vs. limit=15.0 2024-09-23 17:55:55,789 INFO [train.py:1198] (3/4) Epoch 18, batch 1000, loss[loss=0.198, ctc_loss=0.1311, cr_loss=0.3345, over 17186.00 frames. ], tot_loss[loss=0.2197, ctc_loss=0.1474, cr_loss=0.3618, over 3326567.31 frames. 
], batch size: 41, lr: 6.73e-03, grad_scale: 32.0 2024-09-23 17:57:06,775 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.066e+02 1.230e+02 1.328e+02 1.431e+02 2.241e+02, threshold=2.657e+02, percent-clipped=0.0 2024-09-23 17:57:10,237 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=313936.0, ans=0.1 2024-09-23 17:57:17,824 INFO [train.py:1198] (3/4) Epoch 18, batch 1050, loss[loss=0.2228, ctc_loss=0.1574, cr_loss=0.3274, over 16000.00 frames. ], tot_loss[loss=0.2198, ctc_loss=0.1473, cr_loss=0.3623, over 3339377.93 frames. ], batch size: 74, lr: 6.73e-03, grad_scale: 32.0 2024-09-23 17:57:22,861 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=313982.6666666667, ans=0.0 2024-09-23 17:57:27,622 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=313982.6666666667, ans=0.0 2024-09-23 17:57:48,063 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.63 vs. limit=15.0 2024-09-23 17:58:09,920 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=314122.6666666667, ans=0.0 2024-09-23 17:58:17,817 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=314122.6666666667, ans=0.1 2024-09-23 17:58:30,906 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=314169.3333333333, ans=0.0 2024-09-23 17:58:40,063 INFO [train.py:1198] (3/4) Epoch 18, batch 1100, loss[loss=0.2212, ctc_loss=0.1485, cr_loss=0.3638, over 17313.00 frames. ], tot_loss[loss=0.2188, ctc_loss=0.1465, cr_loss=0.3617, over 3352655.97 frames. ], batch size: 46, lr: 6.72e-03, grad_scale: 32.0 2024-09-23 17:58:56,592 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=11.45 vs. limit=22.5 2024-09-23 17:59:24,150 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=314309.3333333333, ans=0.125 2024-09-23 17:59:41,935 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=314356.0, ans=0.1 2024-09-23 17:59:48,195 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=314402.6666666667, ans=0.0 2024-09-23 17:59:51,145 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.119e+02 1.280e+02 1.383e+02 1.491e+02 2.284e+02, threshold=2.766e+02, percent-clipped=0.0 2024-09-23 18:00:02,378 INFO [train.py:1198] (3/4) Epoch 18, batch 1150, loss[loss=0.2674, ctc_loss=0.183, cr_loss=0.4221, over 16592.00 frames. ], tot_loss[loss=0.2194, ctc_loss=0.147, cr_loss=0.362, over 3350430.79 frames. ], batch size: 66, lr: 6.72e-03, grad_scale: 32.0 2024-09-23 18:00:30,537 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=10.45 vs. 
limit=15.0 2024-09-23 18:01:16,025 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=314636.0, ans=0.07 2024-09-23 18:01:19,086 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=314636.0, ans=0.125 2024-09-23 18:01:22,840 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.54 vs. limit=22.5 2024-09-23 18:01:25,141 INFO [train.py:1198] (3/4) Epoch 18, batch 1200, loss[loss=0.1958, ctc_loss=0.1286, cr_loss=0.3363, over 17087.00 frames. ], tot_loss[loss=0.2192, ctc_loss=0.1468, cr_loss=0.3622, over 3354325.32 frames. ], batch size: 40, lr: 6.72e-03, grad_scale: 32.0 2024-09-23 18:01:30,289 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=314682.6666666667, ans=0.0 2024-09-23 18:02:09,387 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=314776.0, ans=0.125 2024-09-23 18:02:21,965 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=314822.6666666667, ans=0.125 2024-09-23 18:02:37,645 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=314869.3333333333, ans=0.125 2024-09-23 18:02:38,965 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.109e+02 1.291e+02 1.389e+02 1.552e+02 2.070e+02, threshold=2.777e+02, percent-clipped=0.0 2024-09-23 18:02:48,672 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=314916.0, ans=0.125 2024-09-23 18:02:49,985 INFO [train.py:1198] (3/4) Epoch 18, batch 1250, loss[loss=0.1995, ctc_loss=0.1349, cr_loss=0.3227, over 17124.00 frames. ], tot_loss[loss=0.2191, ctc_loss=0.1468, cr_loss=0.3615, over 3362694.90 frames. ], batch size: 40, lr: 6.72e-03, grad_scale: 32.0 2024-09-23 18:03:40,059 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=315056.0, ans=0.0 2024-09-23 18:03:46,555 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=315056.0, ans=0.1 2024-09-23 18:03:52,429 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=315102.6666666667, ans=0.125 2024-09-23 18:03:59,261 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=315102.6666666667, ans=0.0 2024-09-23 18:04:02,486 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=315102.6666666667, ans=0.2 2024-09-23 18:04:04,066 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=315102.6666666667, ans=0.0 2024-09-23 18:04:10,280 INFO [train.py:1198] (3/4) Epoch 18, batch 1300, loss[loss=0.2248, ctc_loss=0.1455, cr_loss=0.3962, over 17293.00 frames. ], tot_loss[loss=0.2192, ctc_loss=0.1468, cr_loss=0.3617, over 3366137.84 frames. 
], batch size: 51, lr: 6.71e-03, grad_scale: 32.0 2024-09-23 18:04:36,927 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=315196.0, ans=0.1 2024-09-23 18:04:52,835 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=315242.6666666667, ans=0.1 2024-09-23 18:05:04,508 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.12 vs. limit=15.0 2024-09-23 18:05:09,074 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=315289.3333333333, ans=0.07 2024-09-23 18:05:14,015 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=315289.3333333333, ans=0.125 2024-09-23 18:05:21,639 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.126e+02 1.312e+02 1.416e+02 1.661e+02 2.501e+02, threshold=2.832e+02, percent-clipped=0.0 2024-09-23 18:05:25,091 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-23 18:05:32,772 INFO [train.py:1198] (3/4) Epoch 18, batch 1350, loss[loss=0.2408, ctc_loss=0.1617, cr_loss=0.3954, over 14842.00 frames. ], tot_loss[loss=0.2213, ctc_loss=0.1485, cr_loss=0.3643, over 3356722.46 frames. ], batch size: 89, lr: 6.71e-03, grad_scale: 32.0 2024-09-23 18:05:58,239 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=13.31 vs. limit=22.5 2024-09-23 18:06:00,956 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=315429.3333333333, ans=0.0 2024-09-23 18:06:04,388 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=315429.3333333333, ans=0.125 2024-09-23 18:06:09,572 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten.whitening_limit, batch_count=315476.0, ans=15.0 2024-09-23 18:06:20,823 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=6.28 vs. limit=15.0 2024-09-23 18:06:21,822 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=315522.6666666667, ans=0.0 2024-09-23 18:06:39,743 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=7.04 vs. limit=15.0 2024-09-23 18:06:51,831 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=315569.3333333333, ans=0.0 2024-09-23 18:06:54,863 INFO [train.py:1198] (3/4) Epoch 18, batch 1400, loss[loss=0.2533, ctc_loss=0.1681, cr_loss=0.4258, over 17290.00 frames. ], tot_loss[loss=0.2216, ctc_loss=0.1487, cr_loss=0.3649, over 3364185.71 frames. ], batch size: 49, lr: 6.71e-03, grad_scale: 32.0 2024-09-23 18:07:01,647 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.15 vs. 
limit=15.0 2024-09-23 18:07:07,479 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=315616.0, ans=0.125 2024-09-23 18:07:36,926 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=9.85 vs. limit=15.0 2024-09-23 18:07:51,812 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=8.74 vs. limit=15.0 2024-09-23 18:08:08,564 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.069e+02 1.236e+02 1.313e+02 1.450e+02 2.376e+02, threshold=2.626e+02, percent-clipped=0.0 2024-09-23 18:08:18,616 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=315849.3333333333, ans=0.125 2024-09-23 18:08:19,850 INFO [train.py:1198] (3/4) Epoch 18, batch 1450, loss[loss=0.2485, ctc_loss=0.17, cr_loss=0.3924, over 16915.00 frames. ], tot_loss[loss=0.2216, ctc_loss=0.1488, cr_loss=0.3645, over 3365183.56 frames. ], batch size: 58, lr: 6.71e-03, grad_scale: 32.0 2024-09-23 18:08:59,957 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=315942.6666666667, ans=0.2 2024-09-23 18:09:04,830 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=315942.6666666667, ans=0.04949747468305833 2024-09-23 18:09:07,965 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=315989.3333333333, ans=0.2 2024-09-23 18:09:26,518 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=316036.0, ans=0.1 2024-09-23 18:09:31,280 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=316036.0, ans=0.2 2024-09-23 18:09:38,392 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=5.84 vs. limit=12.0 2024-09-23 18:09:42,229 INFO [train.py:1198] (3/4) Epoch 18, batch 1500, loss[loss=0.2485, ctc_loss=0.167, cr_loss=0.4074, over 17220.00 frames. ], tot_loss[loss=0.2212, ctc_loss=0.1483, cr_loss=0.3648, over 3371736.43 frames. ], batch size: 55, lr: 6.70e-03, grad_scale: 32.0 2024-09-23 18:09:45,749 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=316082.6666666667, ans=0.0 2024-09-23 18:09:55,640 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=316082.6666666667, ans=0.0 2024-09-23 18:10:05,556 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=8.09 vs. limit=15.0 2024-09-23 18:10:53,732 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.112e+02 1.296e+02 1.405e+02 1.565e+02 2.957e+02, threshold=2.810e+02, percent-clipped=1.0 2024-09-23 18:10:59,001 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=316269.3333333333, ans=0.0 2024-09-23 18:11:05,020 INFO [train.py:1198] (3/4) Epoch 18, batch 1550, loss[loss=0.248, ctc_loss=0.1695, cr_loss=0.3925, over 17215.00 frames. 
], tot_loss[loss=0.2202, ctc_loss=0.1476, cr_loss=0.3632, over 3365330.22 frames. ], batch size: 55, lr: 6.70e-03, grad_scale: 32.0 2024-09-23 18:11:10,113 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=316316.0, ans=0.125 2024-09-23 18:11:13,235 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=316316.0, ans=0.0 2024-09-23 18:12:00,042 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=9.38 vs. limit=15.0 2024-09-23 18:12:05,938 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=316456.0, ans=0.2 2024-09-23 18:12:15,772 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=10.94 vs. limit=22.5 2024-09-23 18:12:28,052 INFO [train.py:1198] (3/4) Epoch 18, batch 1600, loss[loss=0.2505, ctc_loss=0.1716, cr_loss=0.3943, over 16879.00 frames. ], tot_loss[loss=0.2199, ctc_loss=0.1473, cr_loss=0.3629, over 3371234.65 frames. ], batch size: 58, lr: 6.70e-03, grad_scale: 32.0 2024-09-23 18:12:50,455 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=6.77 vs. limit=15.0 2024-09-23 18:12:52,499 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=6.53 vs. limit=15.0 2024-09-23 18:13:07,792 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=316642.6666666667, ans=0.125 2024-09-23 18:13:14,236 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=316642.6666666667, ans=0.0 2024-09-23 18:13:24,362 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=4.99 vs. limit=15.0 2024-09-23 18:13:36,557 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=316736.0, ans=0.125 2024-09-23 18:13:38,078 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=316736.0, ans=0.1 2024-09-23 18:13:39,405 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.138e+02 1.280e+02 1.386e+02 1.539e+02 3.392e+02, threshold=2.772e+02, percent-clipped=1.0 2024-09-23 18:13:50,544 INFO [train.py:1198] (3/4) Epoch 18, batch 1650, loss[loss=0.1905, ctc_loss=0.1286, cr_loss=0.3092, over 17093.00 frames. ], tot_loss[loss=0.2205, ctc_loss=0.1478, cr_loss=0.3635, over 3369714.98 frames. 
], batch size: 43, lr: 6.70e-03, grad_scale: 32.0 2024-09-23 18:14:15,668 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=316829.3333333333, ans=0.0 2024-09-23 18:14:20,734 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=316829.3333333333, ans=0.0 2024-09-23 18:14:42,928 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=316922.6666666667, ans=0.125 2024-09-23 18:14:47,785 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=316922.6666666667, ans=0.125 2024-09-23 18:15:08,693 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=316969.3333333333, ans=0.125 2024-09-23 18:15:13,364 INFO [train.py:1198] (3/4) Epoch 18, batch 1700, loss[loss=0.2329, ctc_loss=0.1585, cr_loss=0.372, over 17312.00 frames. ], tot_loss[loss=0.2205, ctc_loss=0.1479, cr_loss=0.3633, over 3368022.86 frames. ], batch size: 49, lr: 6.69e-03, grad_scale: 32.0 2024-09-23 18:15:15,190 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=317016.0, ans=0.0 2024-09-23 18:15:48,631 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=5.04 vs. limit=12.0 2024-09-23 18:15:55,738 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=317109.3333333333, ans=0.125 2024-09-23 18:16:13,495 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.min_positive, batch_count=317156.0, ans=0.025 2024-09-23 18:16:15,153 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=317156.0, ans=0.05 2024-09-23 18:16:23,183 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=317202.6666666667, ans=0.0 2024-09-23 18:16:24,388 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.104e+02 1.272e+02 1.364e+02 1.549e+02 1.950e+02, threshold=2.728e+02, percent-clipped=0.0 2024-09-23 18:16:27,999 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=317202.6666666667, ans=0.125 2024-09-23 18:16:35,525 INFO [train.py:1198] (3/4) Epoch 18, batch 1750, loss[loss=0.2065, ctc_loss=0.1377, cr_loss=0.344, over 17310.00 frames. ], tot_loss[loss=0.2206, ctc_loss=0.1479, cr_loss=0.3636, over 3366979.18 frames. ], batch size: 46, lr: 6.69e-03, grad_scale: 32.0 2024-09-23 18:17:44,870 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=6.66 vs. limit=15.0 2024-09-23 18:18:02,971 INFO [train.py:1198] (3/4) Epoch 18, batch 1800, loss[loss=0.201, ctc_loss=0.1315, cr_loss=0.3476, over 17267.00 frames. ], tot_loss[loss=0.221, ctc_loss=0.148, cr_loss=0.3645, over 3370555.80 frames. 
], batch size: 42, lr: 6.69e-03, grad_scale: 32.0 2024-09-23 18:18:06,508 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.const_attention_rate, batch_count=317482.6666666667, ans=0.025 2024-09-23 18:18:40,538 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.const_attention_rate, batch_count=317576.0, ans=0.025 2024-09-23 18:18:40,572 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=317576.0, ans=0.0 2024-09-23 18:18:46,949 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=317576.0, ans=0.125 2024-09-23 18:18:51,847 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=317622.6666666667, ans=0.09899494936611666 2024-09-23 18:19:14,899 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.089e+02 1.243e+02 1.320e+02 1.455e+02 2.029e+02, threshold=2.640e+02, percent-clipped=0.0 2024-09-23 18:19:17,673 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=4.81 vs. limit=12.0 2024-09-23 18:19:23,320 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=317669.3333333333, ans=0.07 2024-09-23 18:19:26,150 INFO [train.py:1198] (3/4) Epoch 18, batch 1850, loss[loss=0.2034, ctc_loss=0.134, cr_loss=0.3471, over 16789.00 frames. ], tot_loss[loss=0.2205, ctc_loss=0.1478, cr_loss=0.3635, over 3368043.69 frames. ], batch size: 37, lr: 6.69e-03, grad_scale: 32.0 2024-09-23 18:19:58,827 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=317809.3333333333, ans=0.2 2024-09-23 18:20:00,938 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=14.79 vs. limit=22.5 2024-09-23 18:20:05,182 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=317809.3333333333, ans=0.0 2024-09-23 18:20:13,727 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=4.62 vs. limit=15.0 2024-09-23 18:20:22,917 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.const_attention_rate, batch_count=317856.0, ans=0.025 2024-09-23 18:20:49,119 INFO [train.py:1198] (3/4) Epoch 18, batch 1900, loss[loss=0.2556, ctc_loss=0.1714, cr_loss=0.4207, over 17231.00 frames. ], tot_loss[loss=0.2215, ctc_loss=0.1485, cr_loss=0.3646, over 3367601.72 frames. ], batch size: 55, lr: 6.68e-03, grad_scale: 32.0 2024-09-23 18:21:06,396 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.45 vs. limit=15.0 2024-09-23 18:21:58,144 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.103e+02 1.235e+02 1.336e+02 1.423e+02 1.924e+02, threshold=2.671e+02, percent-clipped=0.0 2024-09-23 18:22:07,464 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=318136.0, ans=0.125 2024-09-23 18:22:11,956 INFO [train.py:1198] (3/4) Epoch 18, batch 1950, loss[loss=0.2281, ctc_loss=0.1486, cr_loss=0.3972, over 17288.00 frames. 
], tot_loss[loss=0.2208, ctc_loss=0.1479, cr_loss=0.3642, over 3377507.56 frames. ], batch size: 46, lr: 6.68e-03, grad_scale: 32.0 2024-09-23 18:22:39,423 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=318229.3333333333, ans=0.125 2024-09-23 18:22:48,713 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=318276.0, ans=0.0 2024-09-23 18:22:57,093 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.88 vs. limit=6.0 2024-09-23 18:22:59,886 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=318276.0, ans=0.125 2024-09-23 18:23:18,933 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=318369.3333333333, ans=0.025 2024-09-23 18:23:28,476 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=318369.3333333333, ans=0.125 2024-09-23 18:23:30,148 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=318369.3333333333, ans=0.2 2024-09-23 18:23:34,524 INFO [train.py:1198] (3/4) Epoch 18, batch 2000, loss[loss=0.2046, ctc_loss=0.1382, cr_loss=0.3322, over 17090.00 frames. ], tot_loss[loss=0.2198, ctc_loss=0.1473, cr_loss=0.3628, over 3373752.17 frames. ], batch size: 49, lr: 6.68e-03, grad_scale: 32.0 2024-09-23 18:23:55,818 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=318462.6666666667, ans=0.2 2024-09-23 18:24:19,614 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=318509.3333333333, ans=0.0 2024-09-23 18:24:34,393 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=5.54 vs. limit=12.0 2024-09-23 18:24:46,013 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.121e+02 1.249e+02 1.342e+02 1.462e+02 2.296e+02, threshold=2.684e+02, percent-clipped=0.0 2024-09-23 18:24:46,341 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=318602.6666666667, ans=0.0 2024-09-23 18:24:46,446 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=318602.6666666667, ans=0.125 2024-09-23 18:24:57,330 INFO [train.py:1198] (3/4) Epoch 18, batch 2050, loss[loss=0.208, ctc_loss=0.1372, cr_loss=0.3542, over 17044.00 frames. ], tot_loss[loss=0.2201, ctc_loss=0.1475, cr_loss=0.363, over 3373104.76 frames. 
], batch size: 56, lr: 6.68e-03, grad_scale: 32.0 2024-09-23 18:24:59,189 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=318649.3333333333, ans=0.125 2024-09-23 18:25:34,425 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=318742.6666666667, ans=0.95 2024-09-23 18:25:47,952 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=318789.3333333333, ans=0.125 2024-09-23 18:25:54,848 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=3.54 vs. limit=10.0 2024-09-23 18:25:56,025 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=318789.3333333333, ans=0.1 2024-09-23 18:25:56,903 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=3.99 vs. limit=12.0 2024-09-23 18:26:19,583 INFO [train.py:1198] (3/4) Epoch 18, batch 2100, loss[loss=0.218, ctc_loss=0.146, cr_loss=0.36, over 17341.00 frames. ], tot_loss[loss=0.2201, ctc_loss=0.1475, cr_loss=0.3627, over 3367178.99 frames. ], batch size: 48, lr: 6.67e-03, grad_scale: 32.0 2024-09-23 18:26:53,639 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=318976.0, ans=0.1 2024-09-23 18:27:12,786 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=11.40 vs. limit=15.0 2024-09-23 18:27:14,141 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.96 vs. limit=15.0 2024-09-23 18:27:31,219 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.101e+02 1.263e+02 1.339e+02 1.437e+02 2.082e+02, threshold=2.677e+02, percent-clipped=0.0 2024-09-23 18:27:37,916 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=319069.3333333333, ans=0.125 2024-09-23 18:27:44,924 INFO [train.py:1198] (3/4) Epoch 18, batch 2150, loss[loss=0.1985, ctc_loss=0.1338, cr_loss=0.3238, over 17056.00 frames. ], tot_loss[loss=0.2209, ctc_loss=0.1481, cr_loss=0.3641, over 3363072.85 frames. ], batch size: 39, lr: 6.67e-03, grad_scale: 32.0 2024-09-23 18:28:32,750 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=9.40 vs. limit=22.5 2024-09-23 18:29:05,238 INFO [train.py:1198] (3/4) Epoch 18, batch 2200, loss[loss=0.2183, ctc_loss=0.145, cr_loss=0.3665, over 17215.00 frames. ], tot_loss[loss=0.2204, ctc_loss=0.1477, cr_loss=0.3634, over 3363645.00 frames. 
], batch size: 47, lr: 6.67e-03, grad_scale: 16.0 2024-09-23 18:29:39,451 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=319442.6666666667, ans=0.125 2024-09-23 18:29:54,318 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=319489.3333333333, ans=0.0 2024-09-23 18:30:17,667 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.053e+02 1.244e+02 1.317e+02 1.469e+02 2.029e+02, threshold=2.633e+02, percent-clipped=0.0 2024-09-23 18:30:22,200 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=12.06 vs. limit=22.5 2024-09-23 18:30:27,418 INFO [train.py:1198] (3/4) Epoch 18, batch 2250, loss[loss=0.2253, ctc_loss=0.1495, cr_loss=0.379, over 17301.00 frames. ], tot_loss[loss=0.2198, ctc_loss=0.1472, cr_loss=0.3628, over 3365784.52 frames. ], batch size: 49, lr: 6.67e-03, grad_scale: 16.0 2024-09-23 18:30:31,869 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=5.07 vs. limit=12.0 2024-09-23 18:31:07,509 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=319676.0, ans=0.5 2024-09-23 18:31:10,624 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=319676.0, ans=0.2 2024-09-23 18:31:38,605 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.59 vs. limit=6.0 2024-09-23 18:31:42,705 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=319769.3333333333, ans=0.04949747468305833 2024-09-23 18:31:50,361 INFO [train.py:1198] (3/4) Epoch 18, batch 2300, loss[loss=0.2263, ctc_loss=0.1509, cr_loss=0.3768, over 17085.00 frames. ], tot_loss[loss=0.2198, ctc_loss=0.1472, cr_loss=0.3627, over 3370652.12 frames. ], batch size: 46, lr: 6.67e-03, grad_scale: 16.0 2024-09-23 18:32:23,935 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=319909.3333333333, ans=0.125 2024-09-23 18:32:25,387 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.const_attention_rate, batch_count=319909.3333333333, ans=0.025 2024-09-23 18:32:51,608 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=319956.0, ans=0.0 2024-09-23 18:33:01,177 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=320002.6666666667, ans=0.125 2024-09-23 18:33:05,708 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.162e+02 1.272e+02 1.367e+02 1.553e+02 2.225e+02, threshold=2.733e+02, percent-clipped=0.0 2024-09-23 18:33:15,258 INFO [train.py:1198] (3/4) Epoch 18, batch 2350, loss[loss=0.1876, ctc_loss=0.1221, cr_loss=0.3277, over 17323.00 frames. ], tot_loss[loss=0.2202, ctc_loss=0.1476, cr_loss=0.3632, over 3359992.71 frames. 
], batch size: 42, lr: 6.66e-03, grad_scale: 16.0 2024-09-23 18:33:23,476 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=320049.3333333333, ans=0.0 2024-09-23 18:33:28,240 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=320049.3333333333, ans=0.125 2024-09-23 18:33:28,796 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.37 vs. limit=15.0 2024-09-23 18:33:36,469 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.const_attention_rate, batch_count=320096.0, ans=0.025 2024-09-23 18:33:54,242 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=320142.6666666667, ans=0.1 2024-09-23 18:34:02,303 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=320189.3333333333, ans=0.125 2024-09-23 18:34:26,988 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=320236.0, ans=0.0 2024-09-23 18:34:37,844 INFO [train.py:1198] (3/4) Epoch 18, batch 2400, loss[loss=0.1815, ctc_loss=0.1174, cr_loss=0.3204, over 16954.00 frames. ], tot_loss[loss=0.2189, ctc_loss=0.1467, cr_loss=0.3613, over 3356009.05 frames. ], batch size: 42, lr: 6.66e-03, grad_scale: 32.0 2024-09-23 18:35:50,941 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.059e+02 1.273e+02 1.349e+02 1.474e+02 2.344e+02, threshold=2.697e+02, percent-clipped=0.0 2024-09-23 18:36:00,513 INFO [train.py:1198] (3/4) Epoch 18, batch 2450, loss[loss=0.2198, ctc_loss=0.1469, cr_loss=0.3644, over 17151.00 frames. ], tot_loss[loss=0.22, ctc_loss=0.1475, cr_loss=0.3627, over 3358314.15 frames. ], batch size: 48, lr: 6.66e-03, grad_scale: 32.0 2024-09-23 18:36:47,449 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=320656.0, ans=0.0 2024-09-23 18:37:05,673 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=320702.6666666667, ans=0.05 2024-09-23 18:37:22,754 INFO [train.py:1198] (3/4) Epoch 18, batch 2500, loss[loss=0.2825, ctc_loss=0.2013, cr_loss=0.406, over 11430.00 frames. ], tot_loss[loss=0.2196, ctc_loss=0.1471, cr_loss=0.3626, over 3360863.96 frames. ], batch size: 123, lr: 6.66e-03, grad_scale: 32.0 2024-09-23 18:37:32,627 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=320749.3333333333, ans=0.0 2024-09-23 18:37:45,027 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=320796.0, ans=0.125 2024-09-23 18:37:54,425 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=320796.0, ans=0.1 2024-09-23 18:38:19,106 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=5.35 vs. limit=15.0 2024-09-23 18:38:20,473 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=5.47 vs. 
limit=15.0 2024-09-23 18:38:32,309 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=6.14 vs. limit=15.0 2024-09-23 18:38:37,301 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.104e+02 1.301e+02 1.381e+02 1.501e+02 2.403e+02, threshold=2.762e+02, percent-clipped=0.0 2024-09-23 18:38:42,298 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=320936.0, ans=0.0 2024-09-23 18:38:45,214 INFO [train.py:1198] (3/4) Epoch 18, batch 2550, loss[loss=0.2624, ctc_loss=0.1783, cr_loss=0.4202, over 15116.00 frames. ], tot_loss[loss=0.2207, ctc_loss=0.1479, cr_loss=0.3639, over 3344784.67 frames. ], batch size: 89, lr: 6.65e-03, grad_scale: 16.0 2024-09-23 18:39:15,886 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=14.50 vs. limit=15.0 2024-09-23 18:39:44,399 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=321122.6666666667, ans=0.125 2024-09-23 18:39:54,624 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=4.84 vs. limit=15.0 2024-09-23 18:40:07,840 INFO [train.py:1198] (3/4) Epoch 18, batch 2600, loss[loss=0.2318, ctc_loss=0.1562, cr_loss=0.3777, over 17010.00 frames. ], tot_loss[loss=0.2207, ctc_loss=0.1479, cr_loss=0.3637, over 3350909.01 frames. ], batch size: 51, lr: 6.65e-03, grad_scale: 16.0 2024-09-23 18:40:16,136 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=321216.0, ans=0.0 2024-09-23 18:40:16,835 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=5.32 vs. limit=15.0 2024-09-23 18:40:42,562 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=321309.3333333333, ans=0.0 2024-09-23 18:41:08,768 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.20 vs. limit=15.0 2024-09-23 18:41:22,320 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.117e+02 1.312e+02 1.421e+02 1.594e+02 2.386e+02, threshold=2.841e+02, percent-clipped=0.0 2024-09-23 18:41:22,696 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=321402.6666666667, ans=0.125 2024-09-23 18:41:30,435 INFO [train.py:1198] (3/4) Epoch 18, batch 2650, loss[loss=0.2171, ctc_loss=0.1444, cr_loss=0.3637, over 17164.00 frames. ], tot_loss[loss=0.2206, ctc_loss=0.1479, cr_loss=0.3635, over 3340415.02 frames. ], batch size: 41, lr: 6.65e-03, grad_scale: 16.0 2024-09-23 18:41:57,534 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=9.90 vs. 
limit=15.0 2024-09-23 18:42:19,696 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten.whitening_limit, batch_count=321589.3333333333, ans=15.0 2024-09-23 18:42:22,424 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=321589.3333333333, ans=0.125 2024-09-23 18:42:55,426 INFO [train.py:1198] (3/4) Epoch 18, batch 2700, loss[loss=0.2095, ctc_loss=0.1411, cr_loss=0.3421, over 16920.00 frames. ], tot_loss[loss=0.2207, ctc_loss=0.148, cr_loss=0.3633, over 3340407.48 frames. ], batch size: 58, lr: 6.65e-03, grad_scale: 16.0 2024-09-23 18:43:26,056 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=321776.0, ans=0.125 2024-09-23 18:43:48,428 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=321822.6666666667, ans=0.0 2024-09-23 18:44:09,952 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.074e+02 1.267e+02 1.364e+02 1.525e+02 2.538e+02, threshold=2.728e+02, percent-clipped=0.0 2024-09-23 18:44:14,873 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=321869.3333333333, ans=0.0 2024-09-23 18:44:17,775 INFO [train.py:1198] (3/4) Epoch 18, batch 2750, loss[loss=0.217, ctc_loss=0.1441, cr_loss=0.3644, over 17197.00 frames. ], tot_loss[loss=0.2232, ctc_loss=0.1499, cr_loss=0.3665, over 3334604.02 frames. ], batch size: 47, lr: 6.64e-03, grad_scale: 16.0 2024-09-23 18:44:19,788 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=321916.0, ans=0.125 2024-09-23 18:44:26,340 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=321916.0, ans=0.0 2024-09-23 18:44:31,229 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=321916.0, ans=0.0 2024-09-23 18:44:56,698 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=322009.3333333333, ans=0.125 2024-09-23 18:45:06,324 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=322056.0, ans=0.2 2024-09-23 18:45:10,946 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=322056.0, ans=0.0 2024-09-23 18:45:40,667 INFO [train.py:1198] (3/4) Epoch 18, batch 2800, loss[loss=0.1822, ctc_loss=0.1208, cr_loss=0.307, over 16691.00 frames. ], tot_loss[loss=0.2232, ctc_loss=0.1499, cr_loss=0.3665, over 3336160.59 frames. ], batch size: 37, lr: 6.64e-03, grad_scale: 32.0 2024-09-23 18:45:55,305 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=322196.0, ans=0.125 2024-09-23 18:46:49,040 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=322336.0, ans=0.125 2024-09-23 18:46:54,413 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.89 vs. 
limit=22.5 2024-09-23 18:46:55,163 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.097e+02 1.235e+02 1.345e+02 1.484e+02 2.490e+02, threshold=2.689e+02, percent-clipped=0.0 2024-09-23 18:47:03,240 INFO [train.py:1198] (3/4) Epoch 18, batch 2850, loss[loss=0.1787, ctc_loss=0.1181, cr_loss=0.3034, over 17293.00 frames. ], tot_loss[loss=0.2223, ctc_loss=0.1492, cr_loss=0.3655, over 3343569.11 frames. ], batch size: 49, lr: 6.64e-03, grad_scale: 32.0 2024-09-23 18:47:23,934 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=322429.3333333333, ans=0.0 2024-09-23 18:48:02,697 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=9.46 vs. limit=15.0 2024-09-23 18:48:03,875 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=322522.6666666667, ans=0.125 2024-09-23 18:48:07,064 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=322522.6666666667, ans=0.125 2024-09-23 18:48:23,361 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=322569.3333333333, ans=0.05 2024-09-23 18:48:26,238 INFO [train.py:1198] (3/4) Epoch 18, batch 2900, loss[loss=0.1798, ctc_loss=0.1182, cr_loss=0.3078, over 17077.00 frames. ], tot_loss[loss=0.2227, ctc_loss=0.1496, cr_loss=0.3656, over 3332967.79 frames. ], batch size: 43, lr: 6.64e-03, grad_scale: 32.0 2024-09-23 18:48:37,777 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=322616.0, ans=0.125 2024-09-23 18:48:49,348 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=3.91 vs. limit=10.0 2024-09-23 18:48:59,451 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=322709.3333333333, ans=0.1 2024-09-23 18:49:40,537 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.134e+02 1.288e+02 1.376e+02 1.532e+02 2.384e+02, threshold=2.751e+02, percent-clipped=0.0 2024-09-23 18:49:42,362 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=322802.6666666667, ans=0.1 2024-09-23 18:49:48,608 INFO [train.py:1198] (3/4) Epoch 18, batch 2950, loss[loss=0.1962, ctc_loss=0.1341, cr_loss=0.3107, over 17152.00 frames. ], tot_loss[loss=0.2211, ctc_loss=0.1483, cr_loss=0.3637, over 3345322.12 frames. ], batch size: 45, lr: 6.63e-03, grad_scale: 32.0 2024-09-23 18:49:53,823 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=322849.3333333333, ans=0.125 2024-09-23 18:50:01,061 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=5.57 vs. 
limit=15.0 2024-09-23 18:50:16,460 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=322896.0, ans=0.0 2024-09-23 18:50:38,382 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=322989.3333333333, ans=0.125 2024-09-23 18:50:41,575 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=322989.3333333333, ans=0.125 2024-09-23 18:50:43,245 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=322989.3333333333, ans=0.0 2024-09-23 18:50:48,078 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=322989.3333333333, ans=0.1 2024-09-23 18:50:54,444 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=323036.0, ans=0.125 2024-09-23 18:51:02,254 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=323036.0, ans=0.1 2024-09-23 18:51:11,409 INFO [train.py:1198] (3/4) Epoch 18, batch 3000, loss[loss=0.2614, ctc_loss=0.1811, cr_loss=0.4018, over 16960.00 frames. ], tot_loss[loss=0.2201, ctc_loss=0.1476, cr_loss=0.3623, over 3344271.56 frames. ], batch size: 53, lr: 6.63e-03, grad_scale: 32.0 2024-09-23 18:51:11,409 INFO [train.py:1221] (3/4) Computing validation loss 2024-09-23 18:51:20,767 INFO [zipformer.py:1858] (3/4) name=encoder.encoders.3.encoder.layers.3.self_attn_weights, attn_weights_entropy = tensor([2.1237, 2.3065, 2.7076, 2.5116, 2.8754, 2.6828, 2.8796, 2.1027], device='cuda:3') 2024-09-23 18:51:26,831 INFO [train.py:1230] (3/4) Epoch 18, validation: loss=0.04062, ctc_loss=0.04062, cr_loss=7.511e-15, over 944034.00 frames. 2024-09-23 18:51:26,832 INFO [train.py:1231] (3/4) Maximum memory allocated so far is 21238MB 2024-09-23 18:51:44,202 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=3.30 vs. limit=15.0 2024-09-23 18:51:46,918 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=323129.3333333333, ans=0.1 2024-09-23 18:51:48,684 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-23 18:51:59,483 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=323176.0, ans=0.1 2024-09-23 18:52:39,991 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.114e+02 1.237e+02 1.313e+02 1.391e+02 1.785e+02, threshold=2.627e+02, percent-clipped=0.0 2024-09-23 18:52:45,100 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=323269.3333333333, ans=0.0 2024-09-23 18:52:47,794 INFO [train.py:1198] (3/4) Epoch 18, batch 3050, loss[loss=0.198, ctc_loss=0.134, cr_loss=0.3199, over 16951.00 frames. ], tot_loss[loss=0.2192, ctc_loss=0.1469, cr_loss=0.3614, over 3333144.35 frames. 
], batch size: 42, lr: 6.63e-03, grad_scale: 32.0 2024-09-23 18:53:11,666 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=323362.6666666667, ans=0.2 2024-09-23 18:53:20,825 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.66 vs. limit=15.0 2024-09-23 18:54:08,395 INFO [train.py:1198] (3/4) Epoch 18, batch 3100, loss[loss=0.2238, ctc_loss=0.1467, cr_loss=0.3856, over 16888.00 frames. ], tot_loss[loss=0.2197, ctc_loss=0.1472, cr_loss=0.3623, over 3339204.13 frames. ], batch size: 58, lr: 6.63e-03, grad_scale: 32.0 2024-09-23 18:54:08,700 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=323549.3333333333, ans=0.5 2024-09-23 18:54:18,005 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=323549.3333333333, ans=0.0 2024-09-23 18:54:34,039 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=6.52 vs. limit=15.0 2024-09-23 18:55:03,566 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=323689.3333333333, ans=0.125 2024-09-23 18:55:19,122 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.021e+02 1.256e+02 1.342e+02 1.439e+02 3.475e+02, threshold=2.684e+02, percent-clipped=1.0 2024-09-23 18:55:20,903 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=323736.0, ans=0.0 2024-09-23 18:55:26,827 INFO [train.py:1198] (3/4) Epoch 18, batch 3150, loss[loss=0.2434, ctc_loss=0.166, cr_loss=0.3873, over 14901.00 frames. ], tot_loss[loss=0.2204, ctc_loss=0.1477, cr_loss=0.3632, over 3335942.18 frames. ], batch size: 88, lr: 6.62e-03, grad_scale: 32.0 2024-09-23 18:56:33,575 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=323969.3333333333, ans=0.1 2024-09-23 18:56:45,844 INFO [train.py:1198] (3/4) Epoch 18, batch 3200, loss[loss=0.1974, ctc_loss=0.1312, cr_loss=0.3311, over 17097.00 frames. ], tot_loss[loss=0.2201, ctc_loss=0.1475, cr_loss=0.3628, over 3341740.11 frames. ], batch size: 40, lr: 6.62e-03, grad_scale: 32.0 2024-09-23 18:57:03,279 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=324062.6666666667, ans=0.09899494936611666 2024-09-23 18:57:03,678 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=6.39 vs. limit=15.0 2024-09-23 18:57:19,657 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=324109.3333333333, ans=0.0 2024-09-23 18:57:36,983 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=324156.0, ans=0.0 2024-09-23 18:57:45,420 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.15 vs. 
limit=22.5 2024-09-23 18:57:58,436 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.076e+02 1.249e+02 1.354e+02 1.459e+02 2.306e+02, threshold=2.708e+02, percent-clipped=0.0 2024-09-23 18:58:03,371 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=324202.6666666667, ans=0.125 2024-09-23 18:58:06,268 INFO [train.py:1198] (3/4) Epoch 18, batch 3250, loss[loss=0.2525, ctc_loss=0.1712, cr_loss=0.4067, over 16054.00 frames. ], tot_loss[loss=0.2203, ctc_loss=0.1478, cr_loss=0.3628, over 3346473.76 frames. ], batch size: 74, lr: 6.62e-03, grad_scale: 32.0 2024-09-23 18:58:14,922 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=14.50 vs. limit=15.0 2024-09-23 18:58:27,038 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer_ff3.min_abs, batch_count=324296.0, ans=0.2 2024-09-23 18:58:37,905 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=324342.6666666667, ans=0.125 2024-09-23 18:58:41,260 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=324342.6666666667, ans=0.0 2024-09-23 18:59:04,486 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=324389.3333333333, ans=0.0 2024-09-23 18:59:07,661 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=324436.0, ans=0.125 2024-09-23 18:59:24,402 INFO [train.py:1198] (3/4) Epoch 18, batch 3300, loss[loss=0.1984, ctc_loss=0.1322, cr_loss=0.3312, over 17318.00 frames. ], tot_loss[loss=0.2215, ctc_loss=0.1488, cr_loss=0.3634, over 3327890.57 frames. ], batch size: 49, lr: 6.62e-03, grad_scale: 32.0 2024-09-23 18:59:52,561 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=324529.3333333333, ans=0.0 2024-09-23 18:59:52,569 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=324529.3333333333, ans=0.125 2024-09-23 18:59:58,969 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=8.00 vs. limit=12.0 2024-09-23 19:00:10,053 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=3.80 vs. limit=10.0 2024-09-23 19:00:36,180 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.090e+02 1.318e+02 1.406e+02 1.543e+02 2.189e+02, threshold=2.811e+02, percent-clipped=0.0 2024-09-23 19:00:44,002 INFO [train.py:1198] (3/4) Epoch 18, batch 3350, loss[loss=0.251, ctc_loss=0.1735, cr_loss=0.3874, over 17030.00 frames. ], tot_loss[loss=0.2219, ctc_loss=0.149, cr_loss=0.3644, over 3331641.79 frames. ], batch size: 53, lr: 6.62e-03, grad_scale: 32.0 2024-09-23 19:00:52,215 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=324716.0, ans=0.1 2024-09-23 19:01:14,808 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=3.08 vs. 
limit=15.0 2024-09-23 19:01:31,842 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=324856.0, ans=0.1 2024-09-23 19:01:34,707 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=324856.0, ans=0.125 2024-09-23 19:01:58,498 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.30 vs. limit=6.0 2024-09-23 19:02:04,710 INFO [train.py:1198] (3/4) Epoch 18, batch 3400, loss[loss=0.2053, ctc_loss=0.1349, cr_loss=0.3523, over 17025.00 frames. ], tot_loss[loss=0.2199, ctc_loss=0.1474, cr_loss=0.3622, over 3336671.61 frames. ], batch size: 44, lr: 6.61e-03, grad_scale: 32.0 2024-09-23 19:02:40,917 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=325042.6666666667, ans=0.125 2024-09-23 19:02:48,949 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.24 vs. limit=6.0 2024-09-23 19:02:53,199 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=325089.3333333333, ans=0.0 2024-09-23 19:03:01,046 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=325089.3333333333, ans=0.0 2024-09-23 19:03:14,634 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.106e+02 1.235e+02 1.326e+02 1.505e+02 2.372e+02, threshold=2.653e+02, percent-clipped=0.0 2024-09-23 19:03:16,885 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten.whitening_limit, batch_count=325136.0, ans=22.5 2024-09-23 19:03:22,390 INFO [train.py:1198] (3/4) Epoch 18, batch 3450, loss[loss=0.2782, ctc_loss=0.1905, cr_loss=0.4385, over 17213.00 frames. ], tot_loss[loss=0.2195, ctc_loss=0.1471, cr_loss=0.3618, over 3345143.90 frames. ], batch size: 55, lr: 6.61e-03, grad_scale: 32.0 2024-09-23 19:03:32,543 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.min_positive, batch_count=325182.6666666667, ans=0.025 2024-09-23 19:03:38,832 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=325229.3333333333, ans=0.0 2024-09-23 19:03:47,943 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=325229.3333333333, ans=0.0 2024-09-23 19:03:51,295 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=325229.3333333333, ans=0.1 2024-09-23 19:04:05,634 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=325276.0, ans=0.0 2024-09-23 19:04:42,581 INFO [train.py:1198] (3/4) Epoch 18, batch 3500, loss[loss=0.2615, ctc_loss=0.1789, cr_loss=0.4128, over 17015.00 frames. ], tot_loss[loss=0.2199, ctc_loss=0.1474, cr_loss=0.3626, over 3346986.87 frames. 
], batch size: 52, lr: 6.61e-03, grad_scale: 32.0 2024-09-23 19:05:23,788 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=325509.3333333333, ans=0.0 2024-09-23 19:05:30,014 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=325556.0, ans=0.125 2024-09-23 19:05:41,052 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-23 19:05:45,539 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=325602.6666666667, ans=0.0 2024-09-23 19:05:53,068 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.108e+02 1.293e+02 1.389e+02 1.509e+02 2.092e+02, threshold=2.779e+02, percent-clipped=0.0 2024-09-23 19:05:53,440 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=325602.6666666667, ans=0.2 2024-09-23 19:06:00,906 INFO [train.py:1198] (3/4) Epoch 18, batch 3550, loss[loss=0.2136, ctc_loss=0.1452, cr_loss=0.3417, over 17004.00 frames. ], tot_loss[loss=0.2194, ctc_loss=0.147, cr_loss=0.3622, over 3358151.28 frames. ], batch size: 51, lr: 6.61e-03, grad_scale: 32.0 2024-09-23 19:06:05,073 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.53 vs. limit=15.0 2024-09-23 19:06:07,806 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=325649.3333333333, ans=0.125 2024-09-23 19:06:24,847 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.min_abs, batch_count=325696.0, ans=0.5 2024-09-23 19:06:37,507 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=5.82 vs. limit=15.0 2024-09-23 19:07:17,343 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer_ff2.min_abs, batch_count=325836.0, ans=0.1 2024-09-23 19:07:21,665 INFO [train.py:1198] (3/4) Epoch 18, batch 3600, loss[loss=0.2115, ctc_loss=0.1407, cr_loss=0.3541, over 17226.00 frames. ], tot_loss[loss=0.2194, ctc_loss=0.1469, cr_loss=0.3625, over 3355036.99 frames. ], batch size: 50, lr: 6.60e-03, grad_scale: 32.0 2024-09-23 19:07:21,858 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=325882.6666666667, ans=0.125 2024-09-23 19:07:23,531 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-23 19:07:26,608 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-23 19:07:33,105 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=11.38 vs. limit=22.5 2024-09-23 19:07:38,053 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=5.98 vs. limit=15.0 2024-09-23 19:08:09,101 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=4.75 vs. 
limit=10.0 2024-09-23 19:08:13,643 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=16.50 vs. limit=22.5 2024-09-23 19:08:28,643 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=326069.3333333333, ans=0.125 2024-09-23 19:08:31,461 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.072e+02 1.271e+02 1.404e+02 1.560e+02 2.337e+02, threshold=2.809e+02, percent-clipped=0.0 2024-09-23 19:08:39,267 INFO [train.py:1198] (3/4) Epoch 18, batch 3650, loss[loss=0.2165, ctc_loss=0.1415, cr_loss=0.3752, over 17253.00 frames. ], tot_loss[loss=0.2193, ctc_loss=0.1467, cr_loss=0.3627, over 3364530.75 frames. ], batch size: 44, lr: 6.60e-03, grad_scale: 32.0 2024-09-23 19:08:52,225 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=326116.0, ans=0.125 2024-09-23 19:09:22,348 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=326209.3333333333, ans=0.0 2024-09-23 19:09:24,155 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=4.99 vs. limit=15.0 2024-09-23 19:09:26,999 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=326256.0, ans=0.2 2024-09-23 19:09:37,614 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=326256.0, ans=0.125 2024-09-23 19:09:49,311 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.17 vs. limit=10.0 2024-09-23 19:10:00,634 INFO [train.py:1198] (3/4) Epoch 18, batch 3700, loss[loss=0.2449, ctc_loss=0.1681, cr_loss=0.3842, over 17024.00 frames. ], tot_loss[loss=0.2191, ctc_loss=0.1466, cr_loss=0.3622, over 3372387.87 frames. ], batch size: 52, lr: 6.60e-03, grad_scale: 32.0 2024-09-23 19:10:03,921 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=326349.3333333333, ans=0.125 2024-09-23 19:10:10,063 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=326349.3333333333, ans=0.125 2024-09-23 19:10:17,970 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-23 19:10:37,364 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=12.55 vs. limit=22.5 2024-09-23 19:11:03,254 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=326536.0, ans=0.1 2024-09-23 19:11:10,734 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.118e+02 1.270e+02 1.376e+02 1.485e+02 2.172e+02, threshold=2.752e+02, percent-clipped=0.0 2024-09-23 19:11:18,532 INFO [train.py:1198] (3/4) Epoch 18, batch 3750, loss[loss=0.2355, ctc_loss=0.1609, cr_loss=0.3728, over 16472.00 frames. ], tot_loss[loss=0.2192, ctc_loss=0.1469, cr_loss=0.3617, over 3363554.42 frames. 
], batch size: 66, lr: 6.60e-03, grad_scale: 32.0 2024-09-23 19:11:19,373 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=14.40 vs. limit=22.5 2024-09-23 19:12:22,596 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=326769.3333333333, ans=0.1 2024-09-23 19:12:28,834 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=326769.3333333333, ans=0.025 2024-09-23 19:12:37,970 INFO [train.py:1198] (3/4) Epoch 18, batch 3800, loss[loss=0.2602, ctc_loss=0.1778, cr_loss=0.4121, over 15045.00 frames. ], tot_loss[loss=0.2197, ctc_loss=0.1473, cr_loss=0.3622, over 3344548.90 frames. ], batch size: 89, lr: 6.59e-03, grad_scale: 32.0 2024-09-23 19:12:45,752 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=326816.0, ans=0.0 2024-09-23 19:12:52,595 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=326862.6666666667, ans=0.0 2024-09-23 19:12:56,190 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=6.91 vs. limit=15.0 2024-09-23 19:13:00,399 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=326862.6666666667, ans=0.0 2024-09-23 19:13:11,608 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=326909.3333333333, ans=0.0 2024-09-23 19:13:16,215 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-23 19:13:49,015 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.113e+02 1.286e+02 1.423e+02 1.561e+02 2.075e+02, threshold=2.847e+02, percent-clipped=0.0 2024-09-23 19:13:56,835 INFO [train.py:1198] (3/4) Epoch 18, batch 3850, loss[loss=0.1978, ctc_loss=0.1307, cr_loss=0.3359, over 16953.00 frames. ], tot_loss[loss=0.2208, ctc_loss=0.1485, cr_loss=0.3618, over 3291702.28 frames. ], batch size: 42, lr: 6.59e-03, grad_scale: 32.0 2024-09-23 19:14:03,154 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=327049.3333333333, ans=0.125 2024-09-23 19:14:07,903 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.const_attention_rate, batch_count=327049.3333333333, ans=0.025 2024-09-23 19:14:28,005 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=327142.6666666667, ans=0.125 2024-09-23 19:14:30,990 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=327142.6666666667, ans=0.0 2024-09-23 19:14:31,515 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=16.19 vs. 
limit=15.0 2024-09-23 19:14:34,103 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=327142.6666666667, ans=0.125 2024-09-23 19:15:04,237 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=327236.0, ans=0.125 2024-09-23 19:15:59,118 INFO [train.py:1198] (3/4) Epoch 19, batch 0, loss[loss=0.2079, ctc_loss=0.1378, cr_loss=0.3507, over 17309.00 frames. ], tot_loss[loss=0.2079, ctc_loss=0.1378, cr_loss=0.3507, over 17309.00 frames. ], batch size: 49, lr: 6.41e-03, grad_scale: 32.0 2024-09-23 19:15:59,119 INFO [train.py:1221] (3/4) Computing validation loss 2024-09-23 19:16:14,368 INFO [train.py:1230] (3/4) Epoch 19, validation: loss=0.03972, ctc_loss=0.03972, cr_loss=8.025e-15, over 944034.00 frames. 2024-09-23 19:16:14,368 INFO [train.py:1231] (3/4) Maximum memory allocated so far is 21238MB 2024-09-23 19:16:18,305 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=5.05 vs. limit=15.0 2024-09-23 19:16:39,324 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=327310.6666666667, ans=0.125 2024-09-23 19:17:05,190 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=327404.0, ans=0.125 2024-09-23 19:17:36,733 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=327450.6666666667, ans=0.125 2024-09-23 19:17:39,623 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.076e+02 1.290e+02 1.473e+02 1.789e+02 2.583e+02, threshold=2.945e+02, percent-clipped=0.0 2024-09-23 19:17:41,332 INFO [train.py:1198] (3/4) Epoch 19, batch 50, loss[loss=0.186, ctc_loss=0.1231, cr_loss=0.3144, over 17060.00 frames. ], tot_loss[loss=0.2149, ctc_loss=0.1434, cr_loss=0.3578, over 765490.47 frames. ], batch size: 39, lr: 6.41e-03, grad_scale: 32.0 2024-09-23 19:17:51,207 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.const_attention_rate, batch_count=327497.3333333333, ans=0.025 2024-09-23 19:17:51,295 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=327497.3333333333, ans=0.125 2024-09-23 19:18:12,347 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=327590.6666666667, ans=0.125 2024-09-23 19:18:20,625 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=327590.6666666667, ans=0.0 2024-09-23 19:18:39,952 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=327637.3333333333, ans=0.125 2024-09-23 19:18:54,189 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=327684.0, ans=0.125 2024-09-23 19:19:03,613 INFO [train.py:1198] (3/4) Epoch 19, batch 100, loss[loss=0.2685, ctc_loss=0.188, cr_loss=0.4024, over 11455.00 frames. ], tot_loss[loss=0.2188, ctc_loss=0.1461, cr_loss=0.3633, over 1340867.87 frames. 
], batch size: 124, lr: 6.41e-03, grad_scale: 32.0 2024-09-23 19:19:05,547 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=327730.6666666667, ans=0.125 2024-09-23 19:19:21,282 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=327777.3333333333, ans=0.1 2024-09-23 19:19:22,992 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=327777.3333333333, ans=0.125 2024-09-23 19:19:48,338 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=327824.0, ans=0.0 2024-09-23 19:20:21,266 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.064e+02 1.272e+02 1.373e+02 1.513e+02 2.086e+02, threshold=2.746e+02, percent-clipped=0.0 2024-09-23 19:20:22,823 INFO [train.py:1198] (3/4) Epoch 19, batch 150, loss[loss=0.2282, ctc_loss=0.1546, cr_loss=0.3678, over 16086.00 frames. ], tot_loss[loss=0.2178, ctc_loss=0.1455, cr_loss=0.3617, over 1793756.03 frames. ], batch size: 74, lr: 6.40e-03, grad_scale: 32.0 2024-09-23 19:20:23,060 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=327964.0, ans=0.125 2024-09-23 19:20:27,890 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=327964.0, ans=0.125 2024-09-23 19:20:29,459 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=327964.0, ans=0.035 2024-09-23 19:20:29,470 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=327964.0, ans=0.0 2024-09-23 19:20:32,692 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=327964.0, ans=0.09899494936611666 2024-09-23 19:20:34,277 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-23 19:20:42,133 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=328010.6666666667, ans=0.125 2024-09-23 19:20:46,796 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=328010.6666666667, ans=0.0 2024-09-23 19:20:51,613 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=328010.6666666667, ans=0.1 2024-09-23 19:20:53,154 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=328057.3333333333, ans=0.0 2024-09-23 19:21:01,192 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=328057.3333333333, ans=0.125 2024-09-23 19:21:23,623 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=328104.0, ans=0.0 2024-09-23 19:21:30,034 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=328150.6666666667, ans=0.1 2024-09-23 19:21:40,884 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, 
batch_count=328150.6666666667, ans=0.2 2024-09-23 19:21:42,462 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=328150.6666666667, ans=0.125 2024-09-23 19:21:48,161 INFO [train.py:1198] (3/4) Epoch 19, batch 200, loss[loss=0.2238, ctc_loss=0.1463, cr_loss=0.3875, over 17027.00 frames. ], tot_loss[loss=0.2168, ctc_loss=0.1449, cr_loss=0.3599, over 2141477.37 frames. ], batch size: 44, lr: 6.40e-03, grad_scale: 32.0 2024-09-23 19:22:08,427 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.93 vs. limit=10.0 2024-09-23 19:22:12,332 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=328244.0, ans=0.125 2024-09-23 19:22:18,831 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=328290.6666666667, ans=0.1 2024-09-23 19:22:28,758 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=8.12 vs. limit=22.5 2024-09-23 19:23:03,192 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=328384.0, ans=0.1 2024-09-23 19:23:09,156 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.059e+02 1.256e+02 1.366e+02 1.552e+02 2.193e+02, threshold=2.732e+02, percent-clipped=0.0 2024-09-23 19:23:09,450 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=328430.6666666667, ans=0.125 2024-09-23 19:23:10,721 INFO [train.py:1198] (3/4) Epoch 19, batch 250, loss[loss=0.2427, ctc_loss=0.1631, cr_loss=0.3978, over 17353.00 frames. ], tot_loss[loss=0.218, ctc_loss=0.1457, cr_loss=0.3615, over 2414573.47 frames. ], batch size: 48, lr: 6.40e-03, grad_scale: 32.0 2024-09-23 19:23:20,574 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=328430.6666666667, ans=10.0 2024-09-23 19:23:22,379 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=328430.6666666667, ans=0.125 2024-09-23 19:24:12,385 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=328570.6666666667, ans=0.015 2024-09-23 19:24:12,483 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=328570.6666666667, ans=0.125 2024-09-23 19:24:14,110 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=328570.6666666667, ans=0.1 2024-09-23 19:24:32,612 INFO [train.py:1198] (3/4) Epoch 19, batch 300, loss[loss=0.219, ctc_loss=0.1472, cr_loss=0.3591, over 17098.00 frames. ], tot_loss[loss=0.2177, ctc_loss=0.1455, cr_loss=0.3612, over 2626288.28 frames. ], batch size: 49, lr: 6.40e-03, grad_scale: 32.0 2024-09-23 19:25:05,471 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.05 vs. 
limit=15.0 2024-09-23 19:25:17,857 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=328757.3333333333, ans=0.0 2024-09-23 19:25:32,397 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=328804.0, ans=0.0 2024-09-23 19:25:41,828 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=328850.6666666667, ans=0.125 2024-09-23 19:25:51,167 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.108e+02 1.306e+02 1.376e+02 1.560e+02 2.317e+02, threshold=2.752e+02, percent-clipped=0.0 2024-09-23 19:25:52,833 INFO [train.py:1198] (3/4) Epoch 19, batch 350, loss[loss=0.2335, ctc_loss=0.1587, cr_loss=0.374, over 16810.00 frames. ], tot_loss[loss=0.219, ctc_loss=0.1466, cr_loss=0.3623, over 2778767.65 frames. ], batch size: 61, lr: 6.40e-03, grad_scale: 32.0 2024-09-23 19:26:12,044 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=328944.0, ans=0.125 2024-09-23 19:26:47,476 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=329037.3333333333, ans=0.0 2024-09-23 19:27:03,303 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=329084.0, ans=0.1 2024-09-23 19:27:04,883 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=329084.0, ans=0.0 2024-09-23 19:27:04,980 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=329084.0, ans=0.125 2024-09-23 19:27:06,477 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=329084.0, ans=0.125 2024-09-23 19:27:15,226 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=16.08 vs. limit=22.5 2024-09-23 19:27:17,483 INFO [train.py:1198] (3/4) Epoch 19, batch 400, loss[loss=0.1831, ctc_loss=0.1192, cr_loss=0.3196, over 16954.00 frames. ], tot_loss[loss=0.2194, ctc_loss=0.1469, cr_loss=0.3625, over 2905756.92 frames. 
], batch size: 42, lr: 6.39e-03, grad_scale: 32.0 2024-09-23 19:27:17,853 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=329130.6666666667, ans=0.1 2024-09-23 19:27:29,925 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=329130.6666666667, ans=0.125 2024-09-23 19:28:02,450 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=329224.0, ans=0.0 2024-09-23 19:28:18,294 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=329270.6666666667, ans=0.125 2024-09-23 19:28:34,034 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer_ff3.min_abs, batch_count=329317.3333333333, ans=0.2 2024-09-23 19:28:38,613 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=329317.3333333333, ans=0.125 2024-09-23 19:28:41,423 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.090e+02 1.241e+02 1.311e+02 1.431e+02 1.671e+02, threshold=2.622e+02, percent-clipped=0.0 2024-09-23 19:28:43,032 INFO [train.py:1198] (3/4) Epoch 19, batch 450, loss[loss=0.2003, ctc_loss=0.1325, cr_loss=0.339, over 17012.00 frames. ], tot_loss[loss=0.2189, ctc_loss=0.1465, cr_loss=0.3619, over 3011437.65 frames. ], batch size: 44, lr: 6.39e-03, grad_scale: 32.0 2024-09-23 19:28:43,448 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=329364.0, ans=0.125 2024-09-23 19:28:49,678 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=329364.0, ans=0.125 2024-09-23 19:28:54,483 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=329364.0, ans=0.0 2024-09-23 19:29:00,617 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=329410.6666666667, ans=0.125 2024-09-23 19:29:07,120 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=329410.6666666667, ans=0.0 2024-09-23 19:29:16,675 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.const_attention_rate, batch_count=329457.3333333333, ans=0.025 2024-09-23 19:29:50,722 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=329550.6666666667, ans=0.0 2024-09-23 19:29:55,610 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=329550.6666666667, ans=0.0 2024-09-23 19:30:03,337 INFO [train.py:1198] (3/4) Epoch 19, batch 500, loss[loss=0.2626, ctc_loss=0.1842, cr_loss=0.392, over 11974.00 frames. ], tot_loss[loss=0.22, ctc_loss=0.1473, cr_loss=0.3633, over 3088304.76 frames. 
], batch size: 123, lr: 6.39e-03, grad_scale: 32.0 2024-09-23 19:30:08,326 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=329597.3333333333, ans=0.1 2024-09-23 19:30:14,720 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=329597.3333333333, ans=0.125 2024-09-23 19:30:28,468 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.72 vs. limit=6.0 2024-09-23 19:30:40,497 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=329690.6666666667, ans=0.0 2024-09-23 19:30:42,276 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.73 vs. limit=15.0 2024-09-23 19:30:45,637 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=4.76 vs. limit=12.0 2024-09-23 19:30:48,430 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=329690.6666666667, ans=0.125 2024-09-23 19:31:21,603 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.126e+02 1.237e+02 1.337e+02 1.500e+02 1.948e+02, threshold=2.674e+02, percent-clipped=0.0 2024-09-23 19:31:23,262 INFO [train.py:1198] (3/4) Epoch 19, batch 550, loss[loss=0.2059, ctc_loss=0.1342, cr_loss=0.3582, over 17137.00 frames. ], tot_loss[loss=0.2189, ctc_loss=0.1465, cr_loss=0.3619, over 3143166.91 frames. ], batch size: 48, lr: 6.39e-03, grad_scale: 32.0 2024-09-23 19:31:36,001 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=329830.6666666667, ans=0.0 2024-09-23 19:31:42,097 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=11.81 vs. limit=12.0 2024-09-23 19:32:09,201 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=329924.0, ans=0.2 2024-09-23 19:32:35,471 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=330017.3333333333, ans=0.125 2024-09-23 19:32:35,484 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=330017.3333333333, ans=0.0 2024-09-23 19:32:45,251 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=330017.3333333333, ans=0.1 2024-09-23 19:32:51,190 INFO [train.py:1198] (3/4) Epoch 19, batch 600, loss[loss=0.1848, ctc_loss=0.1213, cr_loss=0.3175, over 17304.00 frames. ], tot_loss[loss=0.2185, ctc_loss=0.1462, cr_loss=0.3613, over 3190619.64 frames. 
], batch size: 51, lr: 6.38e-03, grad_scale: 32.0 2024-09-23 19:32:52,954 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=330064.0, ans=0.0 2024-09-23 19:33:07,334 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=330110.6666666667, ans=0.1 2024-09-23 19:33:13,683 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=330110.6666666667, ans=0.125 2024-09-23 19:33:29,499 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=330157.3333333333, ans=0.2 2024-09-23 19:33:33,243 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=4.68 vs. limit=12.0 2024-09-23 19:33:42,527 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=330204.0, ans=0.0 2024-09-23 19:33:47,315 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=330204.0, ans=0.1 2024-09-23 19:34:12,590 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.077e+02 1.274e+02 1.356e+02 1.568e+02 2.511e+02, threshold=2.713e+02, percent-clipped=0.0 2024-09-23 19:34:14,268 INFO [train.py:1198] (3/4) Epoch 19, batch 650, loss[loss=0.2305, ctc_loss=0.1585, cr_loss=0.36, over 17144.00 frames. ], tot_loss[loss=0.2182, ctc_loss=0.146, cr_loss=0.361, over 3235504.33 frames. ], batch size: 48, lr: 6.38e-03, grad_scale: 64.0 2024-09-23 19:35:04,721 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.const_attention_rate, batch_count=330437.3333333333, ans=0.025 2024-09-23 19:35:19,364 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=11.71 vs. limit=22.5 2024-09-23 19:35:22,152 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=330484.0, ans=0.125 2024-09-23 19:35:23,688 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=330484.0, ans=0.0 2024-09-23 19:35:26,970 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=330484.0, ans=0.1 2024-09-23 19:35:33,657 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=9.86 vs. limit=15.0 2024-09-23 19:35:34,531 INFO [train.py:1198] (3/4) Epoch 19, batch 700, loss[loss=0.1928, ctc_loss=0.1254, cr_loss=0.337, over 17089.00 frames. ], tot_loss[loss=0.2189, ctc_loss=0.1464, cr_loss=0.3623, over 3267320.45 frames. 
], batch size: 40, lr: 6.38e-03, grad_scale: 64.0 2024-09-23 19:36:03,922 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=330577.3333333333, ans=0.0 2024-09-23 19:36:23,210 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=330670.6666666667, ans=0.125 2024-09-23 19:36:58,032 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.050e+02 1.250e+02 1.371e+02 1.487e+02 1.794e+02, threshold=2.742e+02, percent-clipped=0.0 2024-09-23 19:36:59,668 INFO [train.py:1198] (3/4) Epoch 19, batch 750, loss[loss=0.2037, ctc_loss=0.1318, cr_loss=0.3594, over 17205.00 frames. ], tot_loss[loss=0.2198, ctc_loss=0.147, cr_loss=0.3636, over 3296432.23 frames. ], batch size: 50, lr: 6.38e-03, grad_scale: 64.0 2024-09-23 19:37:05,241 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten.whitening_limit, batch_count=330764.0, ans=22.5 2024-09-23 19:37:11,400 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=330764.0, ans=0.125 2024-09-23 19:37:24,077 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=330810.6666666667, ans=0.1 2024-09-23 19:37:28,478 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten.whitening_limit, batch_count=330810.6666666667, ans=22.5 2024-09-23 19:38:08,244 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=330950.6666666667, ans=0.0 2024-09-23 19:38:25,057 INFO [train.py:1198] (3/4) Epoch 19, batch 800, loss[loss=0.222, ctc_loss=0.1455, cr_loss=0.3824, over 16256.00 frames. ], tot_loss[loss=0.2191, ctc_loss=0.1466, cr_loss=0.3625, over 3320549.34 frames. ], batch size: 75, lr: 6.38e-03, grad_scale: 32.0 2024-09-23 19:38:25,331 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=330997.3333333333, ans=0.0 2024-09-23 19:38:33,234 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=330997.3333333333, ans=0.125 2024-09-23 19:38:47,461 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=331044.0, ans=0.1 2024-09-23 19:38:50,668 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=331044.0, ans=0.1 2024-09-23 19:39:11,625 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=331137.3333333333, ans=0.125 2024-09-23 19:39:39,146 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=4.17 vs. limit=12.0 2024-09-23 19:39:44,567 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.096e+02 1.257e+02 1.307e+02 1.431e+02 3.261e+02, threshold=2.613e+02, percent-clipped=1.0 2024-09-23 19:39:44,592 INFO [train.py:1198] (3/4) Epoch 19, batch 850, loss[loss=0.2443, ctc_loss=0.168, cr_loss=0.381, over 16053.00 frames. ], tot_loss[loss=0.2189, ctc_loss=0.1464, cr_loss=0.3626, over 3333249.32 frames. 
], batch size: 74, lr: 6.37e-03, grad_scale: 32.0 2024-09-23 19:39:56,250 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.91 vs. limit=15.0 2024-09-23 19:40:18,392 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=331324.0, ans=0.125 2024-09-23 19:40:18,495 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-23 19:40:56,806 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=331417.3333333333, ans=0.1 2024-09-23 19:41:04,321 INFO [train.py:1198] (3/4) Epoch 19, batch 900, loss[loss=0.1863, ctc_loss=0.1249, cr_loss=0.307, over 17096.00 frames. ], tot_loss[loss=0.218, ctc_loss=0.1457, cr_loss=0.3612, over 3340848.81 frames. ], batch size: 43, lr: 6.37e-03, grad_scale: 16.0 2024-09-23 19:41:04,708 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=331464.0, ans=0.1 2024-09-23 19:41:15,752 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=331464.0, ans=0.5 2024-09-23 19:41:30,139 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=15.67 vs. limit=22.5 2024-09-23 19:41:54,199 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=331557.3333333333, ans=0.125 2024-09-23 19:42:31,895 INFO [train.py:1198] (3/4) Epoch 19, batch 950, loss[loss=0.2953, ctc_loss=0.2128, cr_loss=0.4124, over 11741.00 frames. ], tot_loss[loss=0.2168, ctc_loss=0.1449, cr_loss=0.3595, over 3344098.01 frames. ], batch size: 123, lr: 6.37e-03, grad_scale: 16.0 2024-09-23 19:42:33,540 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.067e+02 1.264e+02 1.377e+02 1.501e+02 1.833e+02, threshold=2.753e+02, percent-clipped=0.0 2024-09-23 19:42:47,824 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=7.06 vs. limit=15.0 2024-09-23 19:43:16,601 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=331790.6666666667, ans=0.125 2024-09-23 19:43:38,953 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=331884.0, ans=0.125 2024-09-23 19:43:42,366 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=331884.0, ans=0.125 2024-09-23 19:43:44,175 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.09 vs. limit=10.0 2024-09-23 19:43:45,620 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=331884.0, ans=0.2 2024-09-23 19:43:54,921 INFO [train.py:1198] (3/4) Epoch 19, batch 1000, loss[loss=0.2483, ctc_loss=0.1685, cr_loss=0.3992, over 16497.00 frames. ], tot_loss[loss=0.2178, ctc_loss=0.1456, cr_loss=0.361, over 3351182.99 frames. 
], batch size: 66, lr: 6.37e-03, grad_scale: 16.0 2024-09-23 19:44:14,957 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=13.79 vs. limit=22.5 2024-09-23 19:44:16,022 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.84 vs. limit=10.0 2024-09-23 19:44:16,540 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=4.69 vs. limit=12.0 2024-09-23 19:44:23,464 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.const_attention_rate, batch_count=331977.3333333333, ans=0.025 2024-09-23 19:44:31,473 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=332024.0, ans=0.1 2024-09-23 19:45:14,337 INFO [train.py:1198] (3/4) Epoch 19, batch 1050, loss[loss=0.1814, ctc_loss=0.121, cr_loss=0.3018, over 17192.00 frames. ], tot_loss[loss=0.2174, ctc_loss=0.1453, cr_loss=0.3609, over 3364098.14 frames. ], batch size: 41, lr: 6.36e-03, grad_scale: 16.0 2024-09-23 19:45:15,988 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.099e+02 1.292e+02 1.398e+02 1.505e+02 1.868e+02, threshold=2.795e+02, percent-clipped=0.0 2024-09-23 19:45:22,109 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=5.62 vs. limit=15.0 2024-09-23 19:45:27,818 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=7.39 vs. limit=15.0 2024-09-23 19:45:31,693 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=12.18 vs. limit=22.5 2024-09-23 19:45:38,881 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=332210.6666666667, ans=0.0 2024-09-23 19:46:29,643 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=332350.6666666667, ans=0.125 2024-09-23 19:46:35,438 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=332350.6666666667, ans=0.0 2024-09-23 19:46:39,723 INFO [train.py:1198] (3/4) Epoch 19, batch 1100, loss[loss=0.184, ctc_loss=0.1193, cr_loss=0.3234, over 17270.00 frames. ], tot_loss[loss=0.2176, ctc_loss=0.1454, cr_loss=0.361, over 3348334.11 frames. 
], batch size: 42, lr: 6.36e-03, grad_scale: 16.0 2024-09-23 19:46:43,253 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=332397.3333333333, ans=0.125 2024-09-23 19:47:19,359 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=332490.6666666667, ans=0.125 2024-09-23 19:47:30,335 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=332537.3333333333, ans=0.125 2024-09-23 19:47:43,028 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=332537.3333333333, ans=0.125 2024-09-23 19:47:54,289 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=332584.0, ans=0.125 2024-09-23 19:48:01,783 INFO [train.py:1198] (3/4) Epoch 19, batch 1150, loss[loss=0.2148, ctc_loss=0.1454, cr_loss=0.3467, over 17005.00 frames. ], tot_loss[loss=0.2178, ctc_loss=0.1456, cr_loss=0.3612, over 3354203.64 frames. ], batch size: 51, lr: 6.36e-03, grad_scale: 16.0 2024-09-23 19:48:06,032 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.131e+02 1.288e+02 1.429e+02 1.556e+02 3.211e+02, threshold=2.859e+02, percent-clipped=1.0 2024-09-23 19:48:35,411 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=332724.0, ans=0.125 2024-09-23 19:49:09,264 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=332817.3333333333, ans=0.1 2024-09-23 19:49:18,852 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=332817.3333333333, ans=0.125 2024-09-23 19:49:24,950 INFO [train.py:1198] (3/4) Epoch 19, batch 1200, loss[loss=0.1937, ctc_loss=0.1269, cr_loss=0.3337, over 17238.00 frames. ], tot_loss[loss=0.2173, ctc_loss=0.1451, cr_loss=0.3609, over 3352180.52 frames. ], batch size: 47, lr: 6.36e-03, grad_scale: 32.0 2024-09-23 19:49:38,115 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=332864.0, ans=0.125 2024-09-23 19:49:53,147 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten.whitening_limit, batch_count=332910.6666666667, ans=22.5 2024-09-23 19:49:57,790 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=332957.3333333333, ans=0.125 2024-09-23 19:50:07,239 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=332957.3333333333, ans=0.0 2024-09-23 19:50:19,835 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=333004.0, ans=0.2 2024-09-23 19:50:21,743 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=11.48 vs. 
limit=22.5 2024-09-23 19:50:24,570 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=333004.0, ans=0.035 2024-09-23 19:50:26,254 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=333004.0, ans=0.0 2024-09-23 19:50:31,099 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=333050.6666666667, ans=0.125 2024-09-23 19:50:31,591 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten.whitening_limit, batch_count=333050.6666666667, ans=15.0 2024-09-23 19:50:45,365 INFO [train.py:1198] (3/4) Epoch 19, batch 1250, loss[loss=0.2424, ctc_loss=0.1662, cr_loss=0.381, over 17012.00 frames. ], tot_loss[loss=0.2176, ctc_loss=0.1454, cr_loss=0.3613, over 3353721.38 frames. ], batch size: 53, lr: 6.36e-03, grad_scale: 32.0 2024-09-23 19:50:46,878 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.091e+02 1.324e+02 1.446e+02 1.574e+02 2.086e+02, threshold=2.891e+02, percent-clipped=0.0 2024-09-23 19:51:09,567 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=333144.0, ans=0.2 2024-09-23 19:51:26,585 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=333190.6666666667, ans=0.05 2024-09-23 19:51:51,415 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=333237.3333333333, ans=0.125 2024-09-23 19:51:59,911 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.19 vs. limit=6.0 2024-09-23 19:52:02,611 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=333284.0, ans=0.0 2024-09-23 19:52:12,989 INFO [train.py:1198] (3/4) Epoch 19, batch 1300, loss[loss=0.2728, ctc_loss=0.1833, cr_loss=0.4477, over 16813.00 frames. ], tot_loss[loss=0.2176, ctc_loss=0.1453, cr_loss=0.3611, over 3358839.40 frames. ], batch size: 61, lr: 6.35e-03, grad_scale: 16.0 2024-09-23 19:52:51,836 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=333424.0, ans=0.125 2024-09-23 19:53:15,299 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=333470.6666666667, ans=0.0 2024-09-23 19:53:26,620 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=333517.3333333333, ans=0.1 2024-09-23 19:53:35,916 INFO [train.py:1198] (3/4) Epoch 19, batch 1350, loss[loss=0.2174, ctc_loss=0.1421, cr_loss=0.3763, over 17271.00 frames. ], tot_loss[loss=0.2166, ctc_loss=0.1447, cr_loss=0.3598, over 3361320.51 frames. 
], batch size: 44, lr: 6.35e-03, grad_scale: 16.0 2024-09-23 19:53:39,093 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.112e+02 1.258e+02 1.337e+02 1.461e+02 2.024e+02, threshold=2.674e+02, percent-clipped=0.0 2024-09-23 19:53:45,804 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=333564.0, ans=0.1 2024-09-23 19:53:49,507 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=15.13 vs. limit=15.0 2024-09-23 19:54:32,495 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=333704.0, ans=0.2 2024-09-23 19:54:34,092 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=333704.0, ans=0.2 2024-09-23 19:54:34,216 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=333704.0, ans=0.1 2024-09-23 19:54:47,823 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.92 vs. limit=10.0 2024-09-23 19:54:56,453 INFO [train.py:1198] (3/4) Epoch 19, batch 1400, loss[loss=0.1927, ctc_loss=0.1305, cr_loss=0.3107, over 17087.00 frames. ], tot_loss[loss=0.2173, ctc_loss=0.1452, cr_loss=0.3608, over 3361889.82 frames. ], batch size: 43, lr: 6.35e-03, grad_scale: 16.0 2024-09-23 19:54:56,706 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=333797.3333333333, ans=0.125 2024-09-23 19:54:59,958 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.const_attention_rate, batch_count=333797.3333333333, ans=0.025 2024-09-23 19:55:11,636 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=7.60 vs. limit=15.0 2024-09-23 19:56:15,968 INFO [train.py:1198] (3/4) Epoch 19, batch 1450, loss[loss=0.2175, ctc_loss=0.1436, cr_loss=0.3694, over 17255.00 frames. ], tot_loss[loss=0.2166, ctc_loss=0.1446, cr_loss=0.3597, over 3371047.31 frames. ], batch size: 44, lr: 6.35e-03, grad_scale: 16.0 2024-09-23 19:56:16,203 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=334030.6666666667, ans=0.125 2024-09-23 19:56:21,632 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.117e+02 1.283e+02 1.375e+02 1.499e+02 2.283e+02, threshold=2.750e+02, percent-clipped=0.0 2024-09-23 19:56:27,143 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=10.02 vs. 
limit=15.0 2024-09-23 19:57:01,615 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=334124.0, ans=0.0 2024-09-23 19:57:01,688 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=334124.0, ans=0.125 2024-09-23 19:57:16,859 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=334170.6666666667, ans=0.0 2024-09-23 19:57:29,688 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=334217.3333333333, ans=0.0 2024-09-23 19:57:43,771 INFO [train.py:1198] (3/4) Epoch 19, batch 1500, loss[loss=0.225, ctc_loss=0.1527, cr_loss=0.3616, over 17046.00 frames. ], tot_loss[loss=0.2175, ctc_loss=0.1454, cr_loss=0.3603, over 3361983.01 frames. ], batch size: 56, lr: 6.34e-03, grad_scale: 16.0 2024-09-23 19:57:47,268 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=334264.0, ans=0.0 2024-09-23 19:58:09,355 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=334310.6666666667, ans=0.0 2024-09-23 19:58:34,682 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=334404.0, ans=0.0 2024-09-23 19:58:36,820 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=6.46 vs. limit=15.0 2024-09-23 19:58:44,318 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=334404.0, ans=0.0 2024-09-23 19:58:53,676 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=334450.6666666667, ans=0.125 2024-09-23 19:59:06,201 INFO [train.py:1198] (3/4) Epoch 19, batch 1550, loss[loss=0.205, ctc_loss=0.1376, cr_loss=0.3368, over 17284.00 frames. ], tot_loss[loss=0.2183, ctc_loss=0.1461, cr_loss=0.361, over 3365910.15 frames. ], batch size: 42, lr: 6.34e-03, grad_scale: 16.0 2024-09-23 19:59:09,456 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.039e+02 1.278e+02 1.387e+02 1.518e+02 2.007e+02, threshold=2.775e+02, percent-clipped=0.0 2024-09-23 19:59:43,248 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=334590.6666666667, ans=0.0 2024-09-23 19:59:48,219 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=334590.6666666667, ans=0.125 2024-09-23 19:59:59,929 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=3.46 vs. 
limit=12.0 2024-09-23 20:00:05,766 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=334637.3333333333, ans=0.125 2024-09-23 20:00:05,771 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=334637.3333333333, ans=0.1 2024-09-23 20:00:17,012 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=334684.0, ans=0.2 2024-09-23 20:00:26,300 INFO [train.py:1198] (3/4) Epoch 19, batch 1600, loss[loss=0.226, ctc_loss=0.1477, cr_loss=0.3913, over 17230.00 frames. ], tot_loss[loss=0.2176, ctc_loss=0.1455, cr_loss=0.3605, over 3359530.42 frames. ], batch size: 50, lr: 6.34e-03, grad_scale: 32.0 2024-09-23 20:00:39,507 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.12 vs. limit=15.0 2024-09-23 20:00:51,048 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.97 vs. limit=22.5 2024-09-23 20:00:53,489 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=334777.3333333333, ans=0.125 2024-09-23 20:01:06,177 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=334824.0, ans=0.125 2024-09-23 20:01:21,853 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=11.61 vs. limit=22.5 2024-09-23 20:01:33,533 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=334917.3333333333, ans=0.2 2024-09-23 20:01:44,706 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=334917.3333333333, ans=0.2 2024-09-23 20:01:44,841 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=334917.3333333333, ans=0.125 2024-09-23 20:01:49,647 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=334964.0, ans=0.125 2024-09-23 20:01:50,762 INFO [train.py:1198] (3/4) Epoch 19, batch 1650, loss[loss=0.2465, ctc_loss=0.165, cr_loss=0.4078, over 17356.00 frames. ], tot_loss[loss=0.2164, ctc_loss=0.1446, cr_loss=0.3587, over 3359492.23 frames. ], batch size: 48, lr: 6.34e-03, grad_scale: 32.0 2024-09-23 20:01:53,946 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.137e+02 1.297e+02 1.356e+02 1.480e+02 2.099e+02, threshold=2.713e+02, percent-clipped=0.0 2024-09-23 20:02:38,647 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=335057.3333333333, ans=0.0 2024-09-23 20:02:53,427 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.98 vs. limit=15.0 2024-09-23 20:02:54,874 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.98 vs. limit=15.0 2024-09-23 20:03:16,143 INFO [train.py:1198] (3/4) Epoch 19, batch 1700, loss[loss=0.2273, ctc_loss=0.1516, cr_loss=0.3785, over 17023.00 frames. 
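Note on the ScheduledFloat entries (bypass skip rates, balancer probabilities, dropout values): these are hyperparameters whose value is a function of batch_count. A sketch of one plausible implementation, a piecewise-linear interpolation over (batch_count, value) breakpoints; the breakpoints below are assumptions, and the real class in icefall's scaling.py carries extra machinery:

import bisect

class ScheduledFloatSketch:
    """Piecewise-linear schedule: the value is interpolated between the
    given (batch_count, value) breakpoints and clamped at the ends."""
    def __init__(self, *points):
        self.xs = [p[0] for p in points]
        self.ys = [p[1] for p in points]

    def value(self, batch_count: float) -> float:
        i = bisect.bisect_right(self.xs, batch_count)
        if i == 0:
            return self.ys[0]
        if i == len(self.xs):
            return self.ys[-1]
        x0, x1 = self.xs[i - 1], self.xs[i]
        y0, y1 = self.ys[i - 1], self.ys[i]
        return y0 + (y1 - y0) * (batch_count - x0) / (x1 - x0)

# Hypothetical breakpoints: by batch_count ~333k the schedule has long
# since flattened to its final value, as in the skip_rate=0.035 lines.
skip_rate = ScheduledFloatSketch((0.0, 0.5), (50000.0, 0.035))
assert skip_rate.value(333004.0) == 0.035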
], tot_loss[loss=0.2161, ctc_loss=0.1443, cr_loss=0.3587, over 3361231.42 frames. ], batch size: 51, lr: 6.34e-03, grad_scale: 32.0 2024-09-23 20:03:24,474 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.77 vs. limit=15.0 2024-09-23 20:03:28,982 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-23 20:03:32,179 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=335244.0, ans=0.125 2024-09-23 20:03:50,102 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=335290.6666666667, ans=0.2 2024-09-23 20:04:11,341 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=5.64 vs. limit=15.0 2024-09-23 20:04:26,734 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=335384.0, ans=0.0 2024-09-23 20:04:33,384 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=17.89 vs. limit=22.5 2024-09-23 20:04:36,086 INFO [train.py:1198] (3/4) Epoch 19, batch 1750, loss[loss=0.2377, ctc_loss=0.1593, cr_loss=0.3921, over 16565.00 frames. ], tot_loss[loss=0.2171, ctc_loss=0.1451, cr_loss=0.3597, over 3357382.10 frames. ], batch size: 66, lr: 6.33e-03, grad_scale: 32.0 2024-09-23 20:04:39,276 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.109e+02 1.243e+02 1.345e+02 1.441e+02 1.973e+02, threshold=2.691e+02, percent-clipped=0.0 2024-09-23 20:04:42,705 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=335430.6666666667, ans=0.125 2024-09-23 20:04:48,847 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.min_positive, batch_count=335430.6666666667, ans=0.025 2024-09-23 20:04:57,356 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=3.92 vs. limit=10.0 2024-09-23 20:05:09,570 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=335524.0, ans=0.125 2024-09-23 20:05:14,290 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=335524.0, ans=0.2 2024-09-23 20:05:37,531 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=10.55 vs. limit=15.0 2024-09-23 20:05:39,982 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.const_attention_rate, batch_count=335617.3333333333, ans=0.025 2024-09-23 20:05:51,056 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=335617.3333333333, ans=0.0 2024-09-23 20:05:55,469 INFO [train.py:1198] (3/4) Epoch 19, batch 1800, loss[loss=0.206, ctc_loss=0.1396, cr_loss=0.3318, over 17282.00 frames. ], tot_loss[loss=0.218, ctc_loss=0.1458, cr_loss=0.3611, over 3356110.19 frames. 
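Note on the Whitening lines: each compares a statistic of a module's activations against a limit (e.g. "metric=12.94 vs. limit=22.5") and only intervenes once the limit is exceeded. A hedged sketch of one way such a whiteness metric could be defined, the ratio mean(lambda^2)/mean(lambda)^2 over covariance eigenvalues, which equals 1.0 for perfectly white features; the exact formula in scaling.py may differ:

import torch

def whiteness_metric(x: torch.Tensor) -> float:
    # x: (num_frames, num_channels) activations.
    x = x - x.mean(dim=0, keepdim=True)
    cov = (x.T @ x) / x.shape[0]              # channel covariance
    eigs = torch.linalg.eigvalsh(cov)         # its eigenvalue spectrum
    return float((eigs ** 2).mean() / eigs.mean() ** 2)

def whitening_penalty(x: torch.Tensor, limit: float) -> float:
    # Inactive (zero) while metric <= limit, mirroring the
    # "metric=... vs. limit=..." comparisons in the log.
    return max(0.0, whiteness_metric(x) - limit)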
], batch size: 42, lr: 6.33e-03, grad_scale: 32.0 2024-09-23 20:05:57,234 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=335664.0, ans=0.1 2024-09-23 20:06:28,309 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=335710.6666666667, ans=0.0 2024-09-23 20:06:56,829 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=335804.0, ans=0.125 2024-09-23 20:07:13,542 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=335850.6666666667, ans=0.125 2024-09-23 20:07:16,873 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=335850.6666666667, ans=0.5 2024-09-23 20:07:22,993 INFO [train.py:1198] (3/4) Epoch 19, batch 1850, loss[loss=0.2263, ctc_loss=0.1535, cr_loss=0.3636, over 17152.00 frames. ], tot_loss[loss=0.2185, ctc_loss=0.1462, cr_loss=0.3616, over 3355307.88 frames. ], batch size: 45, lr: 6.33e-03, grad_scale: 32.0 2024-09-23 20:07:26,232 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.069e+02 1.276e+02 1.381e+02 1.510e+02 2.241e+02, threshold=2.761e+02, percent-clipped=0.0 2024-09-23 20:07:34,544 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=335897.3333333333, ans=0.125 2024-09-23 20:07:37,791 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=335944.0, ans=0.125 2024-09-23 20:08:06,960 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-23 20:08:10,596 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=5.44 vs. limit=12.0 2024-09-23 20:08:32,544 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=336084.0, ans=0.025 2024-09-23 20:08:34,160 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=336084.0, ans=0.1 2024-09-23 20:08:34,356 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=336084.0, ans=0.125 2024-09-23 20:08:48,095 INFO [train.py:1198] (3/4) Epoch 19, batch 1900, loss[loss=0.1965, ctc_loss=0.1316, cr_loss=0.3247, over 17087.00 frames. ], tot_loss[loss=0.2181, ctc_loss=0.1458, cr_loss=0.3612, over 3362898.73 frames. ], batch size: 43, lr: 6.33e-03, grad_scale: 32.0 2024-09-23 20:08:58,632 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=4.92 vs. limit=15.0 2024-09-23 20:09:04,179 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=336177.3333333333, ans=0.035 2024-09-23 20:09:23,560 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=12.94 vs. 
limit=22.5 2024-09-23 20:09:43,998 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=336270.6666666667, ans=0.1 2024-09-23 20:10:07,582 INFO [train.py:1198] (3/4) Epoch 19, batch 1950, loss[loss=0.1826, ctc_loss=0.1177, cr_loss=0.3241, over 17232.00 frames. ], tot_loss[loss=0.2179, ctc_loss=0.1458, cr_loss=0.3606, over 3352641.11 frames. ], batch size: 44, lr: 6.32e-03, grad_scale: 32.0 2024-09-23 20:10:10,844 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.098e+02 1.232e+02 1.325e+02 1.435e+02 2.417e+02, threshold=2.650e+02, percent-clipped=0.0 2024-09-23 20:10:17,843 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten.whitening_limit, batch_count=336364.0, ans=15.0 2024-09-23 20:10:25,577 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=336410.6666666667, ans=0.0 2024-09-23 20:11:05,022 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=16.91 vs. limit=22.5 2024-09-23 20:11:09,192 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=336504.0, ans=0.125 2024-09-23 20:11:31,814 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=336597.3333333333, ans=0.125 2024-09-23 20:11:33,082 INFO [train.py:1198] (3/4) Epoch 19, batch 2000, loss[loss=0.2108, ctc_loss=0.1378, cr_loss=0.3651, over 17302.00 frames. ], tot_loss[loss=0.2187, ctc_loss=0.1463, cr_loss=0.3619, over 3355344.08 frames. ], batch size: 46, lr: 6.32e-03, grad_scale: 32.0 2024-09-23 20:11:33,329 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=336597.3333333333, ans=0.125 2024-09-23 20:11:55,877 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=336644.0, ans=0.125 2024-09-23 20:12:06,932 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=4.63 vs. limit=10.0 2024-09-23 20:12:09,484 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=336690.6666666667, ans=0.125 2024-09-23 20:12:15,732 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=336690.6666666667, ans=0.0 2024-09-23 20:12:30,053 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=336737.3333333333, ans=0.1 2024-09-23 20:12:55,372 INFO [train.py:1198] (3/4) Epoch 19, batch 2050, loss[loss=0.243, ctc_loss=0.1653, cr_loss=0.3886, over 17039.00 frames. ], tot_loss[loss=0.2178, ctc_loss=0.1456, cr_loss=0.3608, over 3361500.10 frames. 
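Note on tot_loss: it is not the raw batch loss but a decayed, frame-weighted running summary, which is why it is reported "over ~3.35M frames" while single batches contribute ~17k. A sketch of such an accumulator, assuming a decay tied to the run's reset_interval=200; icefall's MetricsTracker bookkeeping may differ in detail:

class RunningLoss:
    """Frame-weighted running average with exponential decay, so old
    batches gradually drop out of the reported window."""
    def __init__(self, reset_interval: int = 200):
        self.decay = 1.0 - 1.0 / reset_interval
        self.loss_sum = 0.0
        self.frames = 0.0

    def update(self, batch_loss: float, batch_frames: float) -> None:
        self.loss_sum = self.decay * self.loss_sum + batch_loss * batch_frames
        self.frames = self.decay * self.frames + batch_frames

    @property
    def value(self) -> float:
        return self.loss_sum / max(self.frames, 1.0)

# With ~17k frames per batch the window settles near 200 * 17k ~ 3.4M
# frames, the same order as the "over 3.3M frames" totals logged above.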
], batch size: 52, lr: 6.32e-03, grad_scale: 32.0 2024-09-23 20:12:58,521 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.061e+02 1.236e+02 1.316e+02 1.436e+02 2.046e+02, threshold=2.631e+02, percent-clipped=0.0 2024-09-23 20:13:11,050 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=336830.6666666667, ans=0.125 2024-09-23 20:13:18,806 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=336877.3333333333, ans=0.125 2024-09-23 20:13:20,369 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=336877.3333333333, ans=0.125 2024-09-23 20:14:08,766 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=337017.3333333333, ans=0.125 2024-09-23 20:14:13,552 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.const_attention_rate, batch_count=337017.3333333333, ans=0.025 2024-09-23 20:14:18,068 INFO [train.py:1198] (3/4) Epoch 19, batch 2100, loss[loss=0.185, ctc_loss=0.1224, cr_loss=0.3126, over 17189.00 frames. ], tot_loss[loss=0.2177, ctc_loss=0.1456, cr_loss=0.3604, over 3355997.33 frames. ], batch size: 41, lr: 6.32e-03, grad_scale: 16.0 2024-09-23 20:14:55,053 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=337157.3333333333, ans=0.1 2024-09-23 20:14:58,262 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=337157.3333333333, ans=0.0 2024-09-23 20:15:00,492 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=6.52 vs. limit=12.0 2024-09-23 20:15:09,219 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=337204.0, ans=0.125 2024-09-23 20:15:10,970 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=337204.0, ans=0.125 2024-09-23 20:15:37,919 INFO [train.py:1198] (3/4) Epoch 19, batch 2150, loss[loss=0.2548, ctc_loss=0.1717, cr_loss=0.4158, over 16522.00 frames. ], tot_loss[loss=0.218, ctc_loss=0.1457, cr_loss=0.3614, over 3356043.62 frames. ], batch size: 66, lr: 6.32e-03, grad_scale: 8.0 2024-09-23 20:15:44,256 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.099e+02 1.267e+02 1.340e+02 1.517e+02 2.263e+02, threshold=2.680e+02, percent-clipped=0.0 2024-09-23 20:16:26,575 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=337390.6666666667, ans=0.125 2024-09-23 20:17:05,773 INFO [train.py:1198] (3/4) Epoch 19, batch 2200, loss[loss=0.2297, ctc_loss=0.1568, cr_loss=0.3642, over 17019.00 frames. ], tot_loss[loss=0.2171, ctc_loss=0.1451, cr_loss=0.36, over 3359541.05 frames. 
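Note on grad_scale: this is the fp16 loss-scaling factor (the run uses AMP with use_autocast=True). It halves when a step produces inf/nan gradients, as in the 32.0 -> 16.0 -> 8.0 drop across batches 2050-2150 above, and doubles back after a long run of clean steps, as seen later in the log. A sketch of that policy in the style of torch.cuda.amp.GradScaler; the growth settings here are assumptions:

class LossScaleSketch:
    def __init__(self, init_scale: float = 32.0, growth_factor: float = 2.0,
                 backoff_factor: float = 0.5, growth_interval: int = 2000):
        self.scale = init_scale
        self.growth_factor = growth_factor
        self.backoff_factor = backoff_factor
        self.growth_interval = growth_interval
        self._good_steps = 0

    def update(self, found_inf: bool) -> None:
        if found_inf:
            # Overflow: the optimizer step is skipped and the scale shrinks.
            self.scale *= self.backoff_factor
            self._good_steps = 0
        else:
            self._good_steps += 1
            if self._good_steps >= self.growth_interval:
                # A long stretch of finite grads: try a larger scale again.
                self.scale *= self.growth_factor
                self._good_steps = 0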
], batch size: 53, lr: 6.31e-03, grad_scale: 8.0 2024-09-23 20:17:25,293 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=337577.3333333333, ans=0.0 2024-09-23 20:17:27,581 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.11 vs. limit=6.0 2024-09-23 20:17:34,014 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=4.54 vs. limit=15.0 2024-09-23 20:17:35,770 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=8.76 vs. limit=22.5 2024-09-23 20:17:39,723 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=337624.0, ans=0.125 2024-09-23 20:17:45,866 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.const_attention_rate, batch_count=337624.0, ans=0.025 2024-09-23 20:17:59,545 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=337670.6666666667, ans=0.0 2024-09-23 20:18:28,159 INFO [train.py:1198] (3/4) Epoch 19, batch 2250, loss[loss=0.1906, ctc_loss=0.1253, cr_loss=0.3261, over 17095.00 frames. ], tot_loss[loss=0.2169, ctc_loss=0.145, cr_loss=0.3598, over 3351440.42 frames. ], batch size: 43, lr: 6.31e-03, grad_scale: 8.0 2024-09-23 20:18:34,522 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.104e+02 1.290e+02 1.402e+02 1.494e+02 5.352e+02, threshold=2.803e+02, percent-clipped=1.0 2024-09-23 20:18:44,224 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=337810.6666666667, ans=0.0 2024-09-23 20:18:55,591 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=337810.6666666667, ans=0.04949747468305833 2024-09-23 20:19:18,139 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=337904.0, ans=0.1 2024-09-23 20:19:33,930 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=337950.6666666667, ans=0.125 2024-09-23 20:19:48,406 INFO [train.py:1198] (3/4) Epoch 19, batch 2300, loss[loss=0.2232, ctc_loss=0.1497, cr_loss=0.3678, over 17356.00 frames. ], tot_loss[loss=0.2173, ctc_loss=0.1453, cr_loss=0.36, over 3349578.49 frames. ], batch size: 48, lr: 6.31e-03, grad_scale: 8.0 2024-09-23 20:19:53,691 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=337997.3333333333, ans=0.1 2024-09-23 20:20:27,631 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-23 20:20:55,021 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=338184.0, ans=0.125 2024-09-23 20:21:09,227 INFO [train.py:1198] (3/4) Epoch 19, batch 2350, loss[loss=0.2245, ctc_loss=0.1503, cr_loss=0.3708, over 17215.00 frames. ], tot_loss[loss=0.2176, ctc_loss=0.1455, cr_loss=0.3605, over 3351799.10 frames. 
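Note on the optim.py warnings: they summarize recent gradient norms as five quantiles (min, 25%, median, 75%, max) plus a clipping threshold; the logged numbers fit threshold = Clipping_scale * median (2.0 * 1.402e+02 ~ 2.803e+02 just above, where a single outlier at 5.352e+02 yields percent-clipped=1.0). A sketch of the diagnostic under that assumption:

import torch

def clipping_summary(grad_norms: torch.Tensor, clipping_scale: float = 2.0):
    # grad_norms: 1-D tensor of recent per-step gradient norms.
    qs = torch.quantile(grad_norms,
                        torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0]))
    threshold = clipping_scale * qs[2]          # scale times the median
    pct_clipped = 100.0 * (grad_norms > threshold).float().mean()
    return qs.tolist(), float(threshold), float(pct_clipped)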
], batch size: 47, lr: 6.31e-03, grad_scale: 8.0 2024-09-23 20:21:18,094 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.113e+02 1.269e+02 1.353e+02 1.497e+02 2.252e+02, threshold=2.706e+02, percent-clipped=0.0 2024-09-23 20:21:43,147 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=338277.3333333333, ans=0.0 2024-09-23 20:21:48,111 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=338324.0, ans=0.1 2024-09-23 20:21:55,896 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=3.80 vs. limit=12.0 2024-09-23 20:22:16,056 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=338370.6666666667, ans=0.125 2024-09-23 20:22:25,902 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=338417.3333333333, ans=0.125 2024-09-23 20:22:26,372 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1.whitening_limit, batch_count=338417.3333333333, ans=10.0 2024-09-23 20:22:36,780 INFO [train.py:1198] (3/4) Epoch 19, batch 2400, loss[loss=0.2315, ctc_loss=0.1524, cr_loss=0.3954, over 17224.00 frames. ], tot_loss[loss=0.2169, ctc_loss=0.145, cr_loss=0.3593, over 3335994.36 frames. ], batch size: 55, lr: 6.31e-03, grad_scale: 16.0 2024-09-23 20:22:40,341 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=338464.0, ans=0.125 2024-09-23 20:23:16,103 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=338557.3333333333, ans=0.125 2024-09-23 20:23:21,284 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=13.91 vs. limit=22.5 2024-09-23 20:23:51,856 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=338650.6666666667, ans=0.125 2024-09-23 20:23:58,083 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=338697.3333333333, ans=0.1 2024-09-23 20:23:59,489 INFO [train.py:1198] (3/4) Epoch 19, batch 2450, loss[loss=0.2202, ctc_loss=0.1455, cr_loss=0.3739, over 17181.00 frames. ], tot_loss[loss=0.2167, ctc_loss=0.1448, cr_loss=0.3592, over 3334289.06 frames. ], batch size: 45, lr: 6.30e-03, grad_scale: 16.0 2024-09-23 20:24:01,356 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=338697.3333333333, ans=0.95 2024-09-23 20:24:05,811 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.149e+02 1.266e+02 1.363e+02 1.497e+02 1.978e+02, threshold=2.725e+02, percent-clipped=0.0 2024-09-23 20:24:35,474 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=4.86 vs. 
limit=10.0 2024-09-23 20:24:37,993 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=338790.6666666667, ans=0.125 2024-09-23 20:24:55,364 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=338837.3333333333, ans=0.125 2024-09-23 20:25:19,142 INFO [train.py:1198] (3/4) Epoch 19, batch 2500, loss[loss=0.19, ctc_loss=0.1235, cr_loss=0.3327, over 17192.00 frames. ], tot_loss[loss=0.2168, ctc_loss=0.1449, cr_loss=0.3594, over 3332479.83 frames. ], batch size: 41, lr: 6.30e-03, grad_scale: 16.0 2024-09-23 20:25:32,428 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=338930.6666666667, ans=0.125 2024-09-23 20:25:37,221 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=338977.3333333333, ans=0.1 2024-09-23 20:25:57,785 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.const_attention_rate, batch_count=339024.0, ans=0.025 2024-09-23 20:26:13,384 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=339070.6666666667, ans=0.1 2024-09-23 20:26:40,840 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=339117.3333333333, ans=0.0 2024-09-23 20:26:41,432 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.68 vs. limit=22.5 2024-09-23 20:26:43,707 INFO [train.py:1198] (3/4) Epoch 19, batch 2550, loss[loss=0.2281, ctc_loss=0.155, cr_loss=0.3655, over 17016.00 frames. ], tot_loss[loss=0.2182, ctc_loss=0.1458, cr_loss=0.3618, over 3344359.17 frames. ], batch size: 52, lr: 6.30e-03, grad_scale: 16.0 2024-09-23 20:26:51,324 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=339164.0, ans=0.0 2024-09-23 20:26:52,589 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.116e+02 1.236e+02 1.315e+02 1.425e+02 2.139e+02, threshold=2.630e+02, percent-clipped=0.0 2024-09-23 20:26:52,925 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-23 20:26:53,431 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.15 vs. limit=15.0 2024-09-23 20:26:56,101 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=339164.0, ans=0.0 2024-09-23 20:26:59,224 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.const_attention_rate, batch_count=339164.0, ans=0.025 2024-09-23 20:27:11,958 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=339210.6666666667, ans=0.125 2024-09-23 20:27:23,962 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=11.81 vs. 
limit=15.0 2024-09-23 20:27:35,894 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=339304.0, ans=0.125 2024-09-23 20:27:45,329 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=339304.0, ans=0.0 2024-09-23 20:27:56,005 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=339350.6666666667, ans=0.125 2024-09-23 20:27:56,134 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=339350.6666666667, ans=0.0 2024-09-23 20:28:08,359 INFO [train.py:1198] (3/4) Epoch 19, batch 2600, loss[loss=0.2281, ctc_loss=0.1525, cr_loss=0.3784, over 16887.00 frames. ], tot_loss[loss=0.2185, ctc_loss=0.146, cr_loss=0.3626, over 3340562.78 frames. ], batch size: 58, lr: 6.30e-03, grad_scale: 16.0 2024-09-23 20:28:14,908 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=339397.3333333333, ans=0.125 2024-09-23 20:28:18,997 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.27 vs. limit=22.5 2024-09-23 20:28:23,168 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=339444.0, ans=0.0 2024-09-23 20:28:42,251 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=339490.6666666667, ans=0.125 2024-09-23 20:28:53,460 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=339490.6666666667, ans=0.2 2024-09-23 20:29:15,646 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=339584.0, ans=0.125 2024-09-23 20:29:17,354 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=339584.0, ans=0.125 2024-09-23 20:29:27,317 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=10.62 vs. limit=22.5 2024-09-23 20:29:28,191 INFO [train.py:1198] (3/4) Epoch 19, batch 2650, loss[loss=0.2473, ctc_loss=0.1692, cr_loss=0.3902, over 15963.00 frames. ], tot_loss[loss=0.2192, ctc_loss=0.1466, cr_loss=0.3632, over 3342052.99 frames. ], batch size: 74, lr: 6.29e-03, grad_scale: 16.0 2024-09-23 20:29:31,638 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=339630.6666666667, ans=0.125 2024-09-23 20:29:34,368 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.106e+02 1.315e+02 1.464e+02 1.614e+02 2.211e+02, threshold=2.927e+02, percent-clipped=0.0 2024-09-23 20:29:50,566 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=339677.3333333333, ans=0.0 2024-09-23 20:30:48,090 INFO [train.py:1198] (3/4) Epoch 19, batch 2700, loss[loss=0.1965, ctc_loss=0.1284, cr_loss=0.3408, over 17260.00 frames. ], tot_loss[loss=0.2202, ctc_loss=0.1472, cr_loss=0.3649, over 3347393.49 frames. 
], batch size: 44, lr: 6.29e-03, grad_scale: 16.0 2024-09-23 20:31:16,324 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=339910.6666666667, ans=0.125 2024-09-23 20:31:27,377 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=339957.3333333333, ans=0.1 2024-09-23 20:31:27,403 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=339957.3333333333, ans=0.0 2024-09-23 20:32:16,075 INFO [train.py:1198] (3/4) Epoch 19, batch 2750, loss[loss=0.1816, ctc_loss=0.1186, cr_loss=0.3152, over 17258.00 frames. ], tot_loss[loss=0.2199, ctc_loss=0.1471, cr_loss=0.3641, over 3347044.16 frames. ], batch size: 42, lr: 6.29e-03, grad_scale: 16.0 2024-09-23 20:32:22,222 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.091e+02 1.266e+02 1.341e+02 1.484e+02 3.814e+02, threshold=2.681e+02, percent-clipped=1.0 2024-09-23 20:32:22,656 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=340097.3333333333, ans=0.125 2024-09-23 20:32:44,934 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=340144.0, ans=0.1 2024-09-23 20:33:08,826 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=12.23 vs. limit=15.0 2024-09-23 20:33:21,202 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=340284.0, ans=0.1 2024-09-23 20:33:26,016 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=340284.0, ans=0.0 2024-09-23 20:33:38,451 INFO [train.py:1198] (3/4) Epoch 19, batch 2800, loss[loss=0.2046, ctc_loss=0.1333, cr_loss=0.3564, over 17288.00 frames. ], tot_loss[loss=0.2193, ctc_loss=0.1467, cr_loss=0.3631, over 3344102.56 frames. ], batch size: 46, lr: 6.29e-03, grad_scale: 32.0 2024-09-23 20:33:54,640 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=340377.3333333333, ans=0.125 2024-09-23 20:34:09,277 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.19 vs. limit=15.0 2024-09-23 20:34:58,055 INFO [train.py:1198] (3/4) Epoch 19, batch 2850, loss[loss=0.2334, ctc_loss=0.1574, cr_loss=0.38, over 17033.00 frames. ], tot_loss[loss=0.2198, ctc_loss=0.1471, cr_loss=0.3637, over 3354032.36 frames. ], batch size: 52, lr: 6.29e-03, grad_scale: 32.0 2024-09-23 20:35:04,576 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.097e+02 1.252e+02 1.382e+02 1.524e+02 2.298e+02, threshold=2.764e+02, percent-clipped=0.0 2024-09-23 20:35:25,429 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=340610.6666666667, ans=0.5 2024-09-23 20:35:54,928 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=340704.0, ans=0.2 2024-09-23 20:36:18,558 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=11.97 vs. 
limit=22.5 2024-09-23 20:36:22,621 INFO [train.py:1198] (3/4) Epoch 19, batch 2900, loss[loss=0.2018, ctc_loss=0.134, cr_loss=0.3389, over 17090.00 frames. ], tot_loss[loss=0.2208, ctc_loss=0.1478, cr_loss=0.3655, over 3348883.36 frames. ], batch size: 49, lr: 6.28e-03, grad_scale: 16.0 2024-09-23 20:36:57,675 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=12.27 vs. limit=22.5 2024-09-23 20:36:58,708 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=340890.6666666667, ans=0.125 2024-09-23 20:36:58,856 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=340890.6666666667, ans=0.2 2024-09-23 20:37:35,603 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=340984.0, ans=0.0 2024-09-23 20:37:47,692 INFO [train.py:1198] (3/4) Epoch 19, batch 2950, loss[loss=0.2874, ctc_loss=0.208, cr_loss=0.3969, over 11879.00 frames. ], tot_loss[loss=0.2196, ctc_loss=0.1469, cr_loss=0.3633, over 3343060.13 frames. ], batch size: 123, lr: 6.28e-03, grad_scale: 16.0 2024-09-23 20:37:55,447 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.075e+02 1.260e+02 1.402e+02 1.500e+02 2.241e+02, threshold=2.804e+02, percent-clipped=0.0 2024-09-23 20:38:21,361 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=341124.0, ans=0.2 2024-09-23 20:38:56,062 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer_ff2.min_abs, batch_count=341217.3333333333, ans=0.1 2024-09-23 20:39:02,197 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=341217.3333333333, ans=0.125 2024-09-23 20:39:06,541 INFO [train.py:1198] (3/4) Epoch 19, batch 3000, loss[loss=0.197, ctc_loss=0.1295, cr_loss=0.3378, over 17056.00 frames. ], tot_loss[loss=0.2189, ctc_loss=0.1463, cr_loss=0.3629, over 3343364.39 frames. ], batch size: 39, lr: 6.28e-03, grad_scale: 16.0 2024-09-23 20:39:06,541 INFO [train.py:1221] (3/4) Computing validation loss 2024-09-23 20:39:17,487 INFO [zipformer.py:1858] (3/4) name=encoder.encoders.4.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([3.0078, 4.0206, 4.0566, 3.1347], device='cuda:3') 2024-09-23 20:39:21,850 INFO [train.py:1230] (3/4) Epoch 19, validation: loss=0.03984, ctc_loss=0.03984, cr_loss=8.01e-15, over 944034.00 frames. 2024-09-23 20:39:21,851 INFO [train.py:1231] (3/4) Maximum memory allocated so far is 21238MB 2024-09-23 20:39:29,797 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=341264.0, ans=0.125 2024-09-23 20:39:34,584 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=341264.0, ans=0.07 2024-09-23 20:39:39,637 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=9.56 vs. limit=15.0 2024-09-23 20:39:45,542 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=341310.6666666667, ans=0.125 2024-09-23 20:40:27,407 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=8.67 vs. 
limit=15.0 2024-09-23 20:40:40,307 INFO [train.py:1198] (3/4) Epoch 19, batch 3050, loss[loss=0.2031, ctc_loss=0.1348, cr_loss=0.3417, over 17286.00 frames. ], tot_loss[loss=0.2184, ctc_loss=0.146, cr_loss=0.362, over 3352463.05 frames. ], batch size: 51, lr: 6.28e-03, grad_scale: 16.0 2024-09-23 20:40:42,188 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=341497.3333333333, ans=0.125 2024-09-23 20:40:48,103 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.094e+02 1.265e+02 1.361e+02 1.501e+02 2.045e+02, threshold=2.721e+02, percent-clipped=0.0 2024-09-23 20:40:52,932 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=341497.3333333333, ans=0.0 2024-09-23 20:41:10,683 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=10.56 vs. limit=22.5 2024-09-23 20:41:58,444 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=341730.6666666667, ans=0.1 2024-09-23 20:41:59,531 INFO [train.py:1198] (3/4) Epoch 19, batch 3100, loss[loss=0.2014, ctc_loss=0.1307, cr_loss=0.3537, over 17057.00 frames. ], tot_loss[loss=0.2185, ctc_loss=0.1461, cr_loss=0.362, over 3352068.60 frames. ], batch size: 46, lr: 6.28e-03, grad_scale: 16.0 2024-09-23 20:42:06,522 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=7.00 vs. limit=15.0 2024-09-23 20:42:16,072 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.63 vs. limit=22.5 2024-09-23 20:42:20,148 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=341777.3333333333, ans=0.125 2024-09-23 20:42:40,855 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=341824.0, ans=0.2 2024-09-23 20:42:48,516 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=341870.6666666667, ans=0.125 2024-09-23 20:42:48,580 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=341870.6666666667, ans=0.2 2024-09-23 20:43:20,315 INFO [train.py:1198] (3/4) Epoch 19, batch 3150, loss[loss=0.2586, ctc_loss=0.1804, cr_loss=0.3913, over 14851.00 frames. ], tot_loss[loss=0.2179, ctc_loss=0.1455, cr_loss=0.3619, over 3360129.97 frames. 
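Note on the periodic "Computing validation loss" block (epoch 19, batch 3000 above): it pauses training, runs the dev sets without gradients, and reports a frame-weighted loss (0.03984 over 944034 frames). A minimal sketch of such a pass; model, dev_loader and compute_loss are placeholders rather than icefall's actual signatures:

import torch

def validate(model, dev_loader, compute_loss) -> float:
    model.eval()
    tot_loss, tot_frames = 0.0, 0.0
    with torch.no_grad():
        for batch in dev_loader:
            loss, num_frames = compute_loss(model, batch)
            tot_loss += float(loss) * num_frames
            tot_frames += num_frames
    model.train()
    return tot_loss / tot_frames   # e.g. the logged loss=0.03984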
], batch size: 89, lr: 6.27e-03, grad_scale: 16.0 2024-09-23 20:43:20,584 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=341964.0, ans=0.1 2024-09-23 20:43:30,465 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.134e+02 1.288e+02 1.385e+02 1.538e+02 2.696e+02, threshold=2.771e+02, percent-clipped=0.0 2024-09-23 20:43:38,738 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=342010.6666666667, ans=0.0 2024-09-23 20:43:38,828 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=342010.6666666667, ans=0.1 2024-09-23 20:43:49,079 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.32 vs. limit=15.0 2024-09-23 20:44:20,092 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=342104.0, ans=0.125 2024-09-23 20:44:35,088 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=342150.6666666667, ans=0.125 2024-09-23 20:44:44,044 INFO [train.py:1198] (3/4) Epoch 19, batch 3200, loss[loss=0.2094, ctc_loss=0.1409, cr_loss=0.3423, over 16789.00 frames. ], tot_loss[loss=0.2167, ctc_loss=0.1447, cr_loss=0.3602, over 3357669.72 frames. ], batch size: 61, lr: 6.27e-03, grad_scale: 32.0 2024-09-23 20:44:49,816 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.24 vs. limit=22.5 2024-09-23 20:45:21,596 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=342290.6666666667, ans=0.0 2024-09-23 20:45:34,240 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=342337.3333333333, ans=0.0 2024-09-23 20:46:01,867 INFO [train.py:1198] (3/4) Epoch 19, batch 3250, loss[loss=0.2608, ctc_loss=0.1844, cr_loss=0.3818, over 11675.00 frames. ], tot_loss[loss=0.2166, ctc_loss=0.1446, cr_loss=0.36, over 3358405.92 frames. 
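Note on the attn_weights_entropy tensor logged by zipformer.py during the batch-3000 validation (values around 3.0-4.1): it measures how diffuse the attention distributions are; low entropy means sharply peaked attention, high entropy means nearly uniform. A sketch of one way to compute a per-head mean entropy; the exact reduction in zipformer.py may differ:

import torch

def attn_weights_entropy(attn: torch.Tensor,
                         eps: float = 1e-20) -> torch.Tensor:
    # attn: (num_heads, query_len, key_len); each row sums to 1.
    ent = -(attn * (attn + eps).log()).sum(dim=-1)   # entropy per query
    return ent.mean(dim=-1)                          # mean per head

# Uniform attention over 64 keys gives log(64) ~ 4.16 nats per head,
# the same order as the values logged above.
uniform = torch.full((4, 10, 64), 1.0 / 64)
print(attn_weights_entropy(uniform))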
], batch size: 123, lr: 6.27e-03, grad_scale: 16.0 2024-09-23 20:46:03,681 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=342430.6666666667, ans=0.2 2024-09-23 20:46:05,285 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=342430.6666666667, ans=0.07 2024-09-23 20:46:11,252 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.085e+02 1.296e+02 1.376e+02 1.501e+02 2.697e+02, threshold=2.752e+02, percent-clipped=0.0 2024-09-23 20:46:19,332 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=342477.3333333333, ans=0.1 2024-09-23 20:46:47,934 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=342524.0, ans=0.0 2024-09-23 20:46:49,603 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=342570.6666666667, ans=0.035 2024-09-23 20:46:52,693 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=342570.6666666667, ans=0.0 2024-09-23 20:47:02,152 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=342570.6666666667, ans=0.2 2024-09-23 20:47:08,571 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=342617.3333333333, ans=0.125 2024-09-23 20:47:08,641 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=342617.3333333333, ans=0.125 2024-09-23 20:47:22,264 INFO [train.py:1198] (3/4) Epoch 19, batch 3300, loss[loss=0.226, ctc_loss=0.1512, cr_loss=0.3737, over 17306.00 frames. ], tot_loss[loss=0.2183, ctc_loss=0.146, cr_loss=0.3616, over 3353976.76 frames. ], batch size: 49, lr: 6.27e-03, grad_scale: 16.0 2024-09-23 20:48:04,494 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=3.98 vs. limit=12.0 2024-09-23 20:48:12,452 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=5.00 vs. limit=12.0 2024-09-23 20:48:14,839 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=342804.0, ans=0.125 2024-09-23 20:48:21,046 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=342804.0, ans=0.125 2024-09-23 20:48:41,115 INFO [train.py:1198] (3/4) Epoch 19, batch 3350, loss[loss=0.2187, ctc_loss=0.146, cr_loss=0.3638, over 17300.00 frames. ], tot_loss[loss=0.2184, ctc_loss=0.146, cr_loss=0.3622, over 3353533.13 frames. 
], batch size: 49, lr: 6.26e-03, grad_scale: 16.0 2024-09-23 20:48:44,559 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=342897.3333333333, ans=0.125 2024-09-23 20:48:50,531 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.035e+02 1.298e+02 1.381e+02 1.515e+02 2.027e+02, threshold=2.761e+02, percent-clipped=0.0 2024-09-23 20:49:01,806 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=342944.0, ans=0.2 2024-09-23 20:49:08,198 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=342944.0, ans=0.125 2024-09-23 20:49:17,612 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=342990.6666666667, ans=0.0 2024-09-23 20:49:59,674 INFO [train.py:1198] (3/4) Epoch 19, batch 3400, loss[loss=0.2328, ctc_loss=0.1555, cr_loss=0.3863, over 17183.00 frames. ], tot_loss[loss=0.2191, ctc_loss=0.1465, cr_loss=0.3633, over 3353933.95 frames. ], batch size: 45, lr: 6.26e-03, grad_scale: 16.0 2024-09-23 20:50:00,818 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=6.65 vs. limit=15.0 2024-09-23 20:50:06,341 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=343130.6666666667, ans=0.125 2024-09-23 20:50:09,265 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=343130.6666666667, ans=0.1 2024-09-23 20:50:31,030 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=343224.0, ans=0.0 2024-09-23 20:50:45,382 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=343270.6666666667, ans=0.125 2024-09-23 20:51:13,755 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=343317.3333333333, ans=0.035 2024-09-23 20:51:17,934 INFO [train.py:1198] (3/4) Epoch 19, batch 3450, loss[loss=0.2322, ctc_loss=0.159, cr_loss=0.3659, over 17370.00 frames. ], tot_loss[loss=0.2187, ctc_loss=0.1462, cr_loss=0.3628, over 3351732.44 frames. ], batch size: 48, lr: 6.26e-03, grad_scale: 16.0 2024-09-23 20:51:27,576 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.110e+02 1.246e+02 1.361e+02 1.474e+02 2.072e+02, threshold=2.723e+02, percent-clipped=0.0 2024-09-23 20:51:43,478 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=343410.6666666667, ans=0.125 2024-09-23 20:51:49,650 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=343457.3333333333, ans=0.125 2024-09-23 20:52:00,740 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.min_abs, batch_count=343457.3333333333, ans=0.5 2024-09-23 20:52:36,232 INFO [train.py:1198] (3/4) Epoch 19, batch 3500, loss[loss=0.201, ctc_loss=0.1321, cr_loss=0.3446, over 17283.00 frames. ], tot_loss[loss=0.2175, ctc_loss=0.1453, cr_loss=0.361, over 3355480.38 frames. 
], batch size: 51, lr: 6.26e-03, grad_scale: 16.0 2024-09-23 20:52:50,523 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=343644.0, ans=0.0 2024-09-23 20:53:11,237 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=343690.6666666667, ans=0.125 2024-09-23 20:53:14,370 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=343690.6666666667, ans=0.0 2024-09-23 20:53:17,272 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=343690.6666666667, ans=0.125 2024-09-23 20:53:22,058 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=343690.6666666667, ans=0.125 2024-09-23 20:53:26,526 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=343737.3333333333, ans=0.125 2024-09-23 20:53:40,839 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=343784.0, ans=0.125 2024-09-23 20:53:56,273 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=343830.6666666667, ans=0.125 2024-09-23 20:53:56,306 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=343830.6666666667, ans=0.2 2024-09-23 20:53:57,648 INFO [train.py:1198] (3/4) Epoch 19, batch 3550, loss[loss=0.1879, ctc_loss=0.1231, cr_loss=0.3243, over 17238.00 frames. ], tot_loss[loss=0.2171, ctc_loss=0.1449, cr_loss=0.361, over 3364179.96 frames. ], batch size: 42, lr: 6.26e-03, grad_scale: 16.0 2024-09-23 20:54:07,042 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.128e+02 1.247e+02 1.326e+02 1.445e+02 2.020e+02, threshold=2.653e+02, percent-clipped=0.0 2024-09-23 20:54:18,522 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=343877.3333333333, ans=0.04949747468305833 2024-09-23 20:54:45,536 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=343970.6666666667, ans=0.1 2024-09-23 20:54:54,692 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer_na.min_abs, batch_count=343970.6666666667, ans=0.02 2024-09-23 20:55:00,650 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=344017.3333333333, ans=0.0 2024-09-23 20:55:08,944 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=4.05 vs. limit=15.0 2024-09-23 20:55:17,373 INFO [train.py:1198] (3/4) Epoch 19, batch 3600, loss[loss=0.2203, ctc_loss=0.1464, cr_loss=0.3693, over 17030.00 frames. ], tot_loss[loss=0.2178, ctc_loss=0.1453, cr_loss=0.3622, over 3364341.26 frames. 
], batch size: 52, lr: 6.25e-03, grad_scale: 32.0 2024-09-23 20:55:19,297 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=344064.0, ans=0.1 2024-09-23 20:55:31,625 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=344110.6666666667, ans=0.125 2024-09-23 20:55:43,216 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=14.41 vs. limit=22.5 2024-09-23 20:55:56,464 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.const_attention_rate, batch_count=344157.3333333333, ans=0.025 2024-09-23 20:56:10,092 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=344204.0, ans=0.125 2024-09-23 20:56:27,722 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=344250.6666666667, ans=0.1 2024-09-23 20:56:30,857 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=344250.6666666667, ans=0.125 2024-09-23 20:56:33,935 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=344250.6666666667, ans=0.125 2024-09-23 20:56:36,803 INFO [train.py:1198] (3/4) Epoch 19, batch 3650, loss[loss=0.194, ctc_loss=0.1269, cr_loss=0.3353, over 17019.00 frames. ], tot_loss[loss=0.2177, ctc_loss=0.1453, cr_loss=0.3617, over 3365912.64 frames. ], batch size: 39, lr: 6.25e-03, grad_scale: 32.0 2024-09-23 20:56:47,602 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.165e+02 1.302e+02 1.379e+02 1.507e+02 2.534e+02, threshold=2.758e+02, percent-clipped=0.0 2024-09-23 20:56:57,191 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=344344.0, ans=0.0 2024-09-23 20:57:19,614 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.const_attention_rate, batch_count=344390.6666666667, ans=0.025 2024-09-23 20:57:31,365 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=5.80 vs. limit=15.0 2024-09-23 20:57:33,020 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=344437.3333333333, ans=10.0 2024-09-23 20:57:34,494 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=344437.3333333333, ans=0.0 2024-09-23 20:57:47,848 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=13.83 vs. limit=22.5 2024-09-23 20:57:53,453 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=344484.0, ans=0.0 2024-09-23 20:57:54,024 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=4.23 vs. limit=15.0 2024-09-23 20:57:56,275 INFO [train.py:1198] (3/4) Epoch 19, batch 3700, loss[loss=0.275, ctc_loss=0.197, cr_loss=0.3898, over 11797.00 frames. ], tot_loss[loss=0.2166, ctc_loss=0.1446, cr_loss=0.3602, over 3353782.11 frames. 
], batch size: 124, lr: 6.25e-03, grad_scale: 16.0 2024-09-23 20:57:56,657 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=344530.6666666667, ans=0.125 2024-09-23 20:58:21,508 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=344577.3333333333, ans=0.125 2024-09-23 20:58:32,512 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=344624.0, ans=0.0 2024-09-23 20:58:40,507 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-23 20:58:47,212 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.99 vs. limit=15.0 2024-09-23 20:58:53,061 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=344670.6666666667, ans=0.125 2024-09-23 20:59:14,536 INFO [train.py:1198] (3/4) Epoch 19, batch 3750, loss[loss=0.2432, ctc_loss=0.1651, cr_loss=0.3905, over 16938.00 frames. ], tot_loss[loss=0.2163, ctc_loss=0.1444, cr_loss=0.3593, over 3339839.91 frames. ], batch size: 58, lr: 6.25e-03, grad_scale: 16.0 2024-09-23 20:59:25,487 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.059e+02 1.321e+02 1.412e+02 1.562e+02 2.185e+02, threshold=2.824e+02, percent-clipped=0.0 2024-09-23 21:00:05,329 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=5.82 vs. limit=12.0 2024-09-23 21:00:22,927 INFO [scaling.py:1024] (3/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.38 vs. limit=5.0 2024-09-23 21:00:32,423 INFO [train.py:1198] (3/4) Epoch 19, batch 3800, loss[loss=0.2482, ctc_loss=0.1679, cr_loss=0.4014, over 16935.00 frames. ], tot_loss[loss=0.2179, ctc_loss=0.1459, cr_loss=0.36, over 3308225.81 frames. ], batch size: 58, lr: 6.25e-03, grad_scale: 16.0 2024-09-23 21:00:43,620 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=344997.3333333333, ans=0.2 2024-09-23 21:01:08,657 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=345090.6666666667, ans=0.125 2024-09-23 21:01:29,424 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.19 vs. limit=6.0 2024-09-23 21:01:30,658 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=345137.3333333333, ans=0.07 2024-09-23 21:01:39,192 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=10.42 vs. limit=15.0 2024-09-23 21:01:41,686 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=345184.0, ans=0.1 2024-09-23 21:01:47,214 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.52 vs. 
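limit=6.0

The Whitening lines, such as the whiten_keys record just above, compare a whiteness statistic of a module's activations against a fixed limit; the penalty defined in icefall's scaling.py engages only when the metric exceeds the limit (here 3.52 vs. 6.0, so no penalty). The exact statistic lives in scaling.py; the sketch below uses an illustrative eigenvalue-based stand-in (E[lambda^2] / E[lambda]^2 over the channel covariance, which equals 1.0 for perfectly white features and grows as the spectrum becomes lopsided) and ignores the num_groups channel grouping.

```python
import torch

# Illustrative whiteness statistic in the spirit of the
# 'metric=X vs. limit=Y' log lines; not the scaling.py definition.
def whiteness_metric(x: torch.Tensor) -> float:
    x = x.reshape(-1, x.shape[-1])           # (frames, channels)
    x = x - x.mean(dim=0, keepdim=True)
    cov = (x.T @ x) / x.shape[0]              # channel covariance
    eigs = torch.linalg.eigvalsh(cov)
    return float((eigs ** 2).mean() / eigs.mean() ** 2)

feats = torch.randn(1000, 128) @ torch.randn(128, 128)  # correlated channels
print(whiteness_metric(feats), "vs. limit=6.0")          # penalize only if above
```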
2024-09-23 21:01:49,376 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=345230.6666666667, ans=0.0 2024-09-23 21:01:50,704 INFO [train.py:1198] (3/4) Epoch 19, batch 3850, loss[loss=0.273, ctc_loss=0.1918, cr_loss=0.4064, over 12469.00 frames. ], tot_loss[loss=0.2195, ctc_loss=0.1474, cr_loss=0.3608, over 3263597.66 frames. ], batch size: 123, lr: 6.24e-03, grad_scale: 16.0 2024-09-23 21:02:01,516 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.105e+02 1.310e+02 1.460e+02 1.598e+02 2.355e+02, threshold=2.920e+02, percent-clipped=0.0 2024-09-23 21:02:14,647 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=345277.3333333333, ans=0.125 2024-09-23 21:02:27,112 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=11.97 vs. limit=12.0 2024-09-23 21:02:39,089 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=345370.6666666667, ans=0.125 2024-09-23 21:03:53,560 INFO [train.py:1198] (3/4) Epoch 20, batch 0, loss[loss=0.1873, ctc_loss=0.1208, cr_loss=0.3324, over 17067.00 frames. ], tot_loss[loss=0.1873, ctc_loss=0.1208, cr_loss=0.3324, over 17067.00 frames. ], batch size: 39, lr: 6.08e-03, grad_scale: 32.0 2024-09-23 21:03:53,560 INFO [train.py:1221] (3/4) Computing validation loss 2024-09-23 21:04:08,657 INFO [train.py:1230] (3/4) Epoch 20, validation: loss=0.03935, ctc_loss=0.03935, cr_loss=7.664e-15, over 944034.00 frames. 2024-09-23 21:04:08,658 INFO [train.py:1231] (3/4) Maximum memory allocated so far is 21238MB 2024-09-23 21:04:14,244 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=8.02 vs. limit=15.0 2024-09-23 21:04:57,799 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=345538.6666666667, ans=0.1 2024-09-23 21:05:04,104 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=345585.3333333333, ans=0.125 2024-09-23 21:05:10,331 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=345585.3333333333, ans=0.0 2024-09-23 21:05:11,978 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=345585.3333333333, ans=0.125 2024-09-23 21:05:33,925 INFO [train.py:1198] (3/4) Epoch 20, batch 50, loss[loss=0.2028, ctc_loss=0.1343, cr_loss=0.3425, over 17260.00 frames. ], tot_loss[loss=0.2159, ctc_loss=0.1441, cr_loss=0.3587, over 749656.29 frames.
], batch size: 42, lr: 6.08e-03, grad_scale: 32.0 2024-09-23 21:05:51,261 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.032e+02 1.262e+02 1.457e+02 1.636e+02 2.185e+02, threshold=2.915e+02, percent-clipped=0.0 2024-09-23 21:05:57,916 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=345725.3333333333, ans=10.0 2024-09-23 21:06:28,716 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=345818.6666666667, ans=0.125 2024-09-23 21:06:42,969 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=345865.3333333333, ans=0.2 2024-09-23 21:06:51,007 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=345865.3333333333, ans=0.07 2024-09-23 21:06:53,843 INFO [train.py:1198] (3/4) Epoch 20, batch 100, loss[loss=0.2137, ctc_loss=0.1419, cr_loss=0.3591, over 17355.00 frames. ], tot_loss[loss=0.2196, ctc_loss=0.147, cr_loss=0.3634, over 1332853.71 frames. ], batch size: 48, lr: 6.08e-03, grad_scale: 32.0 2024-09-23 21:06:58,816 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=345912.0, ans=0.125 2024-09-23 21:07:16,572 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=345958.6666666667, ans=0.0 2024-09-23 21:07:18,316 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=345958.6666666667, ans=0.125 2024-09-23 21:07:23,121 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=345958.6666666667, ans=0.0 2024-09-23 21:07:37,314 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=346005.3333333333, ans=0.0 2024-09-23 21:07:46,548 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=10.83 vs. limit=15.0 2024-09-23 21:08:02,299 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=346098.6666666667, ans=0.0 2024-09-23 21:08:18,776 INFO [train.py:1198] (3/4) Epoch 20, batch 150, loss[loss=0.2269, ctc_loss=0.1515, cr_loss=0.3769, over 17300.00 frames. ], tot_loss[loss=0.2181, ctc_loss=0.1456, cr_loss=0.3623, over 1781026.34 frames. ], batch size: 51, lr: 6.07e-03, grad_scale: 16.0 2024-09-23 21:08:38,025 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.021e+02 1.321e+02 1.429e+02 1.602e+02 2.448e+02, threshold=2.858e+02, percent-clipped=0.0 2024-09-23 21:08:46,762 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=4.88 vs. limit=12.0 2024-09-23 21:09:17,117 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=346285.3333333333, ans=0.125 2024-09-23 21:09:45,101 INFO [train.py:1198] (3/4) Epoch 20, batch 200, loss[loss=0.2073, ctc_loss=0.1343, cr_loss=0.3647, over 17025.00 frames. ], tot_loss[loss=0.218, ctc_loss=0.1455, cr_loss=0.3626, over 2132026.68 frames. 
], batch size: 44, lr: 6.07e-03, grad_scale: 16.0 2024-09-23 21:09:45,421 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=346378.6666666667, ans=0.0 2024-09-23 21:10:05,918 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=346425.3333333333, ans=0.2 2024-09-23 21:10:30,972 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=12.94 vs. limit=15.0 2024-09-23 21:10:31,735 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=346518.6666666667, ans=0.0 2024-09-23 21:10:39,637 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=346518.6666666667, ans=0.0 2024-09-23 21:10:42,626 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=346518.6666666667, ans=0.0 2024-09-23 21:10:47,653 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-23 21:10:47,654 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=346565.3333333333, ans=0.125 2024-09-23 21:11:04,973 INFO [train.py:1198] (3/4) Epoch 20, batch 250, loss[loss=0.2164, ctc_loss=0.1462, cr_loss=0.3512, over 17232.00 frames. ], tot_loss[loss=0.2165, ctc_loss=0.1443, cr_loss=0.3608, over 2415859.52 frames. ], batch size: 50, lr: 6.07e-03, grad_scale: 16.0 2024-09-23 21:11:23,904 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.123e+02 1.297e+02 1.428e+02 1.617e+02 1.854e+02, threshold=2.857e+02, percent-clipped=0.0 2024-09-23 21:11:31,951 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=346658.6666666667, ans=0.1 2024-09-23 21:12:24,541 INFO [train.py:1198] (3/4) Epoch 20, batch 300, loss[loss=0.2426, ctc_loss=0.1641, cr_loss=0.3927, over 16894.00 frames. ], tot_loss[loss=0.2162, ctc_loss=0.144, cr_loss=0.3611, over 2636396.31 frames. ], batch size: 58, lr: 6.07e-03, grad_scale: 16.0 2024-09-23 21:12:24,781 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=346845.3333333333, ans=0.1 2024-09-23 21:12:27,072 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=15.88 vs. 
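limit=22.5

In every optim.py:487 warning in this log, the reported threshold equals (to rounding) the Clipping_scale of 2.0 times the median of the grad-norm quartiles on the same line, e.g. 2.0 x 1.428e+02 = 2.856e+02 against the logged threshold=2.857e+02 in the batch-250 warning above. That suggests gradients are clipped against a running median of recent norm statistics. The sketch below implements exactly that assumption; the class and its windowing are illustrative, not the actual optim.py implementation.

```python
import torch
from collections import deque

class MedianGradClipper:
    """Illustrative: clip at clipping_scale * median of recent grad norms."""
    def __init__(self, clipping_scale: float = 2.0, window: int = 1024):
        self.scale = clipping_scale
        self.norms = deque(maxlen=window)

    def clip_(self, params: list) -> float:
        # Measure the total grad norm without clipping (max_norm=inf).
        norm = float(torch.nn.utils.clip_grad_norm_(params, float("inf")))
        self.norms.append(norm)
        hist = torch.tensor(list(self.norms))
        # Five values, as in the logged 'grad-norm quartiles' (min..max).
        quartiles = torch.quantile(hist, torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0]))
        threshold = self.scale * float(quartiles[2])      # scale * median
        if norm > threshold:                              # counts toward 'percent-clipped'
            for p in params:
                if p.grad is not None:
                    p.grad.mul_(threshold / norm)
        return threshold
```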
2024-09-23 21:12:28,211 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=346845.3333333333, ans=0.0 2024-09-23 21:12:32,857 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=346845.3333333333, ans=0.05 2024-09-23 21:12:51,445 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.const_attention_rate, batch_count=346892.0, ans=0.025 2024-09-23 21:13:07,478 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.const_attention_rate, batch_count=346938.6666666667, ans=0.025 2024-09-23 21:13:32,561 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=347032.0, ans=0.125 2024-09-23 21:13:49,666 INFO [train.py:1198] (3/4) Epoch 20, batch 350, loss[loss=0.2028, ctc_loss=0.1351, cr_loss=0.3382, over 17303.00 frames. ], tot_loss[loss=0.2165, ctc_loss=0.1442, cr_loss=0.3618, over 2801325.07 frames. ], batch size: 51, lr: 6.07e-03, grad_scale: 16.0 2024-09-23 21:13:56,255 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=347078.6666666667, ans=0.125 2024-09-23 21:13:58,449 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=11.72 vs. limit=15.0 2024-09-23 21:14:08,666 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.051e+02 1.225e+02 1.312e+02 1.421e+02 1.795e+02, threshold=2.625e+02, percent-clipped=0.0 2024-09-23 21:14:12,638 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=7.14 vs. limit=15.0 2024-09-23 21:14:17,701 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.96 vs. limit=10.0 2024-09-23 21:14:20,132 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=347172.0, ans=0.035 2024-09-23 21:14:45,642 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.const_attention_rate, batch_count=347218.6666666667, ans=0.025 2024-09-23 21:15:12,643 INFO [train.py:1198] (3/4) Epoch 20, batch 400, loss[loss=0.2375, ctc_loss=0.1611, cr_loss=0.3818, over 17293.00 frames. ], tot_loss[loss=0.2167, ctc_loss=0.1444, cr_loss=0.3615, over 2912630.01 frames. ], batch size: 51, lr: 6.06e-03, grad_scale: 32.0 2024-09-23 21:15:30,574 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=347358.6666666667, ans=0.05 2024-09-23 21:15:38,461 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=347358.6666666667, ans=0.2 2024-09-23 21:16:20,344 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=3.36 vs.
limit=15.0 2024-09-23 21:16:31,067 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=347545.3333333333, ans=0.1 2024-09-23 21:16:32,403 INFO [train.py:1198] (3/4) Epoch 20, batch 450, loss[loss=0.2272, ctc_loss=0.1535, cr_loss=0.3681, over 17150.00 frames. ], tot_loss[loss=0.2164, ctc_loss=0.1443, cr_loss=0.3606, over 3001788.56 frames. ], batch size: 48, lr: 6.06e-03, grad_scale: 32.0 2024-09-23 21:16:42,276 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=347545.3333333333, ans=0.125 2024-09-23 21:16:52,900 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.150e+02 1.293e+02 1.421e+02 1.585e+02 2.168e+02, threshold=2.842e+02, percent-clipped=0.0 2024-09-23 21:17:48,881 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=347732.0, ans=0.1 2024-09-23 21:17:51,924 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=347732.0, ans=0.1 2024-09-23 21:17:51,975 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=347732.0, ans=0.2 2024-09-23 21:17:54,869 INFO [train.py:1198] (3/4) Epoch 20, batch 500, loss[loss=0.2065, ctc_loss=0.1412, cr_loss=0.3265, over 16922.00 frames. ], tot_loss[loss=0.2158, ctc_loss=0.1438, cr_loss=0.3601, over 3080930.37 frames. ], batch size: 42, lr: 6.06e-03, grad_scale: 16.0 2024-09-23 21:18:15,872 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=4.54 vs. limit=10.0 2024-09-23 21:18:16,994 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=347825.3333333333, ans=0.0 2024-09-23 21:18:41,022 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=347872.0, ans=0.1 2024-09-23 21:19:17,193 INFO [train.py:1198] (3/4) Epoch 20, batch 550, loss[loss=0.2087, ctc_loss=0.1391, cr_loss=0.3478, over 17025.00 frames. ], tot_loss[loss=0.2159, ctc_loss=0.144, cr_loss=0.3597, over 3138180.32 frames. ], batch size: 51, lr: 6.06e-03, grad_scale: 16.0 2024-09-23 21:19:42,497 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.103e+02 1.238e+02 1.328e+02 1.432e+02 2.472e+02, threshold=2.656e+02, percent-clipped=0.0 2024-09-23 21:19:52,374 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=348105.3333333333, ans=0.2 2024-09-23 21:20:07,080 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=348105.3333333333, ans=0.0 2024-09-23 21:20:41,716 INFO [train.py:1198] (3/4) Epoch 20, batch 600, loss[loss=0.2391, ctc_loss=0.1621, cr_loss=0.3852, over 16718.00 frames. ], tot_loss[loss=0.216, ctc_loss=0.144, cr_loss=0.3603, over 3187673.45 frames. 
], batch size: 61, lr: 6.06e-03, grad_scale: 16.0 2024-09-23 21:20:49,861 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=348245.3333333333, ans=0.125 2024-09-23 21:20:56,182 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=348292.0, ans=0.125 2024-09-23 21:21:08,999 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=348292.0, ans=0.125 2024-09-23 21:21:12,085 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=348338.6666666667, ans=0.1 2024-09-23 21:21:34,789 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=3.48 vs. limit=12.0 2024-09-23 21:22:01,014 INFO [train.py:1198] (3/4) Epoch 20, batch 650, loss[loss=0.1709, ctc_loss=0.1109, cr_loss=0.3001, over 17096.00 frames. ], tot_loss[loss=0.2152, ctc_loss=0.1434, cr_loss=0.3591, over 3222017.65 frames. ], batch size: 40, lr: 6.05e-03, grad_scale: 16.0 2024-09-23 21:22:15,642 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=348525.3333333333, ans=0.125 2024-09-23 21:22:21,604 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.109e+02 1.264e+02 1.348e+02 1.477e+02 2.156e+02, threshold=2.696e+02, percent-clipped=0.0 2024-09-23 21:22:21,959 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=348525.3333333333, ans=0.0 2024-09-23 21:22:24,951 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=348525.3333333333, ans=0.125 2024-09-23 21:22:32,047 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.36 vs. limit=15.0 2024-09-23 21:23:19,378 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=3.86 vs. limit=12.0 2024-09-23 21:23:26,231 INFO [train.py:1198] (3/4) Epoch 20, batch 700, loss[loss=0.2319, ctc_loss=0.1551, cr_loss=0.3842, over 17016.00 frames. ], tot_loss[loss=0.2157, ctc_loss=0.1438, cr_loss=0.3596, over 3251385.40 frames. ], batch size: 51, lr: 6.05e-03, grad_scale: 16.0 2024-09-23 21:23:48,967 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=348758.6666666667, ans=0.2 2024-09-23 21:23:49,131 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.const_attention_rate, batch_count=348758.6666666667, ans=0.025 2024-09-23 21:23:49,580 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=4.57 vs. 
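limit=15.0

The ScheduledFloat lines that dominate this log record hyperparameters (dropout probabilities, skip rates, balancer limits) that are functions of batch_count rather than constants: "ans" is the value currently in effect for the named sub-module. A minimal sketch of such a schedule as piecewise-linear interpolation over (batch_count, value) breakpoints follows; the breakpoints are invented for illustration, the real ones are set in the zipformer model code.

```python
# Hedged sketch of the ScheduledFloat idea behind lines like
# 'name=...dropout_p, batch_count=..., ans=0.1': a value defined by
# (batch_count, value) breakpoints, linearly interpolated between
# them and clamped outside the covered range.
class PiecewiseSchedule:
    def __init__(self, *points: tuple):
        self.points = sorted(points)

    def __call__(self, batch_count: float) -> float:
        pts = self.points
        if batch_count <= pts[0][0]:
            return pts[0][1]
        for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
            if batch_count <= x1:
                t = (batch_count - x0) / (x1 - x0)
                return y0 + t * (y1 - y0)
        return pts[-1][1]

# e.g. a dropout that relaxes from 0.3 to 0.1 over the first 20k batches:
dropout_p = PiecewiseSchedule((0.0, 0.3), (20000.0, 0.1))
print(dropout_p(348852.0))   # -> 0.1, matching the 'ans=0.1' entries here
```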
2024-09-23 21:24:08,243 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=348805.3333333333, ans=0.125 2024-09-23 21:24:16,163 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=348852.0, ans=0.0 2024-09-23 21:24:17,954 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=348852.0, ans=0.1 2024-09-23 21:24:49,320 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.39 vs. limit=10.0 2024-09-23 21:24:51,551 INFO [train.py:1198] (3/4) Epoch 20, batch 750, loss[loss=0.2069, ctc_loss=0.1372, cr_loss=0.3484, over 17318.00 frames. ], tot_loss[loss=0.2162, ctc_loss=0.1441, cr_loss=0.3602, over 3281037.13 frames. ], batch size: 51, lr: 6.05e-03, grad_scale: 16.0 2024-09-23 21:25:12,208 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.097e+02 1.238e+02 1.344e+02 1.442e+02 2.138e+02, threshold=2.688e+02, percent-clipped=0.0 2024-09-23 21:25:12,563 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=348992.0, ans=0.125 2024-09-23 21:25:28,761 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=9.17 vs. limit=12.0 2024-09-23 21:25:31,360 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=349038.6666666667, ans=0.0 2024-09-23 21:25:50,963 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-23 21:25:59,088 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.75 vs. limit=15.0 2024-09-23 21:26:09,933 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=349178.6666666667, ans=0.125 2024-09-23 21:26:11,143 INFO [train.py:1198] (3/4) Epoch 20, batch 800, loss[loss=0.2201, ctc_loss=0.1433, cr_loss=0.3841, over 17097.00 frames. ], tot_loss[loss=0.2165, ctc_loss=0.1444, cr_loss=0.3607, over 3291479.21 frames. ], batch size: 49, lr: 6.05e-03, grad_scale: 32.0 2024-09-23 21:26:45,017 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=349272.0, ans=0.125 2024-09-23 21:26:48,066 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=349272.0, ans=0.0 2024-09-23 21:27:08,933 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=349318.6666666667, ans=0.2 2024-09-23 21:27:12,298 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=349318.6666666667, ans=0.125 2024-09-23 21:27:12,321 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=349318.6666666667, ans=0.125 2024-09-23 21:27:31,183 INFO [train.py:1198] (3/4) Epoch 20, batch 850, loss[loss=0.2586, ctc_loss=0.1862, cr_loss=0.3617, over 11736.00 frames. ], tot_loss[loss=0.2157, ctc_loss=0.1437, cr_loss=0.3597, over 3308789.75 frames.
], batch size: 123, lr: 6.05e-03, grad_scale: 32.0 2024-09-23 21:27:54,463 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.108e+02 1.268e+02 1.363e+02 1.494e+02 2.147e+02, threshold=2.726e+02, percent-clipped=0.0 2024-09-23 21:27:54,783 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=349458.6666666667, ans=0.2 2024-09-23 21:28:30,975 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=8.33 vs. limit=15.0 2024-09-23 21:28:36,947 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=349552.0, ans=0.1 2024-09-23 21:28:56,006 INFO [train.py:1198] (3/4) Epoch 20, batch 900, loss[loss=0.2012, ctc_loss=0.1318, cr_loss=0.3474, over 17001.00 frames. ], tot_loss[loss=0.2159, ctc_loss=0.1439, cr_loss=0.3599, over 3330566.47 frames. ], batch size: 44, lr: 6.04e-03, grad_scale: 32.0 2024-09-23 21:29:01,194 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=349645.3333333333, ans=0.125 2024-09-23 21:29:07,472 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=349645.3333333333, ans=0.125 2024-09-23 21:29:12,390 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=349692.0, ans=0.0 2024-09-23 21:29:18,530 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=349692.0, ans=0.2 2024-09-23 21:29:43,699 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.21 vs. limit=22.5 2024-09-23 21:29:46,641 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=9.83 vs. limit=15.0 2024-09-23 21:30:05,555 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=349832.0, ans=0.125 2024-09-23 21:30:21,168 INFO [train.py:1198] (3/4) Epoch 20, batch 950, loss[loss=0.1945, ctc_loss=0.1266, cr_loss=0.3395, over 17074.00 frames. ], tot_loss[loss=0.2149, ctc_loss=0.1432, cr_loss=0.3587, over 3342087.41 frames. ], batch size: 46, lr: 6.04e-03, grad_scale: 32.0 2024-09-23 21:30:42,123 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.058e+02 1.279e+02 1.392e+02 1.519e+02 1.959e+02, threshold=2.784e+02, percent-clipped=0.0 2024-09-23 21:31:41,388 INFO [train.py:1198] (3/4) Epoch 20, batch 1000, loss[loss=0.2846, ctc_loss=0.2052, cr_loss=0.3971, over 11861.00 frames. ], tot_loss[loss=0.2159, ctc_loss=0.144, cr_loss=0.3595, over 3325393.56 frames. ], batch size: 123, lr: 6.04e-03, grad_scale: 32.0 2024-09-23 21:31:41,657 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=350112.0, ans=0.0 2024-09-23 21:31:48,414 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=3.48 vs. 
limit=10.0 2024-09-23 21:31:49,599 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=350112.0, ans=0.125 2024-09-23 21:32:03,989 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=350158.6666666667, ans=0.125 2024-09-23 21:32:12,383 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=350205.3333333333, ans=0.125 2024-09-23 21:32:46,022 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=350252.0, ans=0.2 2024-09-23 21:32:55,739 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=350298.6666666667, ans=0.125 2024-09-23 21:33:04,841 INFO [train.py:1198] (3/4) Epoch 20, batch 1050, loss[loss=0.2571, ctc_loss=0.1715, cr_loss=0.4278, over 17196.00 frames. ], tot_loss[loss=0.2158, ctc_loss=0.1438, cr_loss=0.36, over 3342507.42 frames. ], batch size: 55, lr: 6.04e-03, grad_scale: 32.0 2024-09-23 21:33:27,843 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.077e+02 1.295e+02 1.374e+02 1.534e+02 2.501e+02, threshold=2.748e+02, percent-clipped=0.0 2024-09-23 21:33:29,055 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=14.37 vs. limit=22.5 2024-09-23 21:33:42,357 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-23 21:33:47,102 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=350438.6666666667, ans=0.1 2024-09-23 21:33:55,089 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=350485.3333333333, ans=0.125 2024-09-23 21:34:17,810 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.43 vs. limit=15.0 2024-09-23 21:34:22,984 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=350532.0, ans=0.1 2024-09-23 21:34:26,834 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=4.74 vs. limit=10.0 2024-09-23 21:34:29,487 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=350532.0, ans=0.125 2024-09-23 21:34:32,268 INFO [train.py:1198] (3/4) Epoch 20, batch 1100, loss[loss=0.2671, ctc_loss=0.1813, cr_loss=0.4291, over 15195.00 frames. ], tot_loss[loss=0.215, ctc_loss=0.1433, cr_loss=0.3587, over 3341887.99 frames. ], batch size: 89, lr: 6.04e-03, grad_scale: 16.0 2024-09-23 21:34:38,988 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=350578.6666666667, ans=0.125 2024-09-23 21:35:01,383 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=350625.3333333333, ans=0.125 2024-09-23 21:35:39,859 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=13.07 vs. 
limit=22.5 2024-09-23 21:35:47,481 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=350765.3333333333, ans=0.0 2024-09-23 21:35:52,150 INFO [train.py:1198] (3/4) Epoch 20, batch 1150, loss[loss=0.2075, ctc_loss=0.14, cr_loss=0.3377, over 17029.00 frames. ], tot_loss[loss=0.2138, ctc_loss=0.1425, cr_loss=0.3566, over 3342273.25 frames. ], batch size: 51, lr: 6.03e-03, grad_scale: 16.0 2024-09-23 21:35:57,390 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=350812.0, ans=0.125 2024-09-23 21:36:14,527 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.089e+02 1.209e+02 1.298e+02 1.420e+02 2.385e+02, threshold=2.595e+02, percent-clipped=0.0 2024-09-23 21:36:30,786 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=350905.3333333333, ans=0.125 2024-09-23 21:36:50,185 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=350952.0, ans=0.125 2024-09-23 21:36:51,839 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=350952.0, ans=0.0 2024-09-23 21:36:55,163 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=350998.6666666667, ans=0.125 2024-09-23 21:37:12,177 INFO [train.py:1198] (3/4) Epoch 20, batch 1200, loss[loss=0.2158, ctc_loss=0.1466, cr_loss=0.3461, over 17231.00 frames. ], tot_loss[loss=0.2143, ctc_loss=0.1429, cr_loss=0.3569, over 3335540.43 frames. ], batch size: 50, lr: 6.03e-03, grad_scale: 32.0 2024-09-23 21:37:51,989 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=351138.6666666667, ans=0.125 2024-09-23 21:38:07,817 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=351185.3333333333, ans=0.125 2024-09-23 21:38:22,253 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=6.01 vs. limit=12.0 2024-09-23 21:38:34,479 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=351232.0, ans=0.125 2024-09-23 21:38:34,628 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=351232.0, ans=0.125 2024-09-23 21:38:37,552 INFO [train.py:1198] (3/4) Epoch 20, batch 1250, loss[loss=0.2069, ctc_loss=0.1417, cr_loss=0.326, over 17016.00 frames. ], tot_loss[loss=0.2145, ctc_loss=0.1431, cr_loss=0.3569, over 3333387.31 frames. 
], batch size: 51, lr: 6.03e-03, grad_scale: 32.0 2024-09-23 21:38:39,417 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=351278.6666666667, ans=0.0 2024-09-23 21:38:56,893 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=351325.3333333333, ans=0.125 2024-09-23 21:38:59,801 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.091e+02 1.265e+02 1.387e+02 1.557e+02 1.898e+02, threshold=2.775e+02, percent-clipped=0.0 2024-09-23 21:39:00,214 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=351325.3333333333, ans=0.125 2024-09-23 21:39:04,931 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.const_attention_rate, batch_count=351325.3333333333, ans=0.025 2024-09-23 21:39:27,642 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-09-23 21:39:27,694 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=351372.0, ans=0.2 2024-09-23 21:39:29,083 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=351418.6666666667, ans=0.2 2024-09-23 21:39:44,871 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=351465.3333333333, ans=0.0 2024-09-23 21:40:02,004 INFO [train.py:1198] (3/4) Epoch 20, batch 1300, loss[loss=0.2414, ctc_loss=0.1623, cr_loss=0.3952, over 17291.00 frames. ], tot_loss[loss=0.2141, ctc_loss=0.1427, cr_loss=0.357, over 3345151.54 frames. ], batch size: 46, lr: 6.03e-03, grad_scale: 32.0 2024-09-23 21:40:27,786 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=351558.6666666667, ans=0.125 2024-09-23 21:40:43,628 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=351605.3333333333, ans=0.09899494936611666 2024-09-23 21:40:45,100 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=351605.3333333333, ans=0.125 2024-09-23 21:40:48,377 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=351652.0, ans=0.125 2024-09-23 21:40:58,069 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=351652.0, ans=0.2 2024-09-23 21:41:01,110 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=351652.0, ans=0.125 2024-09-23 21:41:09,130 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=351698.6666666667, ans=0.1 2024-09-23 21:41:21,503 INFO [train.py:1198] (3/4) Epoch 20, batch 1350, loss[loss=0.2445, ctc_loss=0.1668, cr_loss=0.3887, over 16902.00 frames. ], tot_loss[loss=0.2152, ctc_loss=0.1436, cr_loss=0.3584, over 3352404.57 frames. 
], batch size: 58, lr: 6.03e-03, grad_scale: 32.0 2024-09-23 21:41:32,585 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.min_abs, batch_count=351745.3333333333, ans=0.5 2024-09-23 21:41:43,587 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.053e+02 1.285e+02 1.356e+02 1.494e+02 3.061e+02, threshold=2.711e+02, percent-clipped=1.0 2024-09-23 21:41:44,027 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=351792.0, ans=0.2 2024-09-23 21:41:58,523 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=351838.6666666667, ans=0.1 2024-09-23 21:42:14,373 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=351885.3333333333, ans=0.125 2024-09-23 21:42:39,183 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=351932.0, ans=0.0 2024-09-23 21:42:43,487 INFO [train.py:1198] (3/4) Epoch 20, batch 1400, loss[loss=0.1979, ctc_loss=0.131, cr_loss=0.3344, over 17015.00 frames. ], tot_loss[loss=0.2153, ctc_loss=0.1436, cr_loss=0.3588, over 3355283.96 frames. ], batch size: 56, lr: 6.02e-03, grad_scale: 32.0 2024-09-23 21:42:48,444 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=351978.6666666667, ans=0.125 2024-09-23 21:42:56,430 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=351978.6666666667, ans=0.2 2024-09-23 21:43:25,041 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.93 vs. limit=22.5 2024-09-23 21:44:03,598 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=352165.3333333333, ans=0.125 2024-09-23 21:44:08,217 INFO [train.py:1198] (3/4) Epoch 20, batch 1450, loss[loss=0.2334, ctc_loss=0.1567, cr_loss=0.3836, over 17345.00 frames. ], tot_loss[loss=0.2162, ctc_loss=0.1442, cr_loss=0.3601, over 3360114.26 frames. ], batch size: 48, lr: 6.02e-03, grad_scale: 32.0 2024-09-23 21:44:12,590 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=352212.0, ans=0.125 2024-09-23 21:44:27,536 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.04 vs. 
limit=22.5 2024-09-23 21:44:30,040 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=352258.6666666667, ans=0.125 2024-09-23 21:44:32,963 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.119e+02 1.276e+02 1.364e+02 1.479e+02 2.768e+02, threshold=2.727e+02, percent-clipped=1.0 2024-09-23 21:44:57,237 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=352352.0, ans=0.125 2024-09-23 21:45:01,939 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=352352.0, ans=0.5 2024-09-23 21:45:08,569 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=352352.0, ans=0.0 2024-09-23 21:45:17,858 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=352398.6666666667, ans=0.125 2024-09-23 21:45:30,177 INFO [train.py:1198] (3/4) Epoch 20, batch 1500, loss[loss=0.2156, ctc_loss=0.1423, cr_loss=0.3667, over 17213.00 frames. ], tot_loss[loss=0.2152, ctc_loss=0.1435, cr_loss=0.3581, over 3361991.05 frames. ], batch size: 55, lr: 6.02e-03, grad_scale: 16.0 2024-09-23 21:45:32,956 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=4.05 vs. limit=12.0 2024-09-23 21:45:54,490 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=352492.0, ans=0.125 2024-09-23 21:46:21,944 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.const_attention_rate, batch_count=352585.3333333333, ans=0.025 2024-09-23 21:46:37,234 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=6.09 vs. limit=10.0 2024-09-23 21:46:49,979 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.min_positive, batch_count=352678.6666666667, ans=0.05 2024-09-23 21:46:51,137 INFO [train.py:1198] (3/4) Epoch 20, batch 1550, loss[loss=0.2262, ctc_loss=0.1502, cr_loss=0.3797, over 16947.00 frames. ], tot_loss[loss=0.2159, ctc_loss=0.1441, cr_loss=0.3592, over 3361286.40 frames. 
], batch size: 58, lr: 6.02e-03, grad_scale: 16.0 2024-09-23 21:47:12,290 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=352725.3333333333, ans=0.125 2024-09-23 21:47:17,720 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.019e+02 1.276e+02 1.387e+02 1.513e+02 2.213e+02, threshold=2.774e+02, percent-clipped=0.0 2024-09-23 21:47:19,562 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=352725.3333333333, ans=0.2 2024-09-23 21:47:22,935 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=352725.3333333333, ans=0.0 2024-09-23 21:47:24,347 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.min_positive, batch_count=352772.0, ans=0.05 2024-09-23 21:47:38,841 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=352772.0, ans=0.0 2024-09-23 21:47:39,280 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten.whitening_limit, batch_count=352772.0, ans=15.0 2024-09-23 21:48:05,461 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=352865.3333333333, ans=0.125 2024-09-23 21:48:16,281 INFO [train.py:1198] (3/4) Epoch 20, batch 1600, loss[loss=0.1969, ctc_loss=0.1275, cr_loss=0.347, over 16983.00 frames. ], tot_loss[loss=0.2157, ctc_loss=0.1439, cr_loss=0.359, over 3358873.23 frames. ], batch size: 42, lr: 6.02e-03, grad_scale: 32.0 2024-09-23 21:48:43,931 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=11.68 vs. limit=22.5 2024-09-23 21:48:48,409 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=353005.3333333333, ans=0.125 2024-09-23 21:49:00,635 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.13 vs. limit=10.0 2024-09-23 21:49:01,915 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=353005.3333333333, ans=0.125 2024-09-23 21:49:18,754 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=353052.0, ans=0.0 2024-09-23 21:49:40,149 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=7.10 vs. limit=15.0 2024-09-23 21:49:41,004 INFO [train.py:1198] (3/4) Epoch 20, batch 1650, loss[loss=0.1884, ctc_loss=0.1248, cr_loss=0.3179, over 17068.00 frames. ], tot_loss[loss=0.2162, ctc_loss=0.1442, cr_loss=0.3599, over 3358377.06 frames. 
], batch size: 40, lr: 6.02e-03, grad_scale: 32.0 2024-09-23 21:49:49,187 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=353145.3333333333, ans=0.125 2024-09-23 21:50:05,309 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.132e+02 1.288e+02 1.374e+02 1.505e+02 2.586e+02, threshold=2.748e+02, percent-clipped=0.0 2024-09-23 21:50:09,135 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten.whitening_limit, batch_count=353192.0, ans=15.0 2024-09-23 21:50:40,720 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=353285.3333333333, ans=0.04949747468305833 2024-09-23 21:50:47,047 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=353332.0, ans=0.0 2024-09-23 21:50:54,854 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=353332.0, ans=0.2 2024-09-23 21:51:01,008 INFO [train.py:1198] (3/4) Epoch 20, batch 1700, loss[loss=0.2281, ctc_loss=0.1542, cr_loss=0.3694, over 16751.00 frames. ], tot_loss[loss=0.2177, ctc_loss=0.1454, cr_loss=0.3614, over 3348345.19 frames. ], batch size: 61, lr: 6.01e-03, grad_scale: 32.0 2024-09-23 21:51:02,962 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=353378.6666666667, ans=0.125 2024-09-23 21:51:19,124 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=353425.3333333333, ans=0.5 2024-09-23 21:51:28,070 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=7.15 vs. limit=15.0 2024-09-23 21:51:40,304 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.21 vs. limit=15.0 2024-09-23 21:52:02,242 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=353518.6666666667, ans=0.1 2024-09-23 21:52:17,984 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=353565.3333333333, ans=0.0 2024-09-23 21:52:24,265 INFO [train.py:1198] (3/4) Epoch 20, batch 1750, loss[loss=0.2404, ctc_loss=0.157, cr_loss=0.4171, over 16744.00 frames. ], tot_loss[loss=0.2176, ctc_loss=0.1453, cr_loss=0.3615, over 3349462.52 frames. ], batch size: 61, lr: 6.01e-03, grad_scale: 16.0 2024-09-23 21:52:49,736 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.111e+02 1.235e+02 1.321e+02 1.434e+02 2.366e+02, threshold=2.642e+02, percent-clipped=0.0 2024-09-23 21:53:09,051 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=353705.3333333333, ans=0.125 2024-09-23 21:53:47,294 INFO [train.py:1198] (3/4) Epoch 20, batch 1800, loss[loss=0.2438, ctc_loss=0.1626, cr_loss=0.4061, over 17256.00 frames. ], tot_loss[loss=0.2165, ctc_loss=0.1445, cr_loss=0.3601, over 3351598.69 frames. 
], batch size: 44, lr: 6.01e-03, grad_scale: 16.0 2024-09-23 21:54:01,526 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=353845.3333333333, ans=0.2 2024-09-23 21:54:58,272 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=354032.0, ans=0.2 2024-09-23 21:55:12,168 INFO [train.py:1198] (3/4) Epoch 20, batch 1850, loss[loss=0.2352, ctc_loss=0.1578, cr_loss=0.3868, over 16446.00 frames. ], tot_loss[loss=0.2168, ctc_loss=0.1446, cr_loss=0.361, over 3356963.27 frames. ], batch size: 66, lr: 6.01e-03, grad_scale: 16.0 2024-09-23 21:55:37,928 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.093e+02 1.237e+02 1.317e+02 1.410e+02 2.955e+02, threshold=2.633e+02, percent-clipped=1.0 2024-09-23 21:55:49,340 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=354172.0, ans=0.2 2024-09-23 21:55:53,130 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=10.64 vs. limit=15.0 2024-09-23 21:55:53,403 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=3.58 vs. limit=15.0 2024-09-23 21:55:54,225 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=354172.0, ans=0.125 2024-09-23 21:56:16,838 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=10.49 vs. limit=15.0 2024-09-23 21:56:32,458 INFO [train.py:1198] (3/4) Epoch 20, batch 1900, loss[loss=0.215, ctc_loss=0.1403, cr_loss=0.3736, over 17253.00 frames. ], tot_loss[loss=0.2166, ctc_loss=0.1443, cr_loss=0.3614, over 3370453.16 frames. ], batch size: 44, lr: 6.01e-03, grad_scale: 16.0 2024-09-23 21:56:56,569 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=354358.6666666667, ans=0.1 2024-09-23 21:57:39,507 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=354498.6666666667, ans=0.1 2024-09-23 21:57:42,063 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=4.98 vs. limit=12.0 2024-09-23 21:57:55,488 INFO [train.py:1198] (3/4) Epoch 20, batch 1950, loss[loss=0.2344, ctc_loss=0.161, cr_loss=0.3673, over 17306.00 frames. ], tot_loss[loss=0.2162, ctc_loss=0.1441, cr_loss=0.3607, over 3367936.38 frames. 
], batch size: 51, lr: 6.00e-03, grad_scale: 16.0 2024-09-23 21:58:06,254 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=354545.3333333333, ans=0.2 2024-09-23 21:58:23,509 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.128e+02 1.283e+02 1.409e+02 1.574e+02 2.318e+02, threshold=2.818e+02, percent-clipped=0.0 2024-09-23 21:58:57,874 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten.whitening_limit, batch_count=354685.3333333333, ans=15.0 2024-09-23 21:59:21,387 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=354732.0, ans=0.125 2024-09-23 21:59:25,720 INFO [train.py:1198] (3/4) Epoch 20, batch 2000, loss[loss=0.1669, ctc_loss=0.1093, cr_loss=0.2879, over 16245.00 frames. ], tot_loss[loss=0.216, ctc_loss=0.1439, cr_loss=0.3603, over 3364274.67 frames. ], batch size: 36, lr: 6.00e-03, grad_scale: 32.0 2024-09-23 21:59:30,890 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=354778.6666666667, ans=0.125 2024-09-23 22:00:07,541 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.const_attention_rate, batch_count=354872.0, ans=0.025 2024-09-23 22:00:18,865 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer_ff3.min_abs, batch_count=354918.6666666667, ans=0.2 2024-09-23 22:00:37,997 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=354965.3333333333, ans=0.125 2024-09-23 22:00:45,864 INFO [train.py:1198] (3/4) Epoch 20, batch 2050, loss[loss=0.1934, ctc_loss=0.1321, cr_loss=0.3063, over 17184.00 frames. ], tot_loss[loss=0.2159, ctc_loss=0.1438, cr_loss=0.3605, over 3367746.81 frames. 
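], batch size: 41, lr: 6.00e-03, grad_scale: 32.0

The lr field decays smoothly with batch count within an epoch (6.26e-03 near the end of epoch 19 down to 6.00e-03 here) and steps down at each epoch boundary (6.24e-03 at epoch 19, batch 3850, then 6.08e-03 at epoch 20, batch 0). That pattern is consistent with the Eden-style schedule used by the zipformer recipes; a sketch of its shape follows, leaving the absolute base_lr scaling aside since it depends on recipe details not visible in these lines.

```python
# Hedged sketch of an Eden-style LR schedule: power-law decay in both
# batch count and epoch, flat near zero and ~b^-0.5 / ~e^-0.5 far out.
def eden_lr(base_lr: float, batch: float, epoch: float,
            lr_batches: float = 7500.0, lr_epochs: float = 3.5) -> float:
    batch_factor = ((batch ** 2 + lr_batches ** 2) / lr_batches ** 2) ** -0.25
    epoch_factor = ((epoch ** 2 + lr_epochs ** 2) / lr_epochs ** 2) ** -0.25
    return base_lr * batch_factor * epoch_factor

# Relative check against the log: stepping epoch 19 -> 20 alone scales
# lr by ~0.975, i.e. 6.24e-03 -> ~6.08e-03, matching the logged drop
# at the epoch boundary (batch factors cancel in the ratio).
print(eden_lr(1.0, 345500, 20) / eden_lr(1.0, 345500, 19))
```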
2024-09-23 22:00:54,499 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=355012.0, ans=0.0 2024-09-23 22:00:59,246 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=355012.0, ans=0.125 2024-09-23 22:01:00,801 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=355058.6666666667, ans=0.0 2024-09-23 22:01:05,658 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=355058.6666666667, ans=0.0 2024-09-23 22:01:11,577 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.123e+02 1.282e+02 1.359e+02 1.463e+02 2.527e+02, threshold=2.718e+02, percent-clipped=0.0 2024-09-23 22:01:16,684 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=355105.3333333333, ans=0.2 2024-09-23 22:01:21,412 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=355105.3333333333, ans=0.125 2024-09-23 22:01:30,887 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=355105.3333333333, ans=0.125 2024-09-23 22:02:05,992 INFO [train.py:1198] (3/4) Epoch 20, batch 2100, loss[loss=0.2234, ctc_loss=0.1482, cr_loss=0.3759, over 17032.00 frames. ], tot_loss[loss=0.2166, ctc_loss=0.1443, cr_loss=0.3617, over 3369792.66 frames. ], batch size: 56, lr: 6.00e-03, grad_scale: 32.0 2024-09-23 22:02:11,928 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=355245.3333333333, ans=0.0 2024-09-23 22:02:14,971 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=355245.3333333333, ans=0.125 2024-09-23 22:02:45,064 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=355338.6666666667, ans=0.1 2024-09-23 22:02:50,509 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.51 vs. limit=15.0 2024-09-23 22:03:24,149 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=355432.0, ans=0.0 2024-09-23 22:03:30,234 INFO [train.py:1198] (3/4) Epoch 20, batch 2150, loss[loss=0.202, ctc_loss=0.134, cr_loss=0.34, over 17275.00 frames. ], tot_loss[loss=0.2153, ctc_loss=0.1434, cr_loss=0.3594, over 3366953.37 frames. ], batch size: 46, lr: 6.00e-03, grad_scale: 32.0 2024-09-23 22:03:32,247 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=355478.6666666667, ans=0.125 2024-09-23 22:03:37,034 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=355478.6666666667, ans=0.0 2024-09-23 22:03:37,399 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=3.94 vs.
limit=10.0 2024-09-23 22:03:41,951 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=13.15 vs. limit=15.0 2024-09-23 22:03:58,363 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.113e+02 1.263e+02 1.377e+02 1.523e+02 2.016e+02, threshold=2.755e+02, percent-clipped=0.0 2024-09-23 22:04:03,652 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=355572.0, ans=0.0 2024-09-23 22:04:13,383 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=355572.0, ans=0.0 2024-09-23 22:04:21,069 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=355572.0, ans=0.125 2024-09-23 22:04:25,162 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.66 vs. limit=22.5 2024-09-23 22:04:51,878 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=15.44 vs. limit=22.5 2024-09-23 22:04:55,856 INFO [train.py:1198] (3/4) Epoch 20, batch 2200, loss[loss=0.2295, ctc_loss=0.1571, cr_loss=0.3619, over 16049.00 frames. ], tot_loss[loss=0.2155, ctc_loss=0.1435, cr_loss=0.3599, over 3360982.78 frames. ], batch size: 74, lr: 5.99e-03, grad_scale: 32.0 2024-09-23 22:05:21,933 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=2.510e-03 2024-09-23 22:05:53,844 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=355852.0, ans=0.125 2024-09-23 22:06:16,057 INFO [train.py:1198] (3/4) Epoch 20, batch 2250, loss[loss=0.2212, ctc_loss=0.1471, cr_loss=0.3701, over 17017.00 frames. ], tot_loss[loss=0.2144, ctc_loss=0.1427, cr_loss=0.3584, over 3368159.42 frames. ], batch size: 51, lr: 5.99e-03, grad_scale: 32.0 2024-09-23 22:06:16,291 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.min_positive, batch_count=355945.3333333333, ans=0.05 2024-09-23 22:06:16,411 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=355945.3333333333, ans=0.125 2024-09-23 22:06:27,746 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.max_abs, batch_count=355945.3333333333, ans=10.0 2024-09-23 22:06:32,463 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=355992.0, ans=0.125 2024-09-23 22:06:32,466 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.const_attention_rate, batch_count=355992.0, ans=0.025 2024-09-23 22:06:34,526 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.87 vs. 
limit=6.0 2024-09-23 22:06:41,714 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.041e+02 1.265e+02 1.369e+02 1.505e+02 1.904e+02, threshold=2.738e+02, percent-clipped=0.0 2024-09-23 22:06:45,278 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=355992.0, ans=0.125 2024-09-23 22:07:36,092 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=10.70 vs. limit=15.0 2024-09-23 22:07:38,560 INFO [train.py:1198] (3/4) Epoch 20, batch 2300, loss[loss=0.2389, ctc_loss=0.1583, cr_loss=0.4027, over 16906.00 frames. ], tot_loss[loss=0.2146, ctc_loss=0.1429, cr_loss=0.3586, over 3353567.71 frames. ], batch size: 58, lr: 5.99e-03, grad_scale: 32.0 2024-09-23 22:07:46,738 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=356178.6666666667, ans=0.0 2024-09-23 22:08:08,907 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=8.10 vs. limit=22.5 2024-09-23 22:08:11,245 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=356272.0, ans=0.125 2024-09-23 22:08:11,267 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=356272.0, ans=0.125 2024-09-23 22:08:44,801 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=356365.3333333333, ans=0.0 2024-09-23 22:08:56,660 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.min_abs, batch_count=356365.3333333333, ans=0.5 2024-09-23 22:09:02,584 INFO [train.py:1198] (3/4) Epoch 20, batch 2350, loss[loss=0.2164, ctc_loss=0.1429, cr_loss=0.3672, over 16836.00 frames. ], tot_loss[loss=0.2145, ctc_loss=0.1428, cr_loss=0.3587, over 3366007.64 frames. ], batch size: 58, lr: 5.99e-03, grad_scale: 32.0 2024-09-23 22:09:15,892 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=356412.0, ans=0.125 2024-09-23 22:09:19,984 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=356458.6666666667, ans=0.07 2024-09-23 22:09:20,119 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-09-23 22:09:24,823 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=356458.6666666667, ans=0.1 2024-09-23 22:09:30,701 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.083e+02 1.236e+02 1.335e+02 1.500e+02 2.118e+02, threshold=2.671e+02, percent-clipped=0.0 2024-09-23 22:09:31,060 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=356458.6666666667, ans=0.125 2024-09-23 22:09:53,603 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=356552.0, ans=0.2 2024-09-23 22:09:57,149 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=4.64 vs. 
limit=12.0 2024-09-23 22:10:17,277 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=356598.6666666667, ans=0.125 2024-09-23 22:10:24,949 INFO [train.py:1198] (3/4) Epoch 20, batch 2400, loss[loss=0.2402, ctc_loss=0.1634, cr_loss=0.3842, over 17111.00 frames. ], tot_loss[loss=0.2157, ctc_loss=0.1437, cr_loss=0.3598, over 3352341.18 frames. ], batch size: 49, lr: 5.99e-03, grad_scale: 32.0 2024-09-23 22:10:38,132 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=356645.3333333333, ans=0.0 2024-09-23 22:10:44,610 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.36 vs. limit=15.0 2024-09-23 22:10:50,883 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=356692.0, ans=0.07 2024-09-23 22:11:11,908 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=356785.3333333333, ans=0.125 2024-09-23 22:11:11,968 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=356785.3333333333, ans=0.0 2024-09-23 22:11:31,360 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=356832.0, ans=0.1 2024-09-23 22:11:44,511 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys.whitening_limit, batch_count=356878.6666666667, ans=6.0 2024-09-23 22:11:45,256 INFO [train.py:1198] (3/4) Epoch 20, batch 2450, loss[loss=0.1982, ctc_loss=0.1293, cr_loss=0.3444, over 17058.00 frames. ], tot_loss[loss=0.2167, ctc_loss=0.1445, cr_loss=0.3609, over 3350692.33 frames. ], batch size: 46, lr: 5.98e-03, grad_scale: 32.0 2024-09-23 22:11:50,484 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=356878.6666666667, ans=0.1 2024-09-23 22:11:55,144 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=356878.6666666667, ans=0.025 2024-09-23 22:12:08,848 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=356925.3333333333, ans=0.2 2024-09-23 22:12:10,564 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=356925.3333333333, ans=0.125 2024-09-23 22:12:13,315 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.093e+02 1.240e+02 1.349e+02 1.469e+02 2.826e+02, threshold=2.697e+02, percent-clipped=1.0 2024-09-23 22:12:49,161 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten.whitening_limit, batch_count=357018.6666666667, ans=15.0 2024-09-23 22:12:49,994 INFO [scaling.py:1024] (3/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.22 vs. limit=5.0 2024-09-23 22:13:10,059 INFO [train.py:1198] (3/4) Epoch 20, batch 2500, loss[loss=0.2564, ctc_loss=0.1752, cr_loss=0.4062, over 15793.00 frames. ], tot_loss[loss=0.2173, ctc_loss=0.145, cr_loss=0.3618, over 3329410.07 frames. 
], batch size: 74, lr: 5.98e-03, grad_scale: 32.0 2024-09-23 22:13:29,499 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-09-23 22:13:37,397 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=357158.6666666667, ans=0.0 2024-09-23 22:14:02,144 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=357252.0, ans=0.1 2024-09-23 22:14:22,036 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=357298.6666666667, ans=0.0 2024-09-23 22:14:34,665 INFO [train.py:1198] (3/4) Epoch 20, batch 2550, loss[loss=0.1895, ctc_loss=0.1231, cr_loss=0.3322, over 17049.00 frames. ], tot_loss[loss=0.2167, ctc_loss=0.1444, cr_loss=0.3615, over 3345140.35 frames. ], batch size: 46, lr: 5.98e-03, grad_scale: 32.0 2024-09-23 22:14:51,126 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=357392.0, ans=0.1 2024-09-23 22:14:56,500 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=4.97 vs. limit=15.0 2024-09-23 22:15:00,262 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.117e+02 1.269e+02 1.372e+02 1.542e+02 2.100e+02, threshold=2.744e+02, percent-clipped=0.0 2024-09-23 22:15:02,137 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.const_attention_rate, batch_count=357392.0, ans=0.025 2024-09-23 22:15:42,427 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=357532.0, ans=0.125 2024-09-23 22:15:53,496 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer_ff3.min_abs, batch_count=357578.6666666667, ans=0.2 2024-09-23 22:15:54,810 INFO [train.py:1198] (3/4) Epoch 20, batch 2600, loss[loss=0.2509, ctc_loss=0.1686, cr_loss=0.4115, over 17033.00 frames. ], tot_loss[loss=0.2162, ctc_loss=0.1441, cr_loss=0.3606, over 3353489.65 frames. ], batch size: 51, lr: 5.98e-03, grad_scale: 32.0 2024-09-23 22:15:56,804 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=357578.6666666667, ans=0.09899494936611666 2024-09-23 22:16:02,035 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=11.39 vs. 
limit=15.0 2024-09-23 22:16:26,014 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=357672.0, ans=0.125 2024-09-23 22:16:29,175 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=357672.0, ans=0.125 2024-09-23 22:16:49,867 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=357718.6666666667, ans=0.2 2024-09-23 22:16:53,778 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten.whitening_limit, batch_count=357718.6666666667, ans=22.5 2024-09-23 22:16:54,613 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=357718.6666666667, ans=0.0 2024-09-23 22:17:03,667 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=357765.3333333333, ans=0.125 2024-09-23 22:17:17,932 INFO [train.py:1198] (3/4) Epoch 20, batch 2650, loss[loss=0.2396, ctc_loss=0.1595, cr_loss=0.4005, over 17289.00 frames. ], tot_loss[loss=0.217, ctc_loss=0.1446, cr_loss=0.362, over 3356280.26 frames. ], batch size: 51, lr: 5.98e-03, grad_scale: 32.0 2024-09-23 22:17:35,892 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=357858.6666666667, ans=0.125 2024-09-23 22:17:41,540 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=4.71 vs. limit=15.0 2024-09-23 22:17:43,557 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.094e+02 1.263e+02 1.370e+02 1.503e+02 2.130e+02, threshold=2.739e+02, percent-clipped=0.0 2024-09-23 22:17:43,965 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=357858.6666666667, ans=0.125 2024-09-23 22:17:59,503 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=357905.3333333333, ans=0.125 2024-09-23 22:18:05,843 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=357905.3333333333, ans=0.125 2024-09-23 22:18:31,338 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=357998.6666666667, ans=0.0 2024-09-23 22:18:41,976 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=358045.3333333333, ans=0.2 2024-09-23 22:18:43,119 INFO [train.py:1198] (3/4) Epoch 20, batch 2700, loss[loss=0.2339, ctc_loss=0.1582, cr_loss=0.3787, over 15140.00 frames. ], tot_loss[loss=0.2175, ctc_loss=0.145, cr_loss=0.3624, over 3350681.41 frames. ], batch size: 89, lr: 5.97e-03, grad_scale: 32.0 2024-09-23 22:19:22,141 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=8.56 vs. limit=15.0 2024-09-23 22:19:33,311 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=6.98 vs. 
limit=15.0 2024-09-23 22:19:38,808 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=358185.3333333333, ans=0.0 2024-09-23 22:19:59,672 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=358232.0, ans=0.125 2024-09-23 22:20:02,654 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=358232.0, ans=0.125 2024-09-23 22:20:05,466 INFO [train.py:1198] (3/4) Epoch 20, batch 2750, loss[loss=0.2229, ctc_loss=0.1493, cr_loss=0.368, over 17230.00 frames. ], tot_loss[loss=0.2155, ctc_loss=0.1435, cr_loss=0.3599, over 3349304.73 frames. ], batch size: 50, lr: 5.97e-03, grad_scale: 32.0 2024-09-23 22:20:26,628 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=358325.3333333333, ans=0.125 2024-09-23 22:20:31,065 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.059e+02 1.266e+02 1.360e+02 1.482e+02 2.958e+02, threshold=2.720e+02, percent-clipped=1.0 2024-09-23 22:20:39,283 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=358372.0, ans=0.0 2024-09-23 22:20:50,965 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=358372.0, ans=0.125 2024-09-23 22:21:25,622 INFO [train.py:1198] (3/4) Epoch 20, batch 2800, loss[loss=0.216, ctc_loss=0.1449, cr_loss=0.3555, over 16864.00 frames. ], tot_loss[loss=0.2146, ctc_loss=0.1429, cr_loss=0.3583, over 3350212.19 frames. ], batch size: 58, lr: 5.97e-03, grad_scale: 32.0 2024-09-23 22:21:29,181 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=358512.0, ans=0.2 2024-09-23 22:22:17,546 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=358652.0, ans=0.125 2024-09-23 22:22:25,459 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=358652.0, ans=0.125 2024-09-23 22:22:47,555 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=358698.6666666667, ans=0.125 2024-09-23 22:22:50,493 INFO [train.py:1198] (3/4) Epoch 20, batch 2850, loss[loss=0.2448, ctc_loss=0.1672, cr_loss=0.3875, over 16620.00 frames. ], tot_loss[loss=0.2148, ctc_loss=0.1429, cr_loss=0.3593, over 3358546.50 frames. 
], batch size: 66, lr: 5.97e-03, grad_scale: 32.0 2024-09-23 22:22:52,447 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=358745.3333333333, ans=0.0 2024-09-23 22:23:00,325 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.const_attention_rate, batch_count=358745.3333333333, ans=0.025 2024-09-23 22:23:09,916 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=358792.0, ans=0.125 2024-09-23 22:23:15,987 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.093e+02 1.246e+02 1.349e+02 1.409e+02 2.188e+02, threshold=2.698e+02, percent-clipped=0.0 2024-09-23 22:23:16,272 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=358792.0, ans=0.125 2024-09-23 22:24:12,429 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=358932.0, ans=0.0 2024-09-23 22:24:15,306 INFO [train.py:1198] (3/4) Epoch 20, batch 2900, loss[loss=0.2203, ctc_loss=0.1453, cr_loss=0.3749, over 17026.00 frames. ], tot_loss[loss=0.215, ctc_loss=0.1431, cr_loss=0.3597, over 3359887.67 frames. ], batch size: 56, lr: 5.97e-03, grad_scale: 32.0 2024-09-23 22:24:20,545 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=358978.6666666667, ans=0.125 2024-09-23 22:24:52,703 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=359072.0, ans=0.1 2024-09-23 22:25:00,738 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-23 22:25:18,228 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=359165.3333333333, ans=0.125 2024-09-23 22:25:35,526 INFO [train.py:1198] (3/4) Epoch 20, batch 2950, loss[loss=0.2291, ctc_loss=0.1559, cr_loss=0.3656, over 17343.00 frames. ], tot_loss[loss=0.216, ctc_loss=0.1439, cr_loss=0.3605, over 3354057.58 frames. 
], batch size: 48, lr: 5.96e-03, grad_scale: 32.0 2024-09-23 22:25:35,824 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=359212.0, ans=0.125 2024-09-23 22:25:49,923 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=359258.6666666667, ans=0.1 2024-09-23 22:26:00,733 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.080e+02 1.258e+02 1.349e+02 1.512e+02 2.191e+02, threshold=2.699e+02, percent-clipped=0.0 2024-09-23 22:26:04,193 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=359258.6666666667, ans=0.1 2024-09-23 22:26:20,507 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=359305.3333333333, ans=0.125 2024-09-23 22:26:20,533 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=359305.3333333333, ans=0.0 2024-09-23 22:26:30,587 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=7.27 vs. limit=15.0 2024-09-23 22:26:37,943 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=359398.6666666667, ans=0.0 2024-09-23 22:26:42,345 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=359398.6666666667, ans=0.125 2024-09-23 22:26:54,937 INFO [train.py:1198] (3/4) Epoch 20, batch 3000, loss[loss=0.2114, ctc_loss=0.1385, cr_loss=0.3644, over 17213.00 frames. ], tot_loss[loss=0.2157, ctc_loss=0.1438, cr_loss=0.3594, over 3341721.34 frames. ], batch size: 47, lr: 5.96e-03, grad_scale: 32.0 2024-09-23 22:26:54,937 INFO [train.py:1221] (3/4) Computing validation loss 2024-09-23 22:27:10,605 INFO [train.py:1230] (3/4) Epoch 20, validation: loss=0.03912, ctc_loss=0.03912, cr_loss=8.309e-15, over 944034.00 frames. 
2024-09-23 22:27:10,606 INFO [train.py:1231] (3/4) Maximum memory allocated so far is 21238MB 2024-09-23 22:27:20,295 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.const_attention_rate, batch_count=359445.3333333333, ans=0.025 2024-09-23 22:27:32,930 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=359492.0, ans=0.125 2024-09-23 22:27:46,886 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=359538.6666666667, ans=0.1 2024-09-23 22:27:53,103 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=359538.6666666667, ans=0.1 2024-09-23 22:27:56,294 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=359585.3333333333, ans=0.2 2024-09-23 22:28:02,638 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=359585.3333333333, ans=0.0 2024-09-23 22:28:04,268 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=359585.3333333333, ans=0.125 2024-09-23 22:28:19,672 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-23 22:28:31,109 INFO [train.py:1198] (3/4) Epoch 20, batch 3050, loss[loss=0.2556, ctc_loss=0.172, cr_loss=0.4177, over 17042.00 frames. ], tot_loss[loss=0.2149, ctc_loss=0.1432, cr_loss=0.3587, over 3352790.56 frames. ], batch size: 56, lr: 5.96e-03, grad_scale: 32.0 2024-09-23 22:28:35,142 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=5.10 vs. limit=15.0 2024-09-23 22:28:56,199 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.100e+02 1.297e+02 1.389e+02 1.525e+02 3.485e+02, threshold=2.778e+02, percent-clipped=1.0 2024-09-23 22:29:04,322 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=359772.0, ans=0.1 2024-09-23 22:29:49,414 INFO [train.py:1198] (3/4) Epoch 20, batch 3100, loss[loss=0.1887, ctc_loss=0.1198, cr_loss=0.3444, over 17023.00 frames. ], tot_loss[loss=0.2151, ctc_loss=0.1433, cr_loss=0.3588, over 3350154.28 frames. 
], batch size: 39, lr: 5.96e-03, grad_scale: 32.0 2024-09-23 22:29:49,628 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=359912.0, ans=0.125 2024-09-23 22:29:55,986 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=359912.0, ans=0.125 2024-09-23 22:30:26,544 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=360005.3333333333, ans=0.125 2024-09-23 22:30:54,736 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=360098.6666666667, ans=0.125 2024-09-23 22:30:57,885 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=360098.6666666667, ans=0.125 2024-09-23 22:30:59,524 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=360098.6666666667, ans=0.125 2024-09-23 22:31:12,471 INFO [train.py:1198] (3/4) Epoch 20, batch 3150, loss[loss=0.2331, ctc_loss=0.1554, cr_loss=0.3884, over 17030.00 frames. ], tot_loss[loss=0.2158, ctc_loss=0.1438, cr_loss=0.3598, over 3357698.58 frames. ], batch size: 52, lr: 5.96e-03, grad_scale: 32.0 2024-09-23 22:31:37,610 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.113e+02 1.291e+02 1.400e+02 1.624e+02 2.307e+02, threshold=2.800e+02, percent-clipped=0.0 2024-09-23 22:31:47,431 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=360238.6666666667, ans=0.0 2024-09-23 22:31:49,066 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=360238.6666666667, ans=0.1 2024-09-23 22:32:14,423 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=360332.0, ans=0.0 2024-09-23 22:32:31,336 INFO [train.py:1198] (3/4) Epoch 20, batch 3200, loss[loss=0.1873, ctc_loss=0.1245, cr_loss=0.3141, over 17265.00 frames. ], tot_loss[loss=0.2169, ctc_loss=0.1447, cr_loss=0.361, over 3348088.17 frames. ], batch size: 42, lr: 5.95e-03, grad_scale: 32.0 2024-09-23 22:33:01,714 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=10.28 vs. limit=15.0 2024-09-23 22:33:34,228 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=360565.3333333333, ans=0.2 2024-09-23 22:33:34,382 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=360565.3333333333, ans=0.125 2024-09-23 22:33:41,945 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=360565.3333333333, ans=0.125 2024-09-23 22:33:49,553 INFO [train.py:1198] (3/4) Epoch 20, batch 3250, loss[loss=0.2214, ctc_loss=0.154, cr_loss=0.337, over 16811.00 frames. ], tot_loss[loss=0.2165, ctc_loss=0.1445, cr_loss=0.36, over 3336746.65 frames. ], batch size: 61, lr: 5.95e-03, grad_scale: 32.0 2024-09-23 22:33:58,359 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=10.33 vs. 
limit=15.0 2024-09-23 22:34:16,119 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.062e+02 1.218e+02 1.307e+02 1.462e+02 2.065e+02, threshold=2.615e+02, percent-clipped=0.0 2024-09-23 22:34:35,265 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=360752.0, ans=0.125 2024-09-23 22:35:03,368 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=360798.6666666667, ans=0.125 2024-09-23 22:35:07,162 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=11.14 vs. limit=15.0 2024-09-23 22:35:08,071 INFO [train.py:1198] (3/4) Epoch 20, batch 3300, loss[loss=0.2559, ctc_loss=0.1701, cr_loss=0.429, over 16875.00 frames. ], tot_loss[loss=0.2168, ctc_loss=0.1448, cr_loss=0.3599, over 3337345.30 frames. ], batch size: 58, lr: 5.95e-03, grad_scale: 32.0 2024-09-23 22:35:14,450 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=360845.3333333333, ans=0.125 2024-09-23 22:35:45,726 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=360938.6666666667, ans=0.0 2024-09-23 22:35:52,400 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=10.65 vs. limit=22.5 2024-09-23 22:35:53,775 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=360985.3333333333, ans=0.1 2024-09-23 22:35:59,002 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=9.78 vs. limit=15.0 2024-09-23 22:36:12,874 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.03 vs. limit=6.0 2024-09-23 22:36:26,329 INFO [train.py:1198] (3/4) Epoch 20, batch 3350, loss[loss=0.2218, ctc_loss=0.1506, cr_loss=0.3558, over 17155.00 frames. ], tot_loss[loss=0.2176, ctc_loss=0.1454, cr_loss=0.361, over 3335849.60 frames. ], batch size: 48, lr: 5.95e-03, grad_scale: 16.0 2024-09-23 22:36:27,092 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=13.36 vs. 
limit=22.5 2024-09-23 22:36:32,652 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=361078.6666666667, ans=0.125 2024-09-23 22:36:32,788 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=361078.6666666667, ans=0.2 2024-09-23 22:36:56,430 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.040e+02 1.260e+02 1.354e+02 1.461e+02 2.333e+02, threshold=2.708e+02, percent-clipped=0.0 2024-09-23 22:37:01,299 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.const_attention_rate, batch_count=361172.0, ans=0.025 2024-09-23 22:37:18,769 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=361218.6666666667, ans=0.125 2024-09-23 22:37:20,863 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=5.07 vs. limit=12.0 2024-09-23 22:37:46,672 INFO [train.py:1198] (3/4) Epoch 20, batch 3400, loss[loss=0.241, ctc_loss=0.1601, cr_loss=0.4044, over 17216.00 frames. ], tot_loss[loss=0.2169, ctc_loss=0.1447, cr_loss=0.3608, over 3342245.75 frames. ], batch size: 55, lr: 5.95e-03, grad_scale: 16.0 2024-09-23 22:37:46,873 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=361312.0, ans=0.125 2024-09-23 22:37:50,384 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=3.41 vs. limit=10.0 2024-09-23 22:38:14,574 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=361358.6666666667, ans=0.0 2024-09-23 22:38:35,108 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=9.09 vs. limit=12.0 2024-09-23 22:38:37,821 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=361452.0, ans=0.125 2024-09-23 22:38:42,408 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=361452.0, ans=0.2 2024-09-23 22:39:06,007 INFO [train.py:1198] (3/4) Epoch 20, batch 3450, loss[loss=0.1463, ctc_loss=0.09291, cr_loss=0.2671, over 17026.00 frames. ], tot_loss[loss=0.2156, ctc_loss=0.1437, cr_loss=0.3592, over 3346543.77 frames. ], batch size: 39, lr: 5.95e-03, grad_scale: 16.0 2024-09-23 22:39:09,372 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.min_positive, batch_count=361545.3333333333, ans=0.025 2024-09-23 22:39:14,607 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=4.36 vs. 
limit=12.0 2024-09-23 22:39:17,209 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=361545.3333333333, ans=0.09899494936611666 2024-09-23 22:39:17,287 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=361545.3333333333, ans=0.125 2024-09-23 22:39:34,087 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.108e+02 1.266e+02 1.362e+02 1.520e+02 1.983e+02, threshold=2.723e+02, percent-clipped=0.0 2024-09-23 22:39:38,085 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.54 vs. limit=6.0 2024-09-23 22:40:04,121 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=361685.3333333333, ans=0.125 2024-09-23 22:40:07,266 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=361732.0, ans=0.1 2024-09-23 22:40:16,615 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=361732.0, ans=0.0 2024-09-23 22:40:24,483 INFO [train.py:1198] (3/4) Epoch 20, batch 3500, loss[loss=0.1784, ctc_loss=0.1176, cr_loss=0.3039, over 16765.00 frames. ], tot_loss[loss=0.2152, ctc_loss=0.1434, cr_loss=0.3591, over 3356514.96 frames. ], batch size: 37, lr: 5.94e-03, grad_scale: 16.0 2024-09-23 22:40:31,054 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=361778.6666666667, ans=0.07 2024-09-23 22:40:37,267 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer_na.min_abs, batch_count=361778.6666666667, ans=0.02 2024-09-23 22:40:45,528 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=361825.3333333333, ans=0.125 2024-09-23 22:40:56,718 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=361872.0, ans=0.125 2024-09-23 22:41:01,382 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=361872.0, ans=0.125 2024-09-23 22:41:22,273 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=5.14 vs. limit=15.0 2024-09-23 22:41:33,016 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=361965.3333333333, ans=0.125 2024-09-23 22:41:41,274 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=361965.3333333333, ans=0.1 2024-09-23 22:41:47,262 INFO [train.py:1198] (3/4) Epoch 20, batch 3550, loss[loss=0.2168, ctc_loss=0.1451, cr_loss=0.3582, over 17150.00 frames. ], tot_loss[loss=0.2155, ctc_loss=0.1436, cr_loss=0.3596, over 3354703.00 frames. 
], batch size: 48, lr: 5.94e-03, grad_scale: 16.0 2024-09-23 22:41:49,122 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=362012.0, ans=0.0 2024-09-23 22:42:09,650 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=11.87 vs. limit=12.0 2024-09-23 22:42:14,217 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=362058.6666666667, ans=0.125 2024-09-23 22:42:15,394 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.065e+02 1.278e+02 1.363e+02 1.489e+02 4.233e+02, threshold=2.726e+02, percent-clipped=1.0 2024-09-23 22:42:47,755 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=5.21 vs. limit=15.0 2024-09-23 22:42:50,141 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.min_positive, batch_count=362198.6666666667, ans=0.05 2024-09-23 22:42:58,024 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=362198.6666666667, ans=0.125 2024-09-23 22:42:59,783 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=362198.6666666667, ans=0.125 2024-09-23 22:43:05,645 INFO [train.py:1198] (3/4) Epoch 20, batch 3600, loss[loss=0.2363, ctc_loss=0.1567, cr_loss=0.3981, over 17040.00 frames. ], tot_loss[loss=0.2162, ctc_loss=0.1441, cr_loss=0.3606, over 3356869.50 frames. ], batch size: 52, lr: 5.94e-03, grad_scale: 32.0 2024-09-23 22:44:23,803 INFO [train.py:1198] (3/4) Epoch 20, batch 3650, loss[loss=0.1961, ctc_loss=0.1317, cr_loss=0.3216, over 17104.00 frames. ], tot_loss[loss=0.2167, ctc_loss=0.1444, cr_loss=0.3615, over 3363329.24 frames. ], batch size: 49, lr: 5.94e-03, grad_scale: 32.0 2024-09-23 22:44:27,343 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=362478.6666666667, ans=0.125 2024-09-23 22:44:47,698 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=362525.3333333333, ans=0.125 2024-09-23 22:44:51,968 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.102e+02 1.248e+02 1.353e+02 1.463e+02 2.228e+02, threshold=2.707e+02, percent-clipped=0.0 2024-09-23 22:44:53,242 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=6.94 vs. 
limit=15.0 2024-09-23 22:45:06,179 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=362572.0, ans=0.125 2024-09-23 22:45:13,902 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=362618.6666666667, ans=0.2 2024-09-23 22:45:28,839 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=362665.3333333333, ans=0.125 2024-09-23 22:45:33,466 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=362665.3333333333, ans=0.0 2024-09-23 22:45:42,631 INFO [train.py:1198] (3/4) Epoch 20, batch 3700, loss[loss=0.2201, ctc_loss=0.1468, cr_loss=0.3662, over 17291.00 frames. ], tot_loss[loss=0.2165, ctc_loss=0.1444, cr_loss=0.3609, over 3359912.34 frames. ], batch size: 46, lr: 5.94e-03, grad_scale: 32.0 2024-09-23 22:45:47,543 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=362712.0, ans=0.2 2024-09-23 22:45:47,645 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=362712.0, ans=0.0 2024-09-23 22:45:49,154 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=362712.0, ans=0.2 2024-09-23 22:45:50,768 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=362712.0, ans=0.05 2024-09-23 22:46:03,990 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten.whitening_limit, batch_count=362758.6666666667, ans=15.0 2024-09-23 22:46:37,545 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=15.45 vs. limit=22.5 2024-09-23 22:46:38,608 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=362852.0, ans=0.0 2024-09-23 22:46:52,885 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=362898.6666666667, ans=0.0 2024-09-23 22:47:01,722 INFO [train.py:1198] (3/4) Epoch 20, batch 3750, loss[loss=0.2092, ctc_loss=0.1397, cr_loss=0.3476, over 17000.00 frames. ], tot_loss[loss=0.2177, ctc_loss=0.1453, cr_loss=0.3618, over 3336735.48 frames. 
], batch size: 51, lr: 5.93e-03, grad_scale: 32.0 2024-09-23 22:47:30,494 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.110e+02 1.286e+02 1.371e+02 1.493e+02 3.473e+02, threshold=2.742e+02, percent-clipped=1.0 2024-09-23 22:47:43,633 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=363038.6666666667, ans=0.1 2024-09-23 22:47:46,887 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=363038.6666666667, ans=0.125 2024-09-23 22:47:50,000 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=363085.3333333333, ans=0.2 2024-09-23 22:47:51,481 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=363085.3333333333, ans=0.1 2024-09-23 22:47:51,642 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=363085.3333333333, ans=0.07 2024-09-23 22:48:07,505 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=363132.0, ans=0.125 2024-09-23 22:48:09,053 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=363132.0, ans=0.025 2024-09-23 22:48:15,697 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=7.56 vs. limit=15.0 2024-09-23 22:48:18,440 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=363132.0, ans=0.125 2024-09-23 22:48:22,098 INFO [train.py:1198] (3/4) Epoch 20, batch 3800, loss[loss=0.1745, ctc_loss=0.1144, cr_loss=0.3007, over 16341.00 frames. ], tot_loss[loss=0.217, ctc_loss=0.1449, cr_loss=0.3606, over 3313959.62 frames. ], batch size: 36, lr: 5.93e-03, grad_scale: 32.0 2024-09-23 22:48:30,492 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=11.75 vs. limit=22.5 2024-09-23 22:48:36,552 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=363225.3333333333, ans=0.0 2024-09-23 22:48:49,007 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=363225.3333333333, ans=0.125 2024-09-23 22:49:08,794 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=6.59 vs. limit=15.0 2024-09-23 22:49:29,876 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.const_attention_rate, batch_count=363365.3333333333, ans=0.025 2024-09-23 22:49:31,261 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=363365.3333333333, ans=0.0 2024-09-23 22:49:31,686 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.58 vs. limit=15.0 2024-09-23 22:49:40,057 INFO [train.py:1198] (3/4) Epoch 20, batch 3850, loss[loss=0.2005, ctc_loss=0.1297, cr_loss=0.354, over 16313.00 frames. 
], tot_loss[loss=0.2165, ctc_loss=0.1448, cr_loss=0.3584, over 3270810.50 frames. ], batch size: 36, lr: 5.93e-03, grad_scale: 32.0 2024-09-23 22:49:40,280 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=363412.0, ans=0.125 2024-09-23 22:50:08,338 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.115e+02 1.324e+02 1.457e+02 1.626e+02 2.908e+02, threshold=2.914e+02, percent-clipped=1.0 2024-09-23 22:50:48,253 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=363598.6666666667, ans=0.125 2024-09-23 22:51:41,132 INFO [train.py:1198] (3/4) Epoch 21, batch 0, loss[loss=0.1804, ctc_loss=0.1169, cr_loss=0.3175, over 17257.00 frames. ], tot_loss[loss=0.1804, ctc_loss=0.1169, cr_loss=0.3175, over 17257.00 frames. ], batch size: 42, lr: 5.78e-03, grad_scale: 32.0 2024-09-23 22:51:41,133 INFO [train.py:1221] (3/4) Computing validation loss 2024-09-23 22:51:57,010 INFO [train.py:1230] (3/4) Epoch 21, validation: loss=0.03907, ctc_loss=0.03907, cr_loss=7.91e-15, over 944034.00 frames. 2024-09-23 22:51:57,010 INFO [train.py:1231] (3/4) Maximum memory allocated so far is 21238MB 2024-09-23 22:52:24,971 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=363673.3333333333, ans=0.125 2024-09-23 22:52:37,849 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=363720.0, ans=0.125 2024-09-23 22:52:44,158 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=363720.0, ans=0.04949747468305833 2024-09-23 22:52:55,544 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=363766.6666666667, ans=0.1 2024-09-23 22:52:56,954 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=363766.6666666667, ans=0.125 2024-09-23 22:53:06,442 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=363813.3333333333, ans=0.125 2024-09-23 22:53:18,869 INFO [train.py:1198] (3/4) Epoch 21, batch 50, loss[loss=0.245, ctc_loss=0.1722, cr_loss=0.3638, over 11720.00 frames. ], tot_loss[loss=0.2202, ctc_loss=0.1473, cr_loss=0.3641, over 738987.98 frames. ], batch size: 125, lr: 5.78e-03, grad_scale: 32.0 2024-09-23 22:53:21,411 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.35 vs. 
limit=6.0 2024-09-23 22:53:25,256 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=363860.0, ans=0.1 2024-09-23 22:53:33,002 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=363860.0, ans=0.125 2024-09-23 22:53:46,055 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys.whitening_limit, batch_count=363906.6666666667, ans=6.0 2024-09-23 22:53:48,658 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=363906.6666666667, ans=0.125 2024-09-23 22:53:56,547 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.076e+02 1.319e+02 1.470e+02 1.661e+02 2.685e+02, threshold=2.941e+02, percent-clipped=0.0 2024-09-23 22:54:41,603 INFO [train.py:1198] (3/4) Epoch 21, batch 100, loss[loss=0.203, ctc_loss=0.1333, cr_loss=0.3486, over 17145.00 frames. ], tot_loss[loss=0.2173, ctc_loss=0.1449, cr_loss=0.3621, over 1327499.44 frames. ], batch size: 48, lr: 5.78e-03, grad_scale: 32.0 2024-09-23 22:54:59,159 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=364140.0, ans=0.125 2024-09-23 22:54:59,510 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1.whitening_limit, batch_count=364140.0, ans=10.0 2024-09-23 22:55:59,315 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=364280.0, ans=0.0 2024-09-23 22:56:03,809 INFO [train.py:1198] (3/4) Epoch 21, batch 150, loss[loss=0.1934, ctc_loss=0.1286, cr_loss=0.3238, over 17170.00 frames. ], tot_loss[loss=0.2157, ctc_loss=0.1437, cr_loss=0.36, over 1775539.99 frames. ], batch size: 45, lr: 5.78e-03, grad_scale: 32.0 2024-09-23 22:56:08,816 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=364326.6666666667, ans=0.0 2024-09-23 22:56:11,983 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=364326.6666666667, ans=0.125 2024-09-23 22:56:12,419 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=3.50 vs. 
limit=10.0 2024-09-23 22:56:13,647 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=364326.6666666667, ans=0.125 2024-09-23 22:56:21,664 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-09-23 22:56:38,962 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.055e+02 1.242e+02 1.342e+02 1.442e+02 2.042e+02, threshold=2.684e+02, percent-clipped=0.0 2024-09-23 22:56:39,292 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=364420.0, ans=0.1 2024-09-23 22:56:53,450 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=364466.6666666667, ans=0.125 2024-09-23 22:56:58,402 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.min_positive, batch_count=364466.6666666667, ans=0.05 2024-09-23 22:57:23,997 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=9.83 vs. limit=12.0 2024-09-23 22:57:29,600 INFO [train.py:1198] (3/4) Epoch 21, batch 200, loss[loss=0.2428, ctc_loss=0.1639, cr_loss=0.3945, over 11521.00 frames. ], tot_loss[loss=0.2158, ctc_loss=0.1438, cr_loss=0.3598, over 2100244.38 frames. ], batch size: 125, lr: 5.78e-03, grad_scale: 32.0 2024-09-23 22:57:46,222 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=364606.6666666667, ans=0.1 2024-09-23 22:57:47,970 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=364606.6666666667, ans=0.125 2024-09-23 22:57:52,984 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=14.04 vs. limit=22.5 2024-09-23 22:58:05,184 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=364653.3333333333, ans=0.125 2024-09-23 22:58:38,445 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-23 22:58:52,295 INFO [train.py:1198] (3/4) Epoch 21, batch 250, loss[loss=0.2518, ctc_loss=0.1727, cr_loss=0.3956, over 14998.00 frames. ], tot_loss[loss=0.215, ctc_loss=0.1431, cr_loss=0.3593, over 2381723.04 frames. ], batch size: 89, lr: 5.77e-03, grad_scale: 32.0 2024-09-23 22:58:58,225 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=10.16 vs. 
limit=15.0 2024-09-23 22:59:10,160 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=364840.0, ans=0.2 2024-09-23 22:59:24,703 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=364886.6666666667, ans=0.125 2024-09-23 22:59:27,566 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.024e+02 1.297e+02 1.364e+02 1.577e+02 2.161e+02, threshold=2.728e+02, percent-clipped=0.0 2024-09-23 22:59:47,495 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-09-23 22:59:48,902 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.const_attention_rate, batch_count=364933.3333333333, ans=0.025 2024-09-23 22:59:50,743 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=364933.3333333333, ans=0.125 2024-09-23 23:00:01,803 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=364980.0, ans=0.125 2024-09-23 23:00:15,863 INFO [train.py:1198] (3/4) Epoch 21, batch 300, loss[loss=0.2466, ctc_loss=0.1647, cr_loss=0.4096, over 16736.00 frames. ], tot_loss[loss=0.216, ctc_loss=0.144, cr_loss=0.3604, over 2602146.01 frames. ], batch size: 61, lr: 5.77e-03, grad_scale: 32.0 2024-09-23 23:00:48,287 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=365120.0, ans=0.0 2024-09-23 23:00:51,474 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-23 23:01:15,113 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=365166.6666666667, ans=0.0 2024-09-23 23:01:18,792 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=10.73 vs. limit=15.0 2024-09-23 23:01:32,785 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=4.72 vs. limit=12.0 2024-09-23 23:01:35,407 INFO [train.py:1198] (3/4) Epoch 21, batch 350, loss[loss=0.2057, ctc_loss=0.1362, cr_loss=0.3474, over 17236.00 frames. ], tot_loss[loss=0.2145, ctc_loss=0.1427, cr_loss=0.3588, over 2774025.94 frames. ], batch size: 50, lr: 5.77e-03, grad_scale: 32.0 2024-09-23 23:01:52,731 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=365306.6666666667, ans=0.125 2024-09-23 23:02:15,566 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=365353.3333333333, ans=0.1 2024-09-23 23:02:16,763 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.111e+02 1.263e+02 1.348e+02 1.461e+02 2.184e+02, threshold=2.696e+02, percent-clipped=0.0 2024-09-23 23:02:30,170 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=11.40 vs. 
limit=15.0 2024-09-23 23:02:31,423 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=365400.0, ans=0.125 2024-09-23 23:02:36,210 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=365400.0, ans=0.125 2024-09-23 23:02:37,772 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=365400.0, ans=0.125 2024-09-23 23:02:55,311 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.const_attention_rate, batch_count=365446.6666666667, ans=0.025 2024-09-23 23:03:01,499 INFO [train.py:1198] (3/4) Epoch 21, batch 400, loss[loss=0.2183, ctc_loss=0.144, cr_loss=0.3713, over 17095.00 frames. ], tot_loss[loss=0.2153, ctc_loss=0.1433, cr_loss=0.3597, over 2907769.33 frames. ], batch size: 43, lr: 5.77e-03, grad_scale: 32.0 2024-09-23 23:03:09,668 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=365493.3333333333, ans=0.125 2024-09-23 23:03:19,424 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.35 vs. limit=10.0 2024-09-23 23:03:23,598 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=365540.0, ans=0.2 2024-09-23 23:03:40,760 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=365586.6666666667, ans=0.1 2024-09-23 23:04:06,435 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=365680.0, ans=0.0 2024-09-23 23:04:19,039 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=365680.0, ans=0.125 2024-09-23 23:04:23,586 INFO [train.py:1198] (3/4) Epoch 21, batch 450, loss[loss=0.211, ctc_loss=0.141, cr_loss=0.3499, over 17302.00 frames. ], tot_loss[loss=0.2152, ctc_loss=0.1434, cr_loss=0.3591, over 2991514.13 frames. ], batch size: 49, lr: 5.77e-03, grad_scale: 32.0 2024-09-23 23:04:49,832 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=365773.3333333333, ans=0.1 2024-09-23 23:04:59,072 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.052e+02 1.246e+02 1.320e+02 1.437e+02 2.140e+02, threshold=2.640e+02, percent-clipped=0.0 2024-09-23 23:05:43,437 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=365913.3333333333, ans=0.07 2024-09-23 23:05:46,208 INFO [train.py:1198] (3/4) Epoch 21, batch 500, loss[loss=0.1951, ctc_loss=0.126, cr_loss=0.3455, over 17299.00 frames. ], tot_loss[loss=0.214, ctc_loss=0.1424, cr_loss=0.3581, over 3081737.01 frames. 
], batch size: 46, lr: 5.76e-03, grad_scale: 32.0 2024-09-23 23:06:06,013 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=366006.6666666667, ans=0.0 2024-09-23 23:06:07,487 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=366006.6666666667, ans=0.1 2024-09-23 23:06:07,537 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=366006.6666666667, ans=0.0 2024-09-23 23:06:54,508 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.const_attention_rate, batch_count=366146.6666666667, ans=0.025 2024-09-23 23:07:01,277 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=11.00 vs. limit=22.5 2024-09-23 23:07:03,201 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=4.26 vs. limit=12.0 2024-09-23 23:07:04,110 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=366146.6666666667, ans=0.0 2024-09-23 23:07:11,260 INFO [train.py:1198] (3/4) Epoch 21, batch 550, loss[loss=0.1786, ctc_loss=0.1184, cr_loss=0.3013, over 17095.00 frames. ], tot_loss[loss=0.2142, ctc_loss=0.1425, cr_loss=0.3586, over 3153761.87 frames. ], batch size: 43, lr: 5.76e-03, grad_scale: 32.0 2024-09-23 23:07:20,977 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=366193.3333333333, ans=0.1 2024-09-23 23:07:38,884 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-23 23:07:45,258 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=366286.6666666667, ans=0.0 2024-09-23 23:07:46,484 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.111e+02 1.227e+02 1.339e+02 1.480e+02 1.923e+02, threshold=2.679e+02, percent-clipped=0.0 2024-09-23 23:07:53,172 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.const_attention_rate, batch_count=366286.6666666667, ans=0.025 2024-09-23 23:08:01,330 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=366333.3333333333, ans=0.125 2024-09-23 23:08:02,777 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=366333.3333333333, ans=0.1 2024-09-23 23:08:07,331 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=366333.3333333333, ans=0.2 2024-09-23 23:08:21,034 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=366380.0, ans=0.0 2024-09-23 23:08:22,929 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=366380.0, ans=0.0 2024-09-23 23:08:25,795 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=366380.0, ans=0.125 2024-09-23 23:08:33,668 INFO [train.py:1198] (3/4) Epoch 21, batch 600, loss[loss=0.2473, ctc_loss=0.1699, cr_loss=0.3873, over 14843.00 frames. 
], tot_loss[loss=0.215, ctc_loss=0.1431, cr_loss=0.3596, over 3191041.57 frames. ], batch size: 89, lr: 5.76e-03, grad_scale: 32.0 2024-09-23 23:09:12,497 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=366520.0, ans=0.07 2024-09-23 23:09:20,421 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=366566.6666666667, ans=0.09899494936611666 2024-09-23 23:09:38,471 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=5.41 vs. limit=15.0 2024-09-23 23:09:52,250 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=366660.0, ans=0.0 2024-09-23 23:09:53,534 INFO [train.py:1198] (3/4) Epoch 21, batch 650, loss[loss=0.2165, ctc_loss=0.1438, cr_loss=0.3636, over 17224.00 frames. ], tot_loss[loss=0.2139, ctc_loss=0.1422, cr_loss=0.3587, over 3239684.37 frames. ], batch size: 47, lr: 5.76e-03, grad_scale: 16.0 2024-09-23 23:10:20,792 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=4.56 vs. limit=15.0 2024-09-23 23:10:32,743 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.086e+02 1.251e+02 1.353e+02 1.422e+02 2.083e+02, threshold=2.707e+02, percent-clipped=0.0 2024-09-23 23:10:35,353 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=5.39 vs. limit=15.0 2024-09-23 23:11:15,972 INFO [train.py:1198] (3/4) Epoch 21, batch 700, loss[loss=0.1886, ctc_loss=0.1231, cr_loss=0.3274, over 17265.00 frames. ], tot_loss[loss=0.2129, ctc_loss=0.1414, cr_loss=0.3574, over 3271575.25 frames. ], batch size: 42, lr: 5.76e-03, grad_scale: 16.0 2024-09-23 23:11:21,105 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=366893.3333333333, ans=0.125 2024-09-23 23:11:45,047 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-09-23 23:11:48,077 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.max_positive, batch_count=366940.0, ans=0.95 2024-09-23 23:12:27,852 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=4.70 vs. limit=10.0 2024-09-23 23:12:29,014 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=367080.0, ans=0.0 2024-09-23 23:12:36,975 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=367080.0, ans=0.0 2024-09-23 23:12:41,460 INFO [train.py:1198] (3/4) Epoch 21, batch 750, loss[loss=0.2062, ctc_loss=0.1378, cr_loss=0.3417, over 17026.00 frames. ], tot_loss[loss=0.2123, ctc_loss=0.141, cr_loss=0.3562, over 3279225.86 frames. 
], batch size: 44, lr: 5.76e-03, grad_scale: 16.0 2024-09-23 23:13:10,129 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=367173.3333333333, ans=0.0 2024-09-23 23:13:13,285 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.const_attention_rate, batch_count=367173.3333333333, ans=0.025 2024-09-23 23:13:21,100 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.101e+02 1.308e+02 1.415e+02 1.587e+02 2.063e+02, threshold=2.831e+02, percent-clipped=0.0 2024-09-23 23:13:27,849 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=367220.0, ans=0.125 2024-09-23 23:13:37,600 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=367266.6666666667, ans=0.125 2024-09-23 23:14:03,835 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=367360.0, ans=0.0 2024-09-23 23:14:05,009 INFO [train.py:1198] (3/4) Epoch 21, batch 800, loss[loss=0.21, ctc_loss=0.1383, cr_loss=0.3586, over 17027.00 frames. ], tot_loss[loss=0.2132, ctc_loss=0.1416, cr_loss=0.358, over 3303203.27 frames. ], batch size: 44, lr: 5.75e-03, grad_scale: 32.0 2024-09-23 23:14:05,346 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=367360.0, ans=0.0 2024-09-23 23:14:16,583 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=367360.0, ans=0.1 2024-09-23 23:14:19,786 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=367406.6666666667, ans=0.0 2024-09-23 23:14:19,821 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=367406.6666666667, ans=0.1 2024-09-23 23:14:37,025 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=367453.3333333333, ans=0.125 2024-09-23 23:14:53,305 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=367500.0, ans=0.0 2024-09-23 23:15:26,047 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=367593.3333333333, ans=0.125 2024-09-23 23:15:27,332 INFO [train.py:1198] (3/4) Epoch 21, batch 850, loss[loss=0.213, ctc_loss=0.1383, cr_loss=0.3737, over 15970.00 frames. ], tot_loss[loss=0.2135, ctc_loss=0.1417, cr_loss=0.359, over 3314326.01 frames. ], batch size: 74, lr: 5.75e-03, grad_scale: 32.0 2024-09-23 23:16:05,379 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.026e+02 1.235e+02 1.359e+02 1.551e+02 2.196e+02, threshold=2.717e+02, percent-clipped=0.0 2024-09-23 23:16:38,105 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=367780.0, ans=0.2 2024-09-23 23:16:42,858 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=367780.0, ans=0.1 2024-09-23 23:16:53,129 INFO [train.py:1198] (3/4) Epoch 21, batch 900, loss[loss=0.2221, ctc_loss=0.1498, cr_loss=0.3612, over 17251.00 frames. ], tot_loss[loss=0.2139, ctc_loss=0.1421, cr_loss=0.3591, over 3315217.38 frames. 
], batch size: 44, lr: 5.75e-03, grad_scale: 16.0 2024-09-23 23:16:59,846 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=367826.6666666667, ans=0.125 2024-09-23 23:18:15,417 INFO [train.py:1198] (3/4) Epoch 21, batch 950, loss[loss=0.2036, ctc_loss=0.1297, cr_loss=0.3698, over 17296.00 frames. ], tot_loss[loss=0.2139, ctc_loss=0.142, cr_loss=0.3595, over 3331764.14 frames. ], batch size: 46, lr: 5.75e-03, grad_scale: 16.0 2024-09-23 23:18:19,030 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=368060.0, ans=0.125 2024-09-23 23:18:20,974 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=6.33 vs. limit=15.0 2024-09-23 23:18:34,692 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=368106.6666666667, ans=0.125 2024-09-23 23:18:34,720 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=368106.6666666667, ans=0.0 2024-09-23 23:18:42,517 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=368106.6666666667, ans=0.125 2024-09-23 23:18:53,167 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.129e+02 1.253e+02 1.356e+02 1.481e+02 2.989e+02, threshold=2.713e+02, percent-clipped=1.0 2024-09-23 23:18:56,675 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-23 23:18:56,683 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=368153.3333333333, ans=0.125 2024-09-23 23:18:58,281 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-23 23:19:06,443 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=368200.0, ans=0.1 2024-09-23 23:19:17,702 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=368246.6666666667, ans=0.0 2024-09-23 23:19:22,908 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=7.51 vs. limit=15.0 2024-09-23 23:19:33,548 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=368293.3333333333, ans=0.125 2024-09-23 23:19:34,895 INFO [train.py:1198] (3/4) Epoch 21, batch 1000, loss[loss=0.2277, ctc_loss=0.151, cr_loss=0.3837, over 17091.00 frames. ], tot_loss[loss=0.2146, ctc_loss=0.1428, cr_loss=0.3593, over 3324448.84 frames. 
], batch size: 49, lr: 5.75e-03, grad_scale: 16.0 2024-09-23 23:19:48,201 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=368293.3333333333, ans=0.0 2024-09-23 23:19:54,129 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=368340.0, ans=0.0 2024-09-23 23:19:57,273 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=368340.0, ans=0.125 2024-09-23 23:19:58,908 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=368340.0, ans=0.125 2024-09-23 23:20:02,708 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=5.83 vs. limit=15.0 2024-09-23 23:20:30,127 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=3.76 vs. limit=12.0 2024-09-23 23:20:50,432 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=368480.0, ans=0.2 2024-09-23 23:20:51,082 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.59 vs. limit=22.5 2024-09-23 23:20:58,193 INFO [train.py:1198] (3/4) Epoch 21, batch 1050, loss[loss=0.2254, ctc_loss=0.1517, cr_loss=0.3688, over 17032.00 frames. ], tot_loss[loss=0.2157, ctc_loss=0.1437, cr_loss=0.3603, over 3321197.10 frames. ], batch size: 52, lr: 5.74e-03, grad_scale: 16.0 2024-09-23 23:21:05,081 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=368526.6666666667, ans=0.2 2024-09-23 23:21:08,261 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=368526.6666666667, ans=0.0 2024-09-23 23:21:11,452 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=368526.6666666667, ans=0.09899494936611666 2024-09-23 23:21:23,834 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=368573.3333333333, ans=0.0 2024-09-23 23:21:34,700 INFO [scaling.py:1024] (3/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.84 vs. limit=5.0 2024-09-23 23:21:39,786 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.060e+02 1.259e+02 1.337e+02 1.472e+02 2.048e+02, threshold=2.674e+02, percent-clipped=0.0 2024-09-23 23:21:40,537 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=6.32 vs. 
limit=15.0 2024-09-23 23:21:50,843 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=368666.6666666667, ans=0.0 2024-09-23 23:21:54,365 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=368666.6666666667, ans=0.0 2024-09-23 23:22:02,082 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=368666.6666666667, ans=0.125 2024-09-23 23:22:11,948 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=6.44 vs. limit=15.0 2024-09-23 23:22:14,861 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=368713.3333333333, ans=0.125 2024-09-23 23:22:16,334 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=368713.3333333333, ans=0.0 2024-09-23 23:22:24,102 INFO [train.py:1198] (3/4) Epoch 21, batch 1100, loss[loss=0.2371, ctc_loss=0.158, cr_loss=0.3955, over 17013.00 frames. ], tot_loss[loss=0.215, ctc_loss=0.1431, cr_loss=0.3594, over 3322143.04 frames. ], batch size: 53, lr: 5.74e-03, grad_scale: 16.0 2024-09-23 23:22:57,114 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.const_attention_rate, batch_count=368853.3333333333, ans=0.025 2024-09-23 23:22:58,141 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=12.10 vs. limit=22.5 2024-09-23 23:23:03,538 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=368853.3333333333, ans=0.0 2024-09-23 23:23:38,474 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=368946.6666666667, ans=0.0 2024-09-23 23:23:46,070 INFO [train.py:1198] (3/4) Epoch 21, batch 1150, loss[loss=0.1924, ctc_loss=0.1271, cr_loss=0.3265, over 17188.00 frames. ], tot_loss[loss=0.2155, ctc_loss=0.1436, cr_loss=0.3597, over 3324933.67 frames. ], batch size: 41, lr: 5.74e-03, grad_scale: 16.0 2024-09-23 23:23:56,155 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=368993.3333333333, ans=0.125 2024-09-23 23:24:24,271 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.072e+02 1.242e+02 1.332e+02 1.452e+02 2.606e+02, threshold=2.664e+02, percent-clipped=0.0 2024-09-23 23:24:32,439 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=369133.3333333333, ans=0.0 2024-09-23 23:24:37,967 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.94 vs. limit=15.0 2024-09-23 23:24:54,264 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=369180.0, ans=0.125 2024-09-23 23:25:06,324 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.18 vs. limit=22.5 2024-09-23 23:25:08,571 INFO [train.py:1198] (3/4) Epoch 21, batch 1200, loss[loss=0.182, ctc_loss=0.1217, cr_loss=0.3019, over 17033.00 frames. 
], tot_loss[loss=0.2146, ctc_loss=0.143, cr_loss=0.3582, over 3324360.15 frames. ], batch size: 39, lr: 5.74e-03, grad_scale: 32.0 2024-09-23 23:25:15,473 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=369226.6666666667, ans=0.125 2024-09-23 23:26:14,076 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=13.88 vs. limit=22.5 2024-09-23 23:26:22,911 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=369413.3333333333, ans=0.0 2024-09-23 23:26:30,658 INFO [train.py:1198] (3/4) Epoch 21, batch 1250, loss[loss=0.2072, ctc_loss=0.1399, cr_loss=0.3364, over 16968.00 frames. ], tot_loss[loss=0.2151, ctc_loss=0.1434, cr_loss=0.3585, over 3315347.98 frames. ], batch size: 42, lr: 5.74e-03, grad_scale: 16.0 2024-09-23 23:27:09,007 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=369553.3333333333, ans=0.125 2024-09-23 23:27:13,543 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 9.996e+01 1.239e+02 1.348e+02 1.443e+02 2.209e+02, threshold=2.696e+02, percent-clipped=0.0 2024-09-23 23:27:56,570 INFO [train.py:1198] (3/4) Epoch 21, batch 1300, loss[loss=0.1975, ctc_loss=0.1315, cr_loss=0.3301, over 17355.00 frames. ], tot_loss[loss=0.2146, ctc_loss=0.143, cr_loss=0.3577, over 3325159.98 frames. ], batch size: 48, lr: 5.74e-03, grad_scale: 16.0 2024-09-23 23:27:57,283 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.88 vs. limit=15.0 2024-09-23 23:28:21,011 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=369740.0, ans=0.0 2024-09-23 23:28:32,729 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.07 vs. limit=6.0 2024-09-23 23:28:33,860 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=369786.6666666667, ans=0.125 2024-09-23 23:28:35,459 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=369786.6666666667, ans=0.125 2024-09-23 23:28:43,377 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=369833.3333333333, ans=0.1 2024-09-23 23:29:16,753 INFO [train.py:1198] (3/4) Epoch 21, batch 1350, loss[loss=0.193, ctc_loss=0.1246, cr_loss=0.3419, over 17251.00 frames. ], tot_loss[loss=0.216, ctc_loss=0.144, cr_loss=0.3599, over 3327350.44 frames. ], batch size: 44, lr: 5.73e-03, grad_scale: 16.0 2024-09-23 23:29:20,204 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-23 23:29:35,211 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=7.04 vs. 
limit=15.0 2024-09-23 23:29:58,706 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.126e+02 1.263e+02 1.337e+02 1.513e+02 2.330e+02, threshold=2.674e+02, percent-clipped=0.0 2024-09-23 23:30:38,708 INFO [train.py:1198] (3/4) Epoch 21, batch 1400, loss[loss=0.2108, ctc_loss=0.1424, cr_loss=0.3419, over 17010.00 frames. ], tot_loss[loss=0.2149, ctc_loss=0.1431, cr_loss=0.359, over 3340118.94 frames. ], batch size: 51, lr: 5.73e-03, grad_scale: 16.0 2024-09-23 23:31:02,521 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=370206.6666666667, ans=0.1 2024-09-23 23:31:13,140 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.68 vs. limit=15.0 2024-09-23 23:32:03,683 INFO [train.py:1198] (3/4) Epoch 21, batch 1450, loss[loss=0.2062, ctc_loss=0.1398, cr_loss=0.3321, over 17284.00 frames. ], tot_loss[loss=0.2134, ctc_loss=0.142, cr_loss=0.357, over 3356526.99 frames. ], batch size: 49, lr: 5.73e-03, grad_scale: 16.0 2024-09-23 23:32:07,410 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=370393.3333333333, ans=0.04949747468305833 2024-09-23 23:32:12,110 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=370393.3333333333, ans=0.0 2024-09-23 23:32:26,050 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.61 vs. limit=10.0 2024-09-23 23:32:39,048 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=370486.6666666667, ans=0.125 2024-09-23 23:32:46,581 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.064e+02 1.255e+02 1.316e+02 1.424e+02 2.117e+02, threshold=2.631e+02, percent-clipped=0.0 2024-09-23 23:32:46,855 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=370486.6666666667, ans=0.125 2024-09-23 23:33:26,658 INFO [train.py:1198] (3/4) Epoch 21, batch 1500, loss[loss=0.255, ctc_loss=0.1741, cr_loss=0.4042, over 16600.00 frames. ], tot_loss[loss=0.2126, ctc_loss=0.1415, cr_loss=0.3558, over 3359038.40 frames. ], batch size: 66, lr: 5.73e-03, grad_scale: 16.0 2024-09-23 23:33:59,062 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=370720.0, ans=0.125 2024-09-23 23:34:02,728 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=6.47 vs. 
limit=15.0 2024-09-23 23:34:18,403 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=370766.6666666667, ans=0.125 2024-09-23 23:34:24,713 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=370766.6666666667, ans=0.125 2024-09-23 23:34:29,989 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.whiten.whitening_limit, batch_count=370813.3333333333, ans=12.0 2024-09-23 23:34:43,286 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=370813.3333333333, ans=0.125 2024-09-23 23:34:43,695 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=4.58 vs. limit=12.0 2024-09-23 23:34:47,035 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=5.40 vs. limit=12.0 2024-09-23 23:34:48,414 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.53 vs. limit=15.0 2024-09-23 23:34:49,373 INFO [train.py:1198] (3/4) Epoch 21, batch 1550, loss[loss=0.2034, ctc_loss=0.1346, cr_loss=0.3437, over 17075.00 frames. ], tot_loss[loss=0.213, ctc_loss=0.1417, cr_loss=0.3565, over 3370176.29 frames. ], batch size: 46, lr: 5.73e-03, grad_scale: 16.0 2024-09-23 23:34:51,223 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=370860.0, ans=0.2 2024-09-23 23:34:51,307 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=370860.0, ans=0.0 2024-09-23 23:35:10,432 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-23 23:35:24,970 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=370953.3333333333, ans=0.125 2024-09-23 23:35:29,478 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.107e+02 1.240e+02 1.330e+02 1.451e+02 2.029e+02, threshold=2.659e+02, percent-clipped=0.0 2024-09-23 23:35:42,680 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=371000.0, ans=0.125 2024-09-23 23:35:43,271 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten.whitening_limit, batch_count=371000.0, ans=22.5 2024-09-23 23:35:50,406 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=371000.0, ans=0.035 2024-09-23 23:36:12,040 INFO [train.py:1198] (3/4) Epoch 21, batch 1600, loss[loss=0.2094, ctc_loss=0.1363, cr_loss=0.3655, over 17168.00 frames. ], tot_loss[loss=0.213, ctc_loss=0.1416, cr_loss=0.3574, over 3368162.80 frames. ], batch size: 45, lr: 5.73e-03, grad_scale: 32.0 2024-09-23 23:36:26,910 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=10.85 vs. 
limit=15.0 2024-09-23 23:36:32,977 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=371140.0, ans=0.125 2024-09-23 23:36:42,513 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=371186.6666666667, ans=0.125 2024-09-23 23:37:14,177 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=10.67 vs. limit=22.5 2024-09-23 23:37:37,165 INFO [train.py:1198] (3/4) Epoch 21, batch 1650, loss[loss=0.2112, ctc_loss=0.1358, cr_loss=0.3768, over 17305.00 frames. ], tot_loss[loss=0.2144, ctc_loss=0.1426, cr_loss=0.3593, over 3369930.47 frames. ], batch size: 46, lr: 5.72e-03, grad_scale: 32.0 2024-09-23 23:38:08,352 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=5.94 vs. limit=12.0 2024-09-23 23:38:09,445 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=371420.0, ans=0.05 2024-09-23 23:38:16,965 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.130e+02 1.286e+02 1.382e+02 1.548e+02 2.796e+02, threshold=2.764e+02, percent-clipped=1.0 2024-09-23 23:38:38,347 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.97 vs. limit=6.0 2024-09-23 23:38:41,043 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=371513.3333333333, ans=0.2 2024-09-23 23:38:53,780 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=371513.3333333333, ans=0.2 2024-09-23 23:38:56,660 INFO [train.py:1198] (3/4) Epoch 21, batch 1700, loss[loss=0.2098, ctc_loss=0.1387, cr_loss=0.3559, over 17313.00 frames. ], tot_loss[loss=0.2133, ctc_loss=0.1417, cr_loss=0.3579, over 3376342.48 frames. ], batch size: 46, lr: 5.72e-03, grad_scale: 32.0 2024-09-23 23:38:58,562 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=371560.0, ans=0.0 2024-09-23 23:39:01,615 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=371560.0, ans=0.0 2024-09-23 23:39:09,390 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=371560.0, ans=0.2 2024-09-23 23:39:16,345 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.28 vs. limit=15.0 2024-09-23 23:39:55,009 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-09-23 23:40:13,810 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=371746.6666666667, ans=0.2 2024-09-23 23:40:15,546 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=371746.6666666667, ans=0.0 2024-09-23 23:40:17,356 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=13.40 vs. 
limit=22.5 2024-09-23 23:40:18,413 INFO [train.py:1198] (3/4) Epoch 21, batch 1750, loss[loss=0.2339, ctc_loss=0.1547, cr_loss=0.3961, over 16940.00 frames. ], tot_loss[loss=0.2134, ctc_loss=0.1419, cr_loss=0.3577, over 3372294.73 frames. ], batch size: 58, lr: 5.72e-03, grad_scale: 32.0 2024-09-23 23:40:21,748 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=371793.3333333333, ans=0.125 2024-09-23 23:40:58,127 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.082e+02 1.218e+02 1.302e+02 1.379e+02 1.745e+02, threshold=2.603e+02, percent-clipped=0.0 2024-09-23 23:41:23,356 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.const_attention_rate, batch_count=371980.0, ans=0.025 2024-09-23 23:41:40,632 INFO [train.py:1198] (3/4) Epoch 21, batch 1800, loss[loss=0.2627, ctc_loss=0.1847, cr_loss=0.3899, over 12107.00 frames. ], tot_loss[loss=0.2137, ctc_loss=0.1421, cr_loss=0.3582, over 3365077.93 frames. ], batch size: 123, lr: 5.72e-03, grad_scale: 32.0 2024-09-23 23:41:49,928 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=372026.6666666667, ans=0.0 2024-09-23 23:41:56,040 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=372026.6666666667, ans=0.1 2024-09-23 23:42:10,398 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=372073.3333333333, ans=0.0 2024-09-23 23:42:23,430 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=372120.0, ans=0.125 2024-09-23 23:43:01,164 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=372213.3333333333, ans=0.125 2024-09-23 23:43:05,769 INFO [train.py:1198] (3/4) Epoch 21, batch 1850, loss[loss=0.2014, ctc_loss=0.1302, cr_loss=0.3561, over 16974.00 frames. ], tot_loss[loss=0.2151, ctc_loss=0.1433, cr_loss=0.359, over 3360374.08 frames. ], batch size: 42, lr: 5.72e-03, grad_scale: 32.0 2024-09-23 23:43:41,687 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.64 vs. limit=15.0 2024-09-23 23:43:41,775 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=5.00 vs. limit=10.0 2024-09-23 23:43:45,461 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.067e+02 1.299e+02 1.413e+02 1.617e+02 3.494e+02, threshold=2.826e+02, percent-clipped=2.0 2024-09-23 23:43:52,009 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=372400.0, ans=0.125 2024-09-23 23:43:53,948 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=372400.0, ans=0.125 2024-09-23 23:44:19,971 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=9.37 vs. limit=15.0 2024-09-23 23:44:28,065 INFO [train.py:1198] (3/4) Epoch 21, batch 1900, loss[loss=0.216, ctc_loss=0.1432, cr_loss=0.3641, over 17158.00 frames. ], tot_loss[loss=0.2143, ctc_loss=0.1427, cr_loss=0.3582, over 3356721.56 frames. 
], batch size: 48, lr: 5.71e-03, grad_scale: 32.0 2024-09-23 23:44:58,272 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=372586.6666666667, ans=0.035 2024-09-23 23:45:20,689 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=372633.3333333333, ans=0.125 2024-09-23 23:45:21,092 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.43 vs. limit=15.0 2024-09-23 23:45:38,292 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=372680.0, ans=0.125 2024-09-23 23:45:47,539 INFO [train.py:1198] (3/4) Epoch 21, batch 1950, loss[loss=0.1922, ctc_loss=0.1243, cr_loss=0.3391, over 17032.00 frames. ], tot_loss[loss=0.2137, ctc_loss=0.1422, cr_loss=0.3576, over 3356916.59 frames. ], batch size: 44, lr: 5.71e-03, grad_scale: 32.0 2024-09-23 23:45:47,781 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=372726.6666666667, ans=0.125 2024-09-23 23:45:52,683 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=372726.6666666667, ans=0.125 2024-09-23 23:46:16,107 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=372773.3333333333, ans=0.1 2024-09-23 23:46:26,851 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=372820.0, ans=0.1 2024-09-23 23:46:28,280 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=372820.0, ans=0.125 2024-09-23 23:46:29,779 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.064e+02 1.272e+02 1.378e+02 1.476e+02 2.541e+02, threshold=2.755e+02, percent-clipped=0.0 2024-09-23 23:46:38,241 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=372866.6666666667, ans=0.125 2024-09-23 23:46:45,616 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=372866.6666666667, ans=0.125 2024-09-23 23:47:09,973 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=4.96 vs. limit=15.0 2024-09-23 23:47:12,288 INFO [train.py:1198] (3/4) Epoch 21, batch 2000, loss[loss=0.2176, ctc_loss=0.1476, cr_loss=0.35, over 16851.00 frames. ], tot_loss[loss=0.2135, ctc_loss=0.142, cr_loss=0.3575, over 3363384.72 frames. 
], batch size: 58, lr: 5.71e-03, grad_scale: 32.0 2024-09-23 23:47:31,112 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.min_abs, batch_count=373006.6666666667, ans=0.5 2024-09-23 23:47:40,742 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=373006.6666666667, ans=0.125 2024-09-23 23:47:50,456 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=373053.3333333333, ans=0.125 2024-09-23 23:48:25,279 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=373146.6666666667, ans=0.0 2024-09-23 23:48:34,486 INFO [train.py:1198] (3/4) Epoch 21, batch 2050, loss[loss=0.1771, ctc_loss=0.1146, cr_loss=0.3123, over 17058.00 frames. ], tot_loss[loss=0.2133, ctc_loss=0.1417, cr_loss=0.358, over 3365074.08 frames. ], batch size: 40, lr: 5.71e-03, grad_scale: 32.0 2024-09-23 23:48:34,752 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=373193.3333333333, ans=0.0 2024-09-23 23:48:45,812 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=373193.3333333333, ans=0.125 2024-09-23 23:49:14,304 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.094e+02 1.256e+02 1.387e+02 1.531e+02 2.398e+02, threshold=2.775e+02, percent-clipped=0.0 2024-09-23 23:49:27,510 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=373333.3333333333, ans=0.2 2024-09-23 23:49:40,479 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=373333.3333333333, ans=0.2 2024-09-23 23:49:48,509 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=373380.0, ans=0.0 2024-09-23 23:49:59,555 INFO [train.py:1198] (3/4) Epoch 21, batch 2100, loss[loss=0.2575, ctc_loss=0.1788, cr_loss=0.3936, over 11834.00 frames. ], tot_loss[loss=0.2144, ctc_loss=0.1425, cr_loss=0.3597, over 3353037.94 frames. ], batch size: 123, lr: 5.71e-03, grad_scale: 32.0 2024-09-23 23:50:04,764 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=373426.6666666667, ans=0.125 2024-09-23 23:50:09,622 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=373426.6666666667, ans=0.2 2024-09-23 23:50:14,369 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-09-23 23:50:31,871 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=373520.0, ans=0.09899494936611666 2024-09-23 23:50:35,157 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=373520.0, ans=0.125 2024-09-23 23:51:22,284 INFO [train.py:1198] (3/4) Epoch 21, batch 2150, loss[loss=0.1913, ctc_loss=0.1252, cr_loss=0.3307, over 17030.00 frames. ], tot_loss[loss=0.2148, ctc_loss=0.1429, cr_loss=0.3596, over 3351524.76 frames. 
], batch size: 39, lr: 5.71e-03, grad_scale: 32.0 2024-09-23 23:51:45,814 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=373706.6666666667, ans=0.125 2024-09-23 23:51:52,253 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=373706.6666666667, ans=0.0 2024-09-23 23:52:04,669 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.068e+02 1.257e+02 1.382e+02 1.555e+02 2.014e+02, threshold=2.764e+02, percent-clipped=0.0 2024-09-23 23:52:19,259 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=373800.0, ans=0.2 2024-09-23 23:52:42,676 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=373846.6666666667, ans=0.0 2024-09-23 23:52:47,118 INFO [train.py:1198] (3/4) Epoch 21, batch 2200, loss[loss=0.2193, ctc_loss=0.1494, cr_loss=0.3495, over 17146.00 frames. ], tot_loss[loss=0.214, ctc_loss=0.1422, cr_loss=0.3588, over 3350393.77 frames. ], batch size: 48, lr: 5.70e-03, grad_scale: 32.0 2024-09-23 23:52:55,551 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=373893.3333333333, ans=0.125 2024-09-23 23:52:57,436 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=10.81 vs. limit=15.0 2024-09-23 23:53:01,844 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=373940.0, ans=0.0 2024-09-23 23:53:34,023 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=374033.3333333333, ans=0.125 2024-09-23 23:53:46,282 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=374033.3333333333, ans=0.2 2024-09-23 23:54:02,203 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=374080.0, ans=0.125 2024-09-23 23:54:06,787 INFO [train.py:1198] (3/4) Epoch 21, batch 2250, loss[loss=0.1958, ctc_loss=0.1295, cr_loss=0.3316, over 17297.00 frames. ], tot_loss[loss=0.2133, ctc_loss=0.1417, cr_loss=0.3584, over 3363001.14 frames. ], batch size: 46, lr: 5.70e-03, grad_scale: 32.0 2024-09-23 23:54:49,165 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.146e+02 1.304e+02 1.416e+02 1.549e+02 2.189e+02, threshold=2.832e+02, percent-clipped=0.0 2024-09-23 23:55:29,456 INFO [train.py:1198] (3/4) Epoch 21, batch 2300, loss[loss=0.229, ctc_loss=0.1532, cr_loss=0.3788, over 17126.00 frames. ], tot_loss[loss=0.2129, ctc_loss=0.1414, cr_loss=0.3577, over 3352424.17 frames. 
], batch size: 48, lr: 5.70e-03, grad_scale: 32.0 2024-09-23 23:55:39,311 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=374360.0, ans=0.125 2024-09-23 23:55:47,258 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=374406.6666666667, ans=0.125 2024-09-23 23:55:47,293 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=374406.6666666667, ans=0.0 2024-09-23 23:55:53,633 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=374406.6666666667, ans=0.1 2024-09-23 23:56:03,069 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=374453.3333333333, ans=0.2 2024-09-23 23:56:08,584 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=14.85 vs. limit=15.0 2024-09-23 23:56:14,156 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=374453.3333333333, ans=0.1 2024-09-23 23:56:15,800 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=374453.3333333333, ans=0.2 2024-09-23 23:56:54,562 INFO [train.py:1198] (3/4) Epoch 21, batch 2350, loss[loss=0.2176, ctc_loss=0.1427, cr_loss=0.3746, over 17207.00 frames. ], tot_loss[loss=0.2125, ctc_loss=0.1411, cr_loss=0.3569, over 3361105.28 frames. ], batch size: 50, lr: 5.70e-03, grad_scale: 32.0 2024-09-23 23:57:38,982 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.011e+02 1.274e+02 1.389e+02 1.537e+02 2.355e+02, threshold=2.778e+02, percent-clipped=0.0 2024-09-23 23:57:42,516 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=374686.6666666667, ans=0.2 2024-09-23 23:57:48,831 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=374733.3333333333, ans=0.125 2024-09-23 23:57:58,421 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=374733.3333333333, ans=0.0 2024-09-23 23:58:17,353 INFO [train.py:1198] (3/4) Epoch 21, batch 2400, loss[loss=0.2153, ctc_loss=0.1431, cr_loss=0.3608, over 17145.00 frames. ], tot_loss[loss=0.2108, ctc_loss=0.1399, cr_loss=0.3548, over 3370244.10 frames. ], batch size: 48, lr: 5.70e-03, grad_scale: 32.0 2024-09-23 23:58:32,612 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=3.79 vs. limit=12.0 2024-09-23 23:58:47,985 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-23 23:59:08,499 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=374966.6666666667, ans=0.125 2024-09-23 23:59:08,528 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=374966.6666666667, ans=0.2 2024-09-23 23:59:39,563 INFO [train.py:1198] (3/4) Epoch 21, batch 2450, loss[loss=0.2009, ctc_loss=0.1318, cr_loss=0.3457, over 17013.00 frames. 
], tot_loss[loss=0.2118, ctc_loss=0.1406, cr_loss=0.3562, over 3376243.35 frames. ], batch size: 44, lr: 5.70e-03, grad_scale: 32.0 2024-09-23 23:59:53,025 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.16 vs. limit=6.0 2024-09-24 00:00:20,974 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.089e+02 1.266e+02 1.364e+02 1.496e+02 2.065e+02, threshold=2.727e+02, percent-clipped=0.0 2024-09-24 00:00:27,837 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.const_attention_rate, batch_count=375200.0, ans=0.025 2024-09-24 00:00:31,230 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=375200.0, ans=0.125 2024-09-24 00:01:02,065 INFO [train.py:1198] (3/4) Epoch 21, batch 2500, loss[loss=0.2218, ctc_loss=0.1459, cr_loss=0.3792, over 17243.00 frames. ], tot_loss[loss=0.2125, ctc_loss=0.1411, cr_loss=0.3568, over 3374295.40 frames. ], batch size: 50, lr: 5.69e-03, grad_scale: 32.0 2024-09-24 00:01:10,236 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=375293.3333333333, ans=0.0 2024-09-24 00:01:24,297 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=375340.0, ans=0.1 2024-09-24 00:01:39,669 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=375386.6666666667, ans=0.1 2024-09-24 00:01:53,891 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=375433.3333333333, ans=0.2 2024-09-24 00:01:55,493 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=375433.3333333333, ans=0.1 2024-09-24 00:02:02,051 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=375433.3333333333, ans=0.125 2024-09-24 00:02:26,981 INFO [train.py:1198] (3/4) Epoch 21, batch 2550, loss[loss=0.2265, ctc_loss=0.1511, cr_loss=0.3769, over 17039.00 frames. ], tot_loss[loss=0.2126, ctc_loss=0.1412, cr_loss=0.3568, over 3364238.45 frames. ], batch size: 52, lr: 5.69e-03, grad_scale: 32.0 2024-09-24 00:02:28,910 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=375526.6666666667, ans=0.1 2024-09-24 00:02:44,759 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=375573.3333333333, ans=0.2 2024-09-24 00:03:08,320 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.019e+02 1.246e+02 1.350e+02 1.498e+02 2.836e+02, threshold=2.701e+02, percent-clipped=1.0 2024-09-24 00:03:08,584 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=375620.0, ans=0.125 2024-09-24 00:03:26,515 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=8.16 vs. limit=15.0 2024-09-24 00:03:46,040 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.91 vs. 
limit=15.0 2024-09-24 00:03:46,562 INFO [train.py:1198] (3/4) Epoch 21, batch 2600, loss[loss=0.2032, ctc_loss=0.137, cr_loss=0.331, over 17115.00 frames. ], tot_loss[loss=0.2113, ctc_loss=0.1402, cr_loss=0.3553, over 3367787.45 frames. ], batch size: 49, lr: 5.69e-03, grad_scale: 32.0 2024-09-24 00:03:46,890 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-24 00:03:58,189 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=375760.0, ans=0.125 2024-09-24 00:04:06,936 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=375806.6666666667, ans=0.0 2024-09-24 00:04:40,098 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=375900.0, ans=0.125 2024-09-24 00:04:46,481 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=375900.0, ans=0.0 2024-09-24 00:04:59,204 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=375946.6666666667, ans=0.125 2024-09-24 00:05:07,312 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=375993.3333333333, ans=0.0 2024-09-24 00:05:08,531 INFO [train.py:1198] (3/4) Epoch 21, batch 2650, loss[loss=0.2317, ctc_loss=0.1545, cr_loss=0.3862, over 16752.00 frames. ], tot_loss[loss=0.2125, ctc_loss=0.141, cr_loss=0.3572, over 3364181.83 frames. ], batch size: 61, lr: 5.69e-03, grad_scale: 32.0 2024-09-24 00:05:18,363 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=375993.3333333333, ans=0.1 2024-09-24 00:05:41,549 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=11.22 vs. limit=15.0 2024-09-24 00:05:52,572 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.095e+02 1.243e+02 1.351e+02 1.464e+02 2.576e+02, threshold=2.701e+02, percent-clipped=0.0 2024-09-24 00:06:08,856 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=376133.3333333333, ans=0.125 2024-09-24 00:06:23,656 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.54 vs. limit=15.0 2024-09-24 00:06:33,544 INFO [train.py:1198] (3/4) Epoch 21, batch 2700, loss[loss=0.1748, ctc_loss=0.1129, cr_loss=0.3094, over 17200.00 frames. ], tot_loss[loss=0.2123, ctc_loss=0.1409, cr_loss=0.3568, over 3359551.49 frames. ], batch size: 41, lr: 5.69e-03, grad_scale: 32.0 2024-09-24 00:06:35,477 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=376226.6666666667, ans=0.125 2024-09-24 00:06:46,690 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=5.74 vs. 
limit=15.0 2024-09-24 00:06:51,094 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=376273.3333333333, ans=0.1 2024-09-24 00:07:02,161 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=376273.3333333333, ans=0.035 2024-09-24 00:07:08,508 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=376320.0, ans=0.0 2024-09-24 00:07:22,214 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.const_attention_rate, batch_count=376366.6666666667, ans=0.025 2024-09-24 00:07:23,791 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=376366.6666666667, ans=0.0 2024-09-24 00:07:30,240 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.const_attention_rate, batch_count=376366.6666666667, ans=0.025 2024-09-24 00:07:43,697 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=7.08 vs. limit=15.0 2024-09-24 00:07:55,725 INFO [train.py:1198] (3/4) Epoch 21, batch 2750, loss[loss=0.2162, ctc_loss=0.1417, cr_loss=0.3726, over 17299.00 frames. ], tot_loss[loss=0.2125, ctc_loss=0.141, cr_loss=0.3574, over 3371443.61 frames. ], batch size: 51, lr: 5.68e-03, grad_scale: 32.0 2024-09-24 00:08:31,468 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.const_attention_rate, batch_count=376553.3333333333, ans=0.025 2024-09-24 00:08:37,709 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.067e+02 1.226e+02 1.342e+02 1.468e+02 2.196e+02, threshold=2.683e+02, percent-clipped=0.0 2024-09-24 00:09:04,357 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=376646.6666666667, ans=0.1 2024-09-24 00:09:15,370 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=376646.6666666667, ans=0.1 2024-09-24 00:09:18,228 INFO [train.py:1198] (3/4) Epoch 21, batch 2800, loss[loss=0.2209, ctc_loss=0.1463, cr_loss=0.373, over 17361.00 frames. ], tot_loss[loss=0.2132, ctc_loss=0.1415, cr_loss=0.3585, over 3369856.33 frames. ], batch size: 48, lr: 5.68e-03, grad_scale: 32.0 2024-09-24 00:09:23,070 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=376693.3333333333, ans=0.125 2024-09-24 00:09:39,575 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.38 vs. limit=22.5 2024-09-24 00:09:53,553 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-24 00:10:38,126 INFO [train.py:1198] (3/4) Epoch 21, batch 2850, loss[loss=0.2178, ctc_loss=0.1451, cr_loss=0.3636, over 17207.00 frames. ], tot_loss[loss=0.2143, ctc_loss=0.1423, cr_loss=0.3599, over 3352758.23 frames. 
], batch size: 47, lr: 5.68e-03, grad_scale: 32.0 2024-09-24 00:10:41,457 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=376926.6666666667, ans=0.125 2024-09-24 00:10:52,465 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.43 vs. limit=15.0 2024-09-24 00:11:00,329 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=17.66 vs. limit=22.5 2024-09-24 00:11:22,310 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.032e+02 1.291e+02 1.370e+02 1.483e+02 1.856e+02, threshold=2.740e+02, percent-clipped=0.0 2024-09-24 00:11:41,372 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=377066.6666666667, ans=0.1 2024-09-24 00:11:47,821 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=377113.3333333333, ans=0.09899494936611666 2024-09-24 00:11:51,087 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=377113.3333333333, ans=0.09899494936611666 2024-09-24 00:12:00,461 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=377113.3333333333, ans=0.125 2024-09-24 00:12:03,412 INFO [train.py:1198] (3/4) Epoch 21, batch 2900, loss[loss=0.1838, ctc_loss=0.1206, cr_loss=0.3158, over 17045.00 frames. ], tot_loss[loss=0.2138, ctc_loss=0.1421, cr_loss=0.3586, over 3342445.51 frames. ], batch size: 39, lr: 5.68e-03, grad_scale: 32.0 2024-09-24 00:12:26,306 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=6.00 vs. limit=15.0 2024-09-24 00:13:07,197 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=377300.0, ans=0.125 2024-09-24 00:13:19,870 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=377346.6666666667, ans=0.125 2024-09-24 00:13:21,564 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=377346.6666666667, ans=0.125 2024-09-24 00:13:26,188 INFO [train.py:1198] (3/4) Epoch 21, batch 2950, loss[loss=0.2252, ctc_loss=0.1516, cr_loss=0.3676, over 17106.00 frames. ], tot_loss[loss=0.2138, ctc_loss=0.142, cr_loss=0.3588, over 3347420.95 frames. ], batch size: 49, lr: 5.68e-03, grad_scale: 16.0 2024-09-24 00:13:29,498 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=377393.3333333333, ans=0.125 2024-09-24 00:13:31,734 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.66 vs. 
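[Note on the optim.py:487 WARNING lines: the five numbers after "grad-norm quartiles" read as min/25%/median/75%/max of recent gradient norms, and in every record here the threshold equals Clipping_scale times the logged median, e.g. 2.0 × 1.370e+02 = 2.740e+02 in the warning just above; percent-clipped is the share of recent updates whose norm exceeded that threshold. A sketch of one way such statistics can drive adaptive clipping, assuming a running window of norms; the window size is an assumption, not taken from the optimizer:

    from collections import deque
    import statistics

    class AdaptiveClipper:
        def __init__(self, clipping_scale=2.0, window=200):
            self.clipping_scale = clipping_scale
            self.norms = deque(maxlen=window)
            self.clipped = 0
            self.seen = 0
        def process(self, grad_norm):
            """Record a gradient norm; return the rescale factor to apply."""
            self.norms.append(grad_norm)
            # statistics.quantiles(n=4) returns [q1, median, q3]
            qs = statistics.quantiles(self.norms, n=4) if len(self.norms) >= 4 else None
            threshold = self.clipping_scale * qs[1] if qs else float("inf")
            self.seen += 1
            if grad_norm > threshold:
                self.clipped += 1
                return threshold / grad_norm
            return 1.0
        def report(self):
            qs = statistics.quantiles(self.norms, n=4)
            pct = 100.0 * self.clipped / max(1, self.seen)
            print(f"grad-norm quartiles {min(self.norms):.3e} {qs[0]:.3e} "
                  f"{qs[1]:.3e} {qs[2]:.3e} {max(self.norms):.3e}, "
                  f"threshold={self.clipping_scale * qs[1]:.3e}, percent-clipped={pct}")

Under this reading, percent-clipped=0.0 in most records means the norm distribution stayed well inside twice its own median.]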
limit=15.0 2024-09-24 00:14:11,337 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.076e+02 1.234e+02 1.337e+02 1.460e+02 2.034e+02, threshold=2.674e+02, percent-clipped=0.0 2024-09-24 00:14:16,188 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=377533.3333333333, ans=0.125 2024-09-24 00:14:18,248 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=12.31 vs. limit=22.5 2024-09-24 00:14:23,026 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=6.89 vs. limit=15.0 2024-09-24 00:14:40,017 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=377580.0, ans=0.0 2024-09-24 00:14:47,426 INFO [train.py:1198] (3/4) Epoch 21, batch 3000, loss[loss=0.2044, ctc_loss=0.1358, cr_loss=0.3428, over 17018.00 frames. ], tot_loss[loss=0.2149, ctc_loss=0.1428, cr_loss=0.3602, over 3349057.31 frames. ], batch size: 44, lr: 5.68e-03, grad_scale: 16.0 2024-09-24 00:14:47,426 INFO [train.py:1221] (3/4) Computing validation loss 2024-09-24 00:15:02,880 INFO [train.py:1230] (3/4) Epoch 21, validation: loss=0.03893, ctc_loss=0.03893, cr_loss=7.803e-15, over 944034.00 frames. 2024-09-24 00:15:02,880 INFO [train.py:1231] (3/4) Maximum memory allocated so far is 21238MB 2024-09-24 00:15:22,307 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=377673.3333333333, ans=0.1 2024-09-24 00:15:31,815 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=377673.3333333333, ans=0.125 2024-09-24 00:15:39,645 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=377720.0, ans=0.0 2024-09-24 00:15:53,800 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=377766.6666666667, ans=0.07 2024-09-24 00:16:21,777 INFO [train.py:1198] (3/4) Epoch 21, batch 3050, loss[loss=0.2082, ctc_loss=0.1394, cr_loss=0.3442, over 17360.00 frames. ], tot_loss[loss=0.2137, ctc_loss=0.142, cr_loss=0.3588, over 3345291.27 frames. ], batch size: 48, lr: 5.67e-03, grad_scale: 16.0 2024-09-24 00:16:33,778 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=10.24 vs. limit=15.0 2024-09-24 00:16:49,014 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=377906.6666666667, ans=0.125 2024-09-24 00:17:04,477 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.136e+02 1.246e+02 1.316e+02 1.406e+02 3.085e+02, threshold=2.632e+02, percent-clipped=1.0 2024-09-24 00:17:40,376 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=378046.6666666667, ans=0.125 2024-09-24 00:17:43,204 INFO [train.py:1198] (3/4) Epoch 21, batch 3100, loss[loss=0.1705, ctc_loss=0.1126, cr_loss=0.2897, over 15810.00 frames. ], tot_loss[loss=0.2143, ctc_loss=0.1426, cr_loss=0.3586, over 3340263.20 frames. 
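[Note on loss[...] versus tot_loss[...] in the train.py:1198 records: loss[...] is the current batch (its frame count stays near 17k), while tot_loss[...] aggregates over a window of recent frames. The tot_loss frame count climbs after an epoch start and then saturates around 3.35M frames rather than growing without bound, which is consistent with an exponentially decayed sum rather than a plain cumulative mean. A sketch of that style of averaging; the decay constant is inferred, not read from the code:

    class DecayedFrameAverage:
        """Loss averaged over an exponentially decayed window of frames.
        decay=0.995 is a guess; it makes the effective window saturate
        near 17000 / (1 - 0.995) = 3.4e6 frames, as tot_loss does."""
        def __init__(self, decay=0.995):
            self.decay = decay
            self.loss_sum = 0.0
            self.frame_sum = 0.0
        def update(self, batch_loss, batch_frames):
            self.loss_sum = self.loss_sum * self.decay + batch_loss * batch_frames
            self.frame_sum = self.frame_sum * self.decay + batch_frames
            return self.loss_sum / self.frame_sum, self.frame_sum

    avg = DecayedFrameAverage()
    for step in range(300):
        loss_val, frames = avg.update(0.21, 17000.0)
    print(round(frames))  # about 2.6e6 after 300 batches

With 17k frames per batch this reproduces the Epoch 22 records below reasonably well: roughly 7.7e5 frames after 50 batches and 2.6e6 after 300, versus the logged 760329 and 2627113 (per-batch frame counts vary, so the match is approximate).]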
], batch size: 35, lr: 5.67e-03, grad_scale: 16.0 2024-09-24 00:17:45,123 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=378093.3333333333, ans=0.07 2024-09-24 00:17:51,298 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=378093.3333333333, ans=0.035 2024-09-24 00:18:05,608 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=378140.0, ans=0.04949747468305833 2024-09-24 00:18:07,337 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=378140.0, ans=0.125 2024-09-24 00:18:19,910 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=378186.6666666667, ans=0.0 2024-09-24 00:18:41,393 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=378233.3333333333, ans=0.1 2024-09-24 00:18:42,079 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=10.13 vs. limit=15.0 2024-09-24 00:18:46,905 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=378280.0, ans=0.0 2024-09-24 00:18:51,830 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=378280.0, ans=0.125 2024-09-24 00:18:55,635 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=4.60 vs. limit=10.0 2024-09-24 00:18:57,989 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.const_attention_rate, batch_count=378280.0, ans=0.025 2024-09-24 00:19:04,081 INFO [train.py:1198] (3/4) Epoch 21, batch 3150, loss[loss=0.2178, ctc_loss=0.1473, cr_loss=0.3525, over 17042.00 frames. ], tot_loss[loss=0.2124, ctc_loss=0.1411, cr_loss=0.3566, over 3356914.36 frames. ], batch size: 56, lr: 5.67e-03, grad_scale: 16.0 2024-09-24 00:19:19,843 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=378373.3333333333, ans=0.1 2024-09-24 00:19:26,140 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=378373.3333333333, ans=0.0 2024-09-24 00:19:35,410 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=378420.0, ans=0.125 2024-09-24 00:19:46,189 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 9.987e+01 1.255e+02 1.337e+02 1.461e+02 2.046e+02, threshold=2.675e+02, percent-clipped=0.0 2024-09-24 00:19:48,048 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=378420.0, ans=0.125 2024-09-24 00:20:21,951 INFO [train.py:1198] (3/4) Epoch 21, batch 3200, loss[loss=0.203, ctc_loss=0.1329, cr_loss=0.3504, over 17180.00 frames. ], tot_loss[loss=0.2111, ctc_loss=0.14, cr_loss=0.3554, over 3366417.74 frames. 
], batch size: 41, lr: 5.67e-03, grad_scale: 32.0 2024-09-24 00:20:46,291 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=378606.6666666667, ans=0.0 2024-09-24 00:20:46,362 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=378606.6666666667, ans=0.0 2024-09-24 00:21:00,287 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=378653.3333333333, ans=0.125 2024-09-24 00:21:03,429 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=378653.3333333333, ans=0.125 2024-09-24 00:21:15,756 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=378700.0, ans=0.1 2024-09-24 00:21:19,137 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=6.12 vs. limit=15.0 2024-09-24 00:21:37,557 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=378746.6666666667, ans=0.125 2024-09-24 00:21:42,248 INFO [train.py:1198] (3/4) Epoch 21, batch 3250, loss[loss=0.1699, ctc_loss=0.1097, cr_loss=0.3008, over 16272.00 frames. ], tot_loss[loss=0.2103, ctc_loss=0.1394, cr_loss=0.3546, over 3364922.87 frames. ], batch size: 36, lr: 5.67e-03, grad_scale: 32.0 2024-09-24 00:21:58,317 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=378840.0, ans=0.1 2024-09-24 00:21:59,672 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=378840.0, ans=0.125 2024-09-24 00:22:24,295 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.107e+02 1.280e+02 1.348e+02 1.474e+02 1.911e+02, threshold=2.696e+02, percent-clipped=0.0 2024-09-24 00:22:24,517 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.const_attention_rate, batch_count=378886.6666666667, ans=0.025 2024-09-24 00:22:27,685 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=378933.3333333333, ans=0.2 2024-09-24 00:22:31,126 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=378933.3333333333, ans=0.1 2024-09-24 00:22:40,068 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=378933.3333333333, ans=0.0 2024-09-24 00:23:00,170 INFO [train.py:1198] (3/4) Epoch 21, batch 3300, loss[loss=0.1765, ctc_loss=0.116, cr_loss=0.3026, over 17094.00 frames. ], tot_loss[loss=0.2109, ctc_loss=0.1398, cr_loss=0.3552, over 3366808.94 frames. ], batch size: 40, lr: 5.67e-03, grad_scale: 32.0 2024-09-24 00:23:05,098 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=379026.6666666667, ans=0.125 2024-09-24 00:23:16,551 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=4.03 vs. 
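[Note on the grad_scale field: it moves between power-of-two values in this section (32.0 drops to 16.0 around Epoch 21 batch 2950, returns to 32.0 by batch 3200, and later dips to 8.0 before recovering), which is the signature of dynamic loss scaling in mixed-precision training: the scale halves when non-finite gradients are hit and doubles again after a run of clean steps. A minimal sketch using PyTorch's stock GradScaler; the growth/backoff settings shown are the library defaults, and the model, data, and lr are placeholders:

    import torch

    model = torch.nn.Linear(80, 500).cuda()
    opt = torch.optim.SGD(model.parameters(), lr=0.1)
    scaler = torch.cuda.amp.GradScaler(
        init_scale=32.0,      # matches the scale seen in this log region
        growth_factor=2.0,    # double after a run of finite-gradient steps
        backoff_factor=0.5,   # halve on inf/nan gradients
        growth_interval=2000,
    )

    x = torch.randn(4, 80, device="cuda")
    with torch.cuda.amp.autocast():
        loss = model(x).square().mean()
    scaler.scale(loss).backward()   # backward on the scaled loss
    scaler.step(opt)                # unscales grads, skips the step on overflow
    scaler.update()                 # adjusts the scale, as grad_scale does here
    print(scaler.get_scale())

Whether this run uses the stock scaler or its own variant is not visible from the log; only the halving/doubling pattern is.]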
limit=12.0 2024-09-24 00:23:30,032 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=379120.0, ans=0.0 2024-09-24 00:23:42,473 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=379120.0, ans=0.0 2024-09-24 00:23:47,240 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=379166.6666666667, ans=0.125 2024-09-24 00:23:51,082 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.13 vs. limit=10.0 2024-09-24 00:23:57,303 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=22.51 vs. limit=22.5 2024-09-24 00:24:04,577 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=379213.3333333333, ans=0.125 2024-09-24 00:24:18,315 INFO [train.py:1198] (3/4) Epoch 21, batch 3350, loss[loss=0.2043, ctc_loss=0.1337, cr_loss=0.3531, over 17279.00 frames. ], tot_loss[loss=0.2117, ctc_loss=0.1404, cr_loss=0.3564, over 3361949.67 frames. ], batch size: 46, lr: 5.66e-03, grad_scale: 32.0 2024-09-24 00:24:27,103 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=379260.0, ans=0.125 2024-09-24 00:24:47,262 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=379306.6666666667, ans=0.0 2024-09-24 00:24:56,605 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=379353.3333333333, ans=0.2 2024-09-24 00:25:02,489 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.114e+02 1.292e+02 1.373e+02 1.509e+02 2.435e+02, threshold=2.745e+02, percent-clipped=0.0 2024-09-24 00:25:31,647 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.02 vs. limit=10.0 2024-09-24 00:25:38,759 INFO [train.py:1198] (3/4) Epoch 21, batch 3400, loss[loss=0.2328, ctc_loss=0.1553, cr_loss=0.3874, over 17142.00 frames. ], tot_loss[loss=0.2135, ctc_loss=0.1418, cr_loss=0.3584, over 3355427.86 frames. ], batch size: 48, lr: 5.66e-03, grad_scale: 32.0 2024-09-24 00:25:54,828 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-24 00:26:03,972 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=379540.0, ans=0.0 2024-09-24 00:26:07,676 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.86 vs. limit=15.0 2024-09-24 00:26:09,435 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=6.11 vs. 
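[Note on the scaling.py:1024 Whitening lines: each compares a per-module statistic against a limit (e.g. metric=22.51 vs. limit=22.5 just above, a rare case where the limit is actually exceeded), and the metric grows as the module's feature covariance departs from a scaled identity. One plausible anisotropy measure with the right behavior is sketched below: it equals 1.0 for perfectly white features and grows as the eigenvalue spectrum becomes lopsided. This is an illustrative formula, not necessarily the project's exact one:

    import torch

    def whitening_metric(x: torch.Tensor) -> float:
        """x: (num_frames, num_channels). Returns D * trace(C @ C) / trace(C)**2,
        which is 1.0 iff the covariance C is a multiple of the identity."""
        x = x - x.mean(dim=0, keepdim=True)
        cov = (x.t() @ x) / x.shape[0]
        d = cov.shape[0]
        return (d * torch.trace(cov @ cov) / torch.trace(cov) ** 2).item()

    white = torch.randn(10000, 384)                  # near-isotropic features
    skewed = white * torch.linspace(0.1, 3.0, 384)   # imbalanced channel scales
    print(whitening_metric(white))    # close to 1.0
    print(whitening_metric(skewed))   # well above 1.0: would trip a limit

Reading the log this way, "metric=4.03 vs. limit=12.0" means the module is comfortably whiter than its constraint requires.]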
limit=15.0 2024-09-24 00:26:32,007 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=379633.3333333333, ans=0.125 2024-09-24 00:26:36,754 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=379633.3333333333, ans=0.07 2024-09-24 00:26:44,539 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=379680.0, ans=0.0 2024-09-24 00:26:56,714 INFO [train.py:1198] (3/4) Epoch 21, batch 3450, loss[loss=0.2041, ctc_loss=0.1399, cr_loss=0.3208, over 17301.00 frames. ], tot_loss[loss=0.2131, ctc_loss=0.1416, cr_loss=0.3575, over 3357997.29 frames. ], batch size: 49, lr: 5.66e-03, grad_scale: 32.0 2024-09-24 00:27:03,244 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=379726.6666666667, ans=0.1 2024-09-24 00:27:11,063 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=379773.3333333333, ans=0.0 2024-09-24 00:27:25,376 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=8.82 vs. limit=15.0 2024-09-24 00:27:38,510 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.144e+02 1.292e+02 1.385e+02 1.500e+02 2.362e+02, threshold=2.770e+02, percent-clipped=0.0 2024-09-24 00:27:38,892 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=379820.0, ans=0.0 2024-09-24 00:28:08,935 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=379913.3333333333, ans=0.1 2024-09-24 00:28:16,470 INFO [train.py:1198] (3/4) Epoch 21, batch 3500, loss[loss=0.2057, ctc_loss=0.1358, cr_loss=0.3496, over 17053.00 frames. ], tot_loss[loss=0.2151, ctc_loss=0.143, cr_loss=0.3603, over 3355264.82 frames. ], batch size: 39, lr: 5.66e-03, grad_scale: 32.0 2024-09-24 00:28:23,045 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=379960.0, ans=0.125 2024-09-24 00:28:26,451 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=13.35 vs. limit=22.5 2024-09-24 00:28:53,293 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=380053.3333333333, ans=0.125 2024-09-24 00:28:53,320 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=380053.3333333333, ans=0.0 2024-09-24 00:29:30,739 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=380146.6666666667, ans=0.125 2024-09-24 00:29:36,659 INFO [train.py:1198] (3/4) Epoch 21, batch 3550, loss[loss=0.1883, ctc_loss=0.1227, cr_loss=0.3277, over 17279.00 frames. ], tot_loss[loss=0.2136, ctc_loss=0.1419, cr_loss=0.3581, over 3359059.32 frames. 
], batch size: 44, lr: 5.66e-03, grad_scale: 32.0 2024-09-24 00:29:43,155 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=380193.3333333333, ans=0.125 2024-09-24 00:29:56,407 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=5.00 vs. limit=15.0 2024-09-24 00:30:07,072 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=12.97 vs. limit=22.5 2024-09-24 00:30:20,797 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.084e+02 1.243e+02 1.324e+02 1.443e+02 2.322e+02, threshold=2.648e+02, percent-clipped=0.0 2024-09-24 00:30:28,798 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=380333.3333333333, ans=0.0 2024-09-24 00:30:30,455 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=380333.3333333333, ans=0.125 2024-09-24 00:30:38,138 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=380333.3333333333, ans=0.125 2024-09-24 00:30:46,830 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=3.75 vs. limit=12.0 2024-09-24 00:30:56,887 INFO [train.py:1198] (3/4) Epoch 21, batch 3600, loss[loss=0.2203, ctc_loss=0.1449, cr_loss=0.3771, over 17052.00 frames. ], tot_loss[loss=0.214, ctc_loss=0.1422, cr_loss=0.3589, over 3351142.55 frames. ], batch size: 46, lr: 5.66e-03, grad_scale: 32.0 2024-09-24 00:31:06,096 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.const_attention_rate, batch_count=380426.6666666667, ans=0.025 2024-09-24 00:31:15,458 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=380473.3333333333, ans=0.125 2024-09-24 00:31:23,374 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=380473.3333333333, ans=0.1 2024-09-24 00:31:40,722 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.min_positive, batch_count=380520.0, ans=0.05 2024-09-24 00:31:43,818 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=380566.6666666667, ans=0.125 2024-09-24 00:32:01,244 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=380613.3333333333, ans=0.0 2024-09-24 00:32:04,293 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=380613.3333333333, ans=0.1 2024-09-24 00:32:14,928 INFO [train.py:1198] (3/4) Epoch 21, batch 3650, loss[loss=0.2207, ctc_loss=0.147, cr_loss=0.3686, over 17019.00 frames. ], tot_loss[loss=0.2145, ctc_loss=0.1426, cr_loss=0.3596, over 3348927.44 frames. 
], batch size: 53, lr: 5.65e-03, grad_scale: 32.0 2024-09-24 00:32:42,225 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=380706.6666666667, ans=0.0 2024-09-24 00:32:42,243 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=380706.6666666667, ans=0.1 2024-09-24 00:32:43,835 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=380706.6666666667, ans=0.1 2024-09-24 00:32:50,041 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=380753.3333333333, ans=0.2 2024-09-24 00:32:57,524 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.078e+02 1.280e+02 1.369e+02 1.514e+02 2.640e+02, threshold=2.739e+02, percent-clipped=0.0 2024-09-24 00:33:09,179 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=380800.0, ans=0.09899494936611666 2024-09-24 00:33:14,220 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.whiten.whitening_limit, batch_count=380800.0, ans=12.0 2024-09-24 00:33:29,397 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=380846.6666666667, ans=0.05 2024-09-24 00:33:35,127 INFO [train.py:1198] (3/4) Epoch 21, batch 3700, loss[loss=0.2033, ctc_loss=0.1368, cr_loss=0.3328, over 17207.00 frames. ], tot_loss[loss=0.2134, ctc_loss=0.1417, cr_loss=0.3582, over 3361628.98 frames. ], batch size: 47, lr: 5.65e-03, grad_scale: 32.0 2024-09-24 00:33:37,583 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=11.87 vs. limit=15.0 2024-09-24 00:33:58,993 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=380940.0, ans=0.2 2024-09-24 00:34:19,508 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=380986.6666666667, ans=0.125 2024-09-24 00:34:46,460 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=381080.0, ans=0.125 2024-09-24 00:34:49,489 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=381080.0, ans=0.2 2024-09-24 00:34:49,628 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=381080.0, ans=0.125 2024-09-24 00:34:54,006 INFO [train.py:1198] (3/4) Epoch 21, batch 3750, loss[loss=0.1901, ctc_loss=0.1243, cr_loss=0.3291, over 17008.00 frames. ], tot_loss[loss=0.2136, ctc_loss=0.1419, cr_loss=0.3584, over 3350409.36 frames. ], batch size: 39, lr: 5.65e-03, grad_scale: 32.0 2024-09-24 00:34:54,655 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=10.46 vs. 
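[Note on two recurring skip-rate readouts: 0.09899494936611666 and 0.04949747468305833 are exact √2 multiples of 0.07 and 0.035, and those round base values also appear directly as bypass skip_rate entries elsewhere in this section, so the √2-scaled readouts plausibly reflect an internal rescaling of the same knob. The arithmetic is easy to confirm:

    import math

    # skip-rate values copied verbatim from the records above
    assert math.isclose(0.07 * math.sqrt(2), 0.09899494936611666, rel_tol=1e-12)
    assert math.isclose(0.035 * math.sqrt(2), 0.04949747468305833, rel_tol=1e-12)

Only the arithmetic is certain here; the rescaling interpretation is a guess.]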
limit=15.0 2024-09-24 00:35:02,180 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=381126.6666666667, ans=0.125 2024-09-24 00:35:13,356 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=381173.3333333333, ans=0.1 2024-09-24 00:35:14,783 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=381173.3333333333, ans=0.0 2024-09-24 00:35:22,616 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=381173.3333333333, ans=0.125 2024-09-24 00:35:24,865 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=5.43 vs. limit=15.0 2024-09-24 00:35:31,635 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=381220.0, ans=0.5 2024-09-24 00:35:36,265 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.058e+02 1.271e+02 1.353e+02 1.516e+02 2.351e+02, threshold=2.707e+02, percent-clipped=0.0 2024-09-24 00:35:50,098 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=9.28 vs. limit=10.0 2024-09-24 00:35:55,428 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=381313.3333333333, ans=0.0 2024-09-24 00:36:00,081 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=381313.3333333333, ans=0.0 2024-09-24 00:36:12,542 INFO [train.py:1198] (3/4) Epoch 21, batch 3800, loss[loss=0.2537, ctc_loss=0.1693, cr_loss=0.4217, over 15090.00 frames. ], tot_loss[loss=0.2147, ctc_loss=0.1427, cr_loss=0.36, over 3339247.39 frames. ], batch size: 89, lr: 5.65e-03, grad_scale: 32.0 2024-09-24 00:36:12,739 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=381360.0, ans=0.0 2024-09-24 00:36:17,970 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.05 vs. limit=15.0 2024-09-24 00:36:23,721 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=381360.0, ans=0.5 2024-09-24 00:36:33,193 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=381406.6666666667, ans=0.0 2024-09-24 00:36:36,392 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.const_attention_rate, batch_count=381406.6666666667, ans=0.025 2024-09-24 00:36:37,958 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=381406.6666666667, ans=0.125 2024-09-24 00:36:59,402 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=4.42 vs. limit=15.0 2024-09-24 00:37:30,485 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=6.87 vs. 
limit=15.0 2024-09-24 00:37:31,012 INFO [train.py:1198] (3/4) Epoch 21, batch 3850, loss[loss=0.2381, ctc_loss=0.1602, cr_loss=0.3895, over 16921.00 frames. ], tot_loss[loss=0.2166, ctc_loss=0.1444, cr_loss=0.3614, over 3289811.25 frames. ], batch size: 58, lr: 5.65e-03, grad_scale: 32.0 2024-09-24 00:37:55,336 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-24 00:38:14,455 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.170e+02 1.386e+02 1.498e+02 1.641e+02 2.451e+02, threshold=2.996e+02, percent-clipped=0.0 2024-09-24 00:38:24,371 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=381733.3333333333, ans=0.0 2024-09-24 00:39:39,375 INFO [train.py:1198] (3/4) Epoch 22, batch 0, loss[loss=0.2209, ctc_loss=0.1457, cr_loss=0.3756, over 17107.00 frames. ], tot_loss[loss=0.2209, ctc_loss=0.1457, cr_loss=0.3756, over 17107.00 frames. ], batch size: 49, lr: 5.51e-03, grad_scale: 32.0 2024-09-24 00:39:39,375 INFO [train.py:1221] (3/4) Computing validation loss 2024-09-24 00:39:54,638 INFO [train.py:1230] (3/4) Epoch 22, validation: loss=0.03827, ctc_loss=0.03827, cr_loss=8.092e-15, over 944034.00 frames. 2024-09-24 00:39:54,639 INFO [train.py:1231] (3/4) Maximum memory allocated so far is 21238MB 2024-09-24 00:39:54,863 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=381808.0, ans=0.2 2024-09-24 00:40:06,099 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=381808.0, ans=0.0 2024-09-24 00:40:39,416 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=381901.3333333333, ans=0.0 2024-09-24 00:41:15,227 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=5.90 vs. limit=15.0 2024-09-24 00:41:17,588 INFO [train.py:1198] (3/4) Epoch 22, batch 50, loss[loss=0.2183, ctc_loss=0.1486, cr_loss=0.3488, over 16720.00 frames. ], tot_loss[loss=0.2163, ctc_loss=0.1436, cr_loss=0.3636, over 760329.00 frames. ], batch size: 61, lr: 5.51e-03, grad_scale: 32.0 2024-09-24 00:41:36,722 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=382088.0, ans=0.125 2024-09-24 00:42:06,804 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.076e+02 1.269e+02 1.377e+02 1.581e+02 4.736e+02, threshold=2.753e+02, percent-clipped=1.0 2024-09-24 00:42:23,189 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=382228.0, ans=0.1 2024-09-24 00:42:24,693 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.const_attention_rate, batch_count=382228.0, ans=0.025 2024-09-24 00:42:24,795 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=382228.0, ans=0.125 2024-09-24 00:42:40,201 INFO [train.py:1198] (3/4) Epoch 22, batch 100, loss[loss=0.2225, ctc_loss=0.1471, cr_loss=0.3766, over 17288.00 frames. ], tot_loss[loss=0.215, ctc_loss=0.1427, cr_loss=0.3613, over 1340659.25 frames. 
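[Note on the validation records: at both validation passes in this section (Epoch 21 batch 3000 and Epoch 22 batch 0 above) the reported cr_loss is on the order of 1e-14, numerically zero, so the validation loss coincides with ctc_loss alone. That is what you would expect if the consistency-regularization term compares two augmented views of each utterance and validation runs without augmentation, leaving identical views and a vanishing difference. A toy illustration of why such a term collapses; the loss form is a stand-in, not this recipe's exact CR loss:

    import torch

    def toy_cr_loss(view_a: torch.Tensor, view_b: torch.Tensor) -> torch.Tensor:
        """Stand-in consistency loss: mean squared disagreement between the
        per-frame posteriors of two views of the same utterance."""
        return (view_a.softmax(-1) - view_b.softmax(-1)).square().mean()

    logits = torch.randn(100, 500)
    masked = logits + torch.randn_like(logits) * 0.5   # 'augmented' second view

    print(toy_cr_loss(logits, masked).item())   # training-style: clearly nonzero
    print(toy_cr_loss(logits, logits).item())   # validation-style: exactly 0.0

The residual 1e-14-scale values in the log look like accumulated floating-point noise rather than a real loss contribution.]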
], batch size: 49, lr: 5.51e-03, grad_scale: 32.0 2024-09-24 00:42:40,470 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=382274.6666666667, ans=0.125 2024-09-24 00:42:46,784 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=382274.6666666667, ans=10.0 2024-09-24 00:42:48,491 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=382274.6666666667, ans=0.2 2024-09-24 00:43:05,669 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=382321.3333333333, ans=0.125 2024-09-24 00:43:13,622 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=382368.0, ans=0.125 2024-09-24 00:43:15,158 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=382368.0, ans=0.2 2024-09-24 00:43:52,137 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-24 00:43:56,890 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=382461.3333333333, ans=0.125 2024-09-24 00:43:59,806 INFO [train.py:1198] (3/4) Epoch 22, batch 150, loss[loss=0.2302, ctc_loss=0.1555, cr_loss=0.3734, over 16604.00 frames. ], tot_loss[loss=0.2138, ctc_loss=0.1419, cr_loss=0.3598, over 1790482.99 frames. ], batch size: 61, lr: 5.51e-03, grad_scale: 32.0 2024-09-24 00:44:53,986 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=382648.0, ans=0.125 2024-09-24 00:44:55,448 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.097e+02 1.262e+02 1.352e+02 1.515e+02 2.166e+02, threshold=2.703e+02, percent-clipped=0.0 2024-09-24 00:45:02,540 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=382648.0, ans=0.0 2024-09-24 00:45:02,584 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.min_abs, batch_count=382648.0, ans=0.5 2024-09-24 00:45:29,261 INFO [train.py:1198] (3/4) Epoch 22, batch 200, loss[loss=0.2095, ctc_loss=0.1381, cr_loss=0.3572, over 17209.00 frames. ], tot_loss[loss=0.2123, ctc_loss=0.1407, cr_loss=0.3581, over 2144700.54 frames. ], batch size: 50, lr: 5.51e-03, grad_scale: 32.0 2024-09-24 00:45:59,514 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=382834.6666666667, ans=0.035 2024-09-24 00:46:20,368 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=382881.3333333333, ans=0.125 2024-09-24 00:46:30,314 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=382881.3333333333, ans=0.0 2024-09-24 00:46:48,688 INFO [train.py:1198] (3/4) Epoch 22, batch 250, loss[loss=0.2438, ctc_loss=0.1627, cr_loss=0.4054, over 17224.00 frames. ], tot_loss[loss=0.2135, ctc_loss=0.1416, cr_loss=0.3595, over 2408845.27 frames. 
], batch size: 55, lr: 5.50e-03, grad_scale: 32.0 2024-09-24 00:46:52,674 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=7.49 vs. limit=15.0 2024-09-24 00:46:55,556 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=382974.6666666667, ans=0.0 2024-09-24 00:47:14,776 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=383021.3333333333, ans=0.1 2024-09-24 00:47:19,568 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=383068.0, ans=0.125 2024-09-24 00:47:41,218 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.081e+02 1.260e+02 1.348e+02 1.458e+02 2.828e+02, threshold=2.696e+02, percent-clipped=1.0 2024-09-24 00:47:41,933 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=12.17 vs. limit=22.5 2024-09-24 00:47:46,436 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=383114.6666666667, ans=0.125 2024-09-24 00:47:52,762 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=383114.6666666667, ans=0.125 2024-09-24 00:47:55,954 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=383161.3333333333, ans=0.0 2024-09-24 00:48:07,274 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=383161.3333333333, ans=0.125 2024-09-24 00:48:11,510 INFO [train.py:1198] (3/4) Epoch 22, batch 300, loss[loss=0.2434, ctc_loss=0.1645, cr_loss=0.3948, over 17094.00 frames. ], tot_loss[loss=0.2123, ctc_loss=0.1407, cr_loss=0.3581, over 2627113.78 frames. ], batch size: 49, lr: 5.50e-03, grad_scale: 32.0 2024-09-24 00:48:13,426 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=383208.0, ans=0.0 2024-09-24 00:48:51,838 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer_ff2.min_abs, batch_count=383301.3333333333, ans=0.1 2024-09-24 00:49:01,234 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=383348.0, ans=0.2 2024-09-24 00:49:18,880 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=383394.6666666667, ans=0.2 2024-09-24 00:49:29,307 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=383394.6666666667, ans=0.125 2024-09-24 00:49:37,077 INFO [train.py:1198] (3/4) Epoch 22, batch 350, loss[loss=0.2217, ctc_loss=0.1486, cr_loss=0.366, over 17244.00 frames. ], tot_loss[loss=0.2135, ctc_loss=0.1415, cr_loss=0.3597, over 2784073.69 frames. ], batch size: 55, lr: 5.50e-03, grad_scale: 32.0 2024-09-24 00:49:49,145 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=6.52 vs. limit=15.0 2024-09-24 00:50:07,983 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=3.33 vs. 
limit=15.0 2024-09-24 00:50:28,766 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.088e+02 1.255e+02 1.333e+02 1.486e+02 2.174e+02, threshold=2.666e+02, percent-clipped=0.0 2024-09-24 00:50:35,925 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=383581.3333333333, ans=0.2 2024-09-24 00:50:44,155 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.09 vs. limit=15.0 2024-09-24 00:50:59,535 INFO [train.py:1198] (3/4) Epoch 22, batch 400, loss[loss=0.2197, ctc_loss=0.1445, cr_loss=0.376, over 17146.00 frames. ], tot_loss[loss=0.2135, ctc_loss=0.1416, cr_loss=0.3596, over 2914578.90 frames. ], batch size: 45, lr: 5.50e-03, grad_scale: 32.0 2024-09-24 00:51:17,403 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=383721.3333333333, ans=0.0 2024-09-24 00:51:30,110 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.const_attention_rate, batch_count=383768.0, ans=0.025 2024-09-24 00:52:13,126 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=383861.3333333333, ans=0.125 2024-09-24 00:52:19,404 INFO [train.py:1198] (3/4) Epoch 22, batch 450, loss[loss=0.2336, ctc_loss=0.1561, cr_loss=0.3873, over 17141.00 frames. ], tot_loss[loss=0.2111, ctc_loss=0.1398, cr_loss=0.3565, over 3021700.27 frames. ], batch size: 48, lr: 5.50e-03, grad_scale: 32.0 2024-09-24 00:52:43,492 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=383954.6666666667, ans=0.2 2024-09-24 00:52:53,173 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=384001.3333333333, ans=0.1 2024-09-24 00:52:54,636 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=384001.3333333333, ans=0.0 2024-09-24 00:52:59,397 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=384001.3333333333, ans=0.1 2024-09-24 00:53:06,065 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=11.50 vs. limit=22.5 2024-09-24 00:53:10,446 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=384048.0, ans=0.0 2024-09-24 00:53:11,770 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.087e+02 1.289e+02 1.376e+02 1.526e+02 3.562e+02, threshold=2.753e+02, percent-clipped=1.0 2024-09-24 00:53:34,408 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.max_abs, batch_count=384094.6666666667, ans=10.0 2024-09-24 00:53:35,888 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=384094.6666666667, ans=0.1 2024-09-24 00:53:41,950 INFO [train.py:1198] (3/4) Epoch 22, batch 500, loss[loss=0.2093, ctc_loss=0.1376, cr_loss=0.3585, over 17130.00 frames. ], tot_loss[loss=0.2118, ctc_loss=0.1403, cr_loss=0.3571, over 3102285.81 frames. 
], batch size: 48, lr: 5.50e-03, grad_scale: 32.0 2024-09-24 00:54:40,310 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.const_attention_rate, batch_count=384281.3333333333, ans=0.025 2024-09-24 00:54:41,821 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=384281.3333333333, ans=0.125 2024-09-24 00:54:46,711 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=384281.3333333333, ans=0.1 2024-09-24 00:54:52,972 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=384328.0, ans=0.0 2024-09-24 00:54:59,422 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=384328.0, ans=0.125 2024-09-24 00:55:05,790 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-24 00:55:07,117 INFO [train.py:1198] (3/4) Epoch 22, batch 550, loss[loss=0.2066, ctc_loss=0.1323, cr_loss=0.3716, over 17148.00 frames. ], tot_loss[loss=0.2124, ctc_loss=0.1409, cr_loss=0.3576, over 3154306.66 frames. ], batch size: 45, lr: 5.49e-03, grad_scale: 32.0 2024-09-24 00:55:59,360 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.087e+02 1.258e+02 1.357e+02 1.512e+02 2.238e+02, threshold=2.715e+02, percent-clipped=0.0 2024-09-24 00:55:59,802 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=384514.6666666667, ans=0.125 2024-09-24 00:56:14,399 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=384561.3333333333, ans=0.125 2024-09-24 00:56:17,655 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=384561.3333333333, ans=0.125 2024-09-24 00:56:22,499 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=384561.3333333333, ans=0.125 2024-09-24 00:56:30,206 INFO [train.py:1198] (3/4) Epoch 22, batch 600, loss[loss=0.2226, ctc_loss=0.1497, cr_loss=0.3645, over 16789.00 frames. ], tot_loss[loss=0.213, ctc_loss=0.1413, cr_loss=0.3584, over 3199386.07 frames. ], batch size: 61, lr: 5.49e-03, grad_scale: 32.0 2024-09-24 00:56:41,681 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=384608.0, ans=0.125 2024-09-24 00:56:56,167 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=384654.6666666667, ans=0.125 2024-09-24 00:56:59,374 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=384654.6666666667, ans=0.0 2024-09-24 00:57:03,926 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=384701.3333333333, ans=0.0 2024-09-24 00:57:52,483 INFO [train.py:1198] (3/4) Epoch 22, batch 650, loss[loss=0.1825, ctc_loss=0.1184, cr_loss=0.3204, over 16275.00 frames. ], tot_loss[loss=0.2127, ctc_loss=0.1411, cr_loss=0.3582, over 3235837.26 frames. 
], batch size: 36, lr: 5.49e-03, grad_scale: 16.0 2024-09-24 00:58:10,247 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=384888.0, ans=0.125 2024-09-24 00:58:10,322 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=384888.0, ans=0.1 2024-09-24 00:58:15,008 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=384888.0, ans=0.2 2024-09-24 00:58:18,158 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=384888.0, ans=0.125 2024-09-24 00:58:43,615 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.116e+02 1.285e+02 1.401e+02 1.551e+02 2.497e+02, threshold=2.802e+02, percent-clipped=0.0 2024-09-24 00:58:44,373 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.55 vs. limit=15.0 2024-09-24 00:58:45,560 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=384981.3333333333, ans=0.0 2024-09-24 00:58:56,408 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=5.35 vs. limit=15.0 2024-09-24 00:59:01,846 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=385028.0, ans=0.125 2024-09-24 00:59:12,220 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=385028.0, ans=0.0 2024-09-24 00:59:15,117 INFO [train.py:1198] (3/4) Epoch 22, batch 700, loss[loss=0.2233, ctc_loss=0.1484, cr_loss=0.3747, over 17222.00 frames. ], tot_loss[loss=0.2133, ctc_loss=0.1415, cr_loss=0.3587, over 3265289.37 frames. ], batch size: 47, lr: 5.49e-03, grad_scale: 16.0 2024-09-24 00:59:19,362 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=13.88 vs. limit=15.0 2024-09-24 00:59:34,769 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.15 vs. limit=22.5 2024-09-24 00:59:47,032 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=385121.3333333333, ans=0.125 2024-09-24 01:00:40,660 INFO [train.py:1198] (3/4) Epoch 22, batch 750, loss[loss=0.2256, ctc_loss=0.1517, cr_loss=0.3697, over 15977.00 frames. ], tot_loss[loss=0.2128, ctc_loss=0.1411, cr_loss=0.3585, over 3286711.85 frames. 
], batch size: 74, lr: 5.49e-03, grad_scale: 8.0
2024-09-24 01:00:59,880 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=385354.6666666667, ans=0.2
2024-09-24 01:01:18,855 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=385401.3333333333, ans=0.125
2024-09-24 01:01:33,050 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.090e+02 1.232e+02 1.335e+02 1.516e+02 2.427e+02, threshold=2.671e+02, percent-clipped=0.0
2024-09-24 01:01:37,681 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=9.66 vs. limit=15.0
2024-09-24 01:01:43,223 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=385494.6666666667, ans=0.125
2024-09-24 01:01:51,458 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=7.92 vs. limit=15.0
2024-09-24 01:02:00,330 INFO [train.py:1198] (3/4) Epoch 22, batch 800, loss[loss=0.234, ctc_loss=0.1572, cr_loss=0.3842, over 16753.00 frames. ], tot_loss[loss=0.2113, ctc_loss=0.14, cr_loss=0.3564, over 3308318.60 frames. ], batch size: 61, lr: 5.49e-03, grad_scale: 16.0
2024-09-24 01:02:17,071 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.62 vs. limit=15.0
2024-09-24 01:03:14,632 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=4.55 vs. limit=15.0
2024-09-24 01:03:15,468 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=385728.0, ans=0.0
2024-09-24 01:03:17,064 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=385728.0, ans=0.125
2024-09-24 01:03:23,167 INFO [train.py:1198] (3/4) Epoch 22, batch 850, loss[loss=0.234, ctc_loss=0.1575, cr_loss=0.3823, over 17223.00 frames. ], tot_loss[loss=0.2111, ctc_loss=0.1399, cr_loss=0.3562, over 3330819.14 frames. ], batch size: 55, lr: 5.48e-03, grad_scale: 16.0
2024-09-24 01:03:30,476 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.95 vs. limit=15.0
2024-09-24 01:03:38,757 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=8.44 vs. limit=22.5
2024-09-24 01:03:49,387 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.const_attention_rate, batch_count=385821.3333333333, ans=0.025
2024-09-24 01:03:52,638 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=385821.3333333333, ans=0.0
2024-09-24 01:03:55,894 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=385868.0, ans=0.2
2024-09-24 01:04:18,869 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.098e+02 1.270e+02 1.386e+02 1.514e+02 2.172e+02, threshold=2.772e+02, percent-clipped=0.0
2024-09-24 01:04:19,665 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=2.84 vs. limit=15.0
2024-09-24 01:04:39,824 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=385961.3333333333, ans=0.125
2024-09-24 01:04:48,969 INFO [train.py:1198] (3/4) Epoch 22, batch 900, loss[loss=0.1911, ctc_loss=0.1251, cr_loss=0.3299, over 16929.00 frames. ], tot_loss[loss=0.2121, ctc_loss=0.1407, cr_loss=0.3572, over 3333306.40 frames. ], batch size: 42, lr: 5.48e-03, grad_scale: 16.0
2024-09-24 01:05:14,864 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=386054.6666666667, ans=0.125
2024-09-24 01:05:18,950 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=386054.6666666667, ans=0.0
2024-09-24 01:05:19,056 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=386054.6666666667, ans=0.125
2024-09-24 01:05:22,501 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=386101.3333333333, ans=0.125
2024-09-24 01:05:27,585 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=11.74 vs. limit=22.5
2024-09-24 01:05:35,837 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.10 vs. limit=6.0
2024-09-24 01:06:11,842 INFO [train.py:1198] (3/4) Epoch 22, batch 950, loss[loss=0.2335, ctc_loss=0.154, cr_loss=0.3972, over 17307.00 frames. ], tot_loss[loss=0.2131, ctc_loss=0.1415, cr_loss=0.3581, over 3333094.63 frames. ], batch size: 49, lr: 5.48e-03, grad_scale: 16.0
2024-09-24 01:06:34,246 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=386288.0, ans=0.125
2024-09-24 01:06:48,124 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=386334.6666666667, ans=0.1
2024-09-24 01:07:03,757 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.097e+02 1.224e+02 1.313e+02 1.395e+02 1.890e+02, threshold=2.625e+02, percent-clipped=0.0
2024-09-24 01:07:20,109 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=386428.0, ans=0.1
2024-09-24 01:07:31,056 INFO [train.py:1198] (3/4) Epoch 22, batch 1000, loss[loss=0.2561, ctc_loss=0.1702, cr_loss=0.4294, over 17219.00 frames. ], tot_loss[loss=0.2136, ctc_loss=0.1417, cr_loss=0.3592, over 3341996.62 frames. ], batch size: 55, lr: 5.48e-03, grad_scale: 16.0
2024-09-24 01:07:40,287 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=386474.6666666667, ans=0.125
2024-09-24 01:07:45,292 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.const_attention_rate, batch_count=386474.6666666667, ans=0.025
2024-09-24 01:07:49,904 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=386521.3333333333, ans=0.1
2024-09-24 01:08:04,411 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.52 vs. limit=15.0
2024-09-24 01:08:20,095 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=386614.6666666667, ans=0.125
2024-09-24 01:08:37,666 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00
2024-09-24 01:08:42,334 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=386661.3333333333, ans=0.1
2024-09-24 01:08:53,200 INFO [train.py:1198] (3/4) Epoch 22, batch 1050, loss[loss=0.2086, ctc_loss=0.1373, cr_loss=0.3565, over 17308.00 frames. ], tot_loss[loss=0.213, ctc_loss=0.1413, cr_loss=0.3584, over 3349289.32 frames. ], batch size: 46, lr: 5.48e-03, grad_scale: 16.0
2024-09-24 01:09:15,043 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=386754.6666666667, ans=0.125
2024-09-24 01:09:38,583 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=386801.3333333333, ans=0.125
2024-09-24 01:09:50,910 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.102e+02 1.253e+02 1.357e+02 1.506e+02 3.378e+02, threshold=2.715e+02, percent-clipped=1.0
2024-09-24 01:09:54,319 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=386848.0, ans=0.1
2024-09-24 01:10:04,361 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=14.41 vs. limit=15.0
2024-09-24 01:10:10,475 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=386894.6666666667, ans=0.0
2024-09-24 01:10:20,570 INFO [train.py:1198] (3/4) Epoch 22, batch 1100, loss[loss=0.2009, ctc_loss=0.1313, cr_loss=0.3485, over 17172.00 frames. ], tot_loss[loss=0.2121, ctc_loss=0.1406, cr_loss=0.3571, over 3347315.26 frames. ], batch size: 45, lr: 5.48e-03, grad_scale: 16.0
2024-09-24 01:10:46,685 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=386988.0, ans=0.0
2024-09-24 01:10:52,956 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=387034.6666666667, ans=0.1
2024-09-24 01:11:01,098 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-09-24 01:11:40,359 INFO [train.py:1198] (3/4) Epoch 22, batch 1150, loss[loss=0.189, ctc_loss=0.1263, cr_loss=0.3132, over 17297.00 frames. ], tot_loss[loss=0.2125, ctc_loss=0.141, cr_loss=0.3573, over 3343823.03 frames. ], batch size: 46, lr: 5.47e-03, grad_scale: 16.0
2024-09-24 01:12:36,141 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.050e+02 1.223e+02 1.330e+02 1.443e+02 2.592e+02, threshold=2.661e+02, percent-clipped=0.0
2024-09-24 01:12:36,411 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=387314.6666666667, ans=0.125
2024-09-24 01:13:03,660 INFO [train.py:1198] (3/4) Epoch 22, batch 1200, loss[loss=0.1833, ctc_loss=0.1212, cr_loss=0.3108, over 17284.00 frames. ], tot_loss[loss=0.2131, ctc_loss=0.1414, cr_loss=0.3584, over 3347694.87 frames. ], batch size: 42, lr: 5.47e-03, grad_scale: 32.0
2024-09-24 01:13:17,411 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=5.02 vs. limit=15.0
2024-09-24 01:13:17,710 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=11.15 vs. limit=22.5
2024-09-24 01:13:27,879 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=387454.6666666667, ans=0.0
2024-09-24 01:13:44,480 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=2.70 vs. limit=15.0
2024-09-24 01:13:55,245 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-09-24 01:14:18,526 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=387594.6666666667, ans=0.0
2024-09-24 01:14:25,967 INFO [train.py:1198] (3/4) Epoch 22, batch 1250, loss[loss=0.1764, ctc_loss=0.1148, cr_loss=0.3081, over 17089.00 frames. ], tot_loss[loss=0.2143, ctc_loss=0.1422, cr_loss=0.3602, over 3342236.78 frames. ], batch size: 40, lr: 5.47e-03, grad_scale: 32.0
2024-09-24 01:14:35,326 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=387641.3333333333, ans=0.1
2024-09-24 01:14:40,153 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=387641.3333333333, ans=0.0
2024-09-24 01:14:44,883 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=387688.0, ans=0.05
2024-09-24 01:14:50,019 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=10.67 vs. limit=15.0
2024-09-24 01:14:53,407 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.61 vs. limit=6.0
2024-09-24 01:14:54,321 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=387688.0, ans=0.125
2024-09-24 01:14:59,991 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=5.04 vs. limit=15.0
2024-09-24 01:15:23,794 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.120e+02 1.292e+02 1.416e+02 1.548e+02 3.016e+02, threshold=2.831e+02, percent-clipped=1.0
2024-09-24 01:15:50,986 INFO [train.py:1198] (3/4) Epoch 22, batch 1300, loss[loss=0.2155, ctc_loss=0.1427, cr_loss=0.3641, over 17031.00 frames. ], tot_loss[loss=0.2138, ctc_loss=0.1419, cr_loss=0.3597, over 3346446.72 frames. ], batch size: 56, lr: 5.47e-03, grad_scale: 32.0
2024-09-24 01:16:20,111 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=387921.3333333333, ans=0.125
2024-09-24 01:16:59,513 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=388061.3333333333, ans=0.125
2024-09-24 01:17:01,044 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=388061.3333333333, ans=0.2
2024-09-24 01:17:10,182 INFO [train.py:1198] (3/4) Epoch 22, batch 1350, loss[loss=0.2157, ctc_loss=0.1425, cr_loss=0.3663, over 17030.00 frames. ], tot_loss[loss=0.2137, ctc_loss=0.1417, cr_loss=0.3601, over 3356922.19 frames. ], batch size: 56, lr: 5.47e-03, grad_scale: 32.0
2024-09-24 01:17:15,267 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=388108.0, ans=0.0
2024-09-24 01:17:24,413 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.const_attention_rate, batch_count=388108.0, ans=0.025
2024-09-24 01:17:40,401 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer_ff2.min_abs, batch_count=388154.6666666667, ans=0.1
2024-09-24 01:17:41,998 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=388154.6666666667, ans=0.125
2024-09-24 01:17:47,618 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=8.68 vs. limit=22.5
2024-09-24 01:17:53,693 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=10.59 vs. limit=15.0
2024-09-24 01:18:05,574 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.078e+02 1.287e+02 1.390e+02 1.521e+02 2.749e+02, threshold=2.781e+02, percent-clipped=0.0
2024-09-24 01:18:32,815 INFO [train.py:1198] (3/4) Epoch 22, batch 1400, loss[loss=0.2184, ctc_loss=0.1434, cr_loss=0.3751, over 17073.00 frames. ], tot_loss[loss=0.213, ctc_loss=0.1411, cr_loss=0.3596, over 3363220.08 frames. ], batch size: 46, lr: 5.47e-03, grad_scale: 32.0
2024-09-24 01:18:34,651 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=388341.3333333333, ans=0.0
2024-09-24 01:18:42,836 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=388341.3333333333, ans=0.125
2024-09-24 01:19:00,705 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=11.80 vs. limit=15.0
2024-09-24 01:19:10,633 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=388434.6666666667, ans=0.0
2024-09-24 01:19:16,988 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=388434.6666666667, ans=0.2
2024-09-24 01:19:40,474 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.const_attention_rate, batch_count=388528.0, ans=0.025
2024-09-24 01:19:50,497 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys.whitening_limit, batch_count=388528.0, ans=6.0
2024-09-24 01:19:57,777 INFO [train.py:1198] (3/4) Epoch 22, batch 1450, loss[loss=0.2354, ctc_loss=0.155, cr_loss=0.4022, over 17225.00 frames. ], tot_loss[loss=0.213, ctc_loss=0.1411, cr_loss=0.3595, over 3368272.32 frames. ], batch size: 50, lr: 5.46e-03, grad_scale: 32.0
2024-09-24 01:19:59,774 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=14.13 vs. limit=15.0
2024-09-24 01:20:14,899 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=388621.3333333333, ans=0.0
2024-09-24 01:20:52,838 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.075e+02 1.236e+02 1.340e+02 1.484e+02 2.143e+02, threshold=2.680e+02, percent-clipped=0.0
2024-09-24 01:21:07,604 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=388761.3333333333, ans=0.125
2024-09-24 01:21:14,185 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=388761.3333333333, ans=0.0
2024-09-24 01:21:20,284 INFO [train.py:1198] (3/4) Epoch 22, batch 1500, loss[loss=0.2613, ctc_loss=0.1744, cr_loss=0.4345, over 16533.00 frames. ], tot_loss[loss=0.213, ctc_loss=0.1412, cr_loss=0.3589, over 3366615.92 frames. ], batch size: 66, lr: 5.46e-03, grad_scale: 32.0
2024-09-24 01:21:27,645 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=9.34 vs. limit=15.0
2024-09-24 01:21:31,651 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=388808.0, ans=0.125
2024-09-24 01:22:29,069 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=9.42 vs. limit=15.0
2024-09-24 01:22:42,364 INFO [train.py:1198] (3/4) Epoch 22, batch 1550, loss[loss=0.1685, ctc_loss=0.109, cr_loss=0.2974, over 16697.00 frames. ], tot_loss[loss=0.2135, ctc_loss=0.1417, cr_loss=0.3588, over 3344362.29 frames. ], batch size: 37, lr: 5.46e-03, grad_scale: 16.0
2024-09-24 01:22:47,470 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=389041.3333333333, ans=0.125
2024-09-24 01:22:58,797 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=389088.0, ans=0.125
2024-09-24 01:23:30,553 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=389181.3333333333, ans=0.125
2024-09-24 01:23:32,219 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=389181.3333333333, ans=0.0
2024-09-24 01:23:36,121 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.18 vs. limit=15.0
2024-09-24 01:23:37,028 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.111e+02 1.256e+02 1.352e+02 1.468e+02 3.649e+02, threshold=2.704e+02, percent-clipped=1.0
2024-09-24 01:23:51,812 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=389228.0, ans=0.2
2024-09-24 01:23:57,439 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.52 vs. limit=22.5
2024-09-24 01:24:02,778 INFO [train.py:1198] (3/4) Epoch 22, batch 1600, loss[loss=0.2521, ctc_loss=0.1723, cr_loss=0.3988, over 17015.00 frames. ], tot_loss[loss=0.2148, ctc_loss=0.1427, cr_loss=0.3606, over 3349899.92 frames. ], batch size: 53, lr: 5.46e-03, grad_scale: 32.0
2024-09-24 01:24:30,485 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=389321.3333333333, ans=0.125
2024-09-24 01:24:55,115 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=5.69 vs. limit=15.0
2024-09-24 01:24:57,617 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=389414.6666666667, ans=0.0
2024-09-24 01:25:00,617 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=389414.6666666667, ans=0.125
2024-09-24 01:25:09,423 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.max_positive, batch_count=389414.6666666667, ans=0.95
2024-09-24 01:25:09,849 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=3.46 vs. limit=10.0
2024-09-24 01:25:29,443 INFO [train.py:1198] (3/4) Epoch 22, batch 1650, loss[loss=0.2383, ctc_loss=0.1621, cr_loss=0.381, over 15875.00 frames. ], tot_loss[loss=0.2149, ctc_loss=0.1428, cr_loss=0.3606, over 3343028.48 frames. ], batch size: 74, lr: 5.46e-03, grad_scale: 32.0
2024-09-24 01:25:43,051 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.31 vs. limit=15.0
2024-09-24 01:26:23,522 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.086e+02 1.218e+02 1.295e+02 1.408e+02 1.987e+02, threshold=2.589e+02, percent-clipped=0.0
2024-09-24 01:26:26,956 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=389648.0, ans=0.0
2024-09-24 01:26:28,648 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=389648.0, ans=0.0
2024-09-24 01:26:30,393 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=389648.0, ans=0.125
2024-09-24 01:26:39,900 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=389694.6666666667, ans=0.1
2024-09-24 01:26:49,451 INFO [train.py:1198] (3/4) Epoch 22, batch 1700, loss[loss=0.236, ctc_loss=0.1663, cr_loss=0.3486, over 11251.00 frames. ], tot_loss[loss=0.2142, ctc_loss=0.1423, cr_loss=0.3595, over 3344017.42 frames. ], batch size: 123, lr: 5.46e-03, grad_scale: 32.0
2024-09-24 01:26:57,703 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=389741.3333333333, ans=0.2
2024-09-24 01:27:12,324 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=389788.0, ans=0.125
2024-09-24 01:27:23,026 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=389834.6666666667, ans=0.1
2024-09-24 01:27:26,336 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=389834.6666666667, ans=0.04949747468305833
2024-09-24 01:27:37,347 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=389834.6666666667, ans=0.125
2024-09-24 01:27:45,440 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=389881.3333333333, ans=0.125
2024-09-24 01:27:53,308 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=389881.3333333333, ans=0.0
2024-09-24 01:27:53,379 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=389881.3333333333, ans=0.0
2024-09-24 01:28:11,054 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=389974.6666666667, ans=0.0
2024-09-24 01:28:12,341 INFO [train.py:1198] (3/4) Epoch 22, batch 1750, loss[loss=0.2309, ctc_loss=0.1583, cr_loss=0.3627, over 17144.00 frames. ], tot_loss[loss=0.2134, ctc_loss=0.1417, cr_loss=0.3582, over 3340968.22 frames. ], batch size: 48, lr: 5.46e-03, grad_scale: 32.0
2024-09-24 01:28:31,910 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=390021.3333333333, ans=0.125
2024-09-24 01:28:38,289 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=390021.3333333333, ans=10.0
2024-09-24 01:29:03,162 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=390114.6666666667, ans=0.07
2024-09-24 01:29:08,054 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=390114.6666666667, ans=0.0
2024-09-24 01:29:09,555 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.027e+02 1.289e+02 1.384e+02 1.529e+02 2.458e+02, threshold=2.768e+02, percent-clipped=0.0
2024-09-24 01:29:24,499 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=7.70 vs. limit=15.0
2024-09-24 01:29:37,596 INFO [train.py:1198] (3/4) Epoch 22, batch 1800, loss[loss=0.1992, ctc_loss=0.1291, cr_loss=0.3504, over 17253.00 frames. ], tot_loss[loss=0.2114, ctc_loss=0.1402, cr_loss=0.3559, over 3349903.11 frames. ], batch size: 44, lr: 5.45e-03, grad_scale: 32.0
2024-09-24 01:29:55,618 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=390254.6666666667, ans=0.125
2024-09-24 01:29:57,770 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.29 vs. limit=15.0
2024-09-24 01:30:00,297 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=390254.6666666667, ans=0.0
2024-09-24 01:30:04,473 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=390254.6666666667, ans=0.125
2024-09-24 01:30:18,758 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=390301.3333333333, ans=0.07
2024-09-24 01:30:47,827 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=390394.6666666667, ans=0.125
2024-09-24 01:30:54,215 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=390394.6666666667, ans=0.07
2024-09-24 01:30:59,331 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=14.24 vs. limit=15.0
2024-09-24 01:31:00,216 INFO [train.py:1198] (3/4) Epoch 22, batch 1850, loss[loss=0.2409, ctc_loss=0.167, cr_loss=0.3697, over 12493.00 frames. ], tot_loss[loss=0.2129, ctc_loss=0.1413, cr_loss=0.3579, over 3348057.01 frames. ], batch size: 123, lr: 5.45e-03, grad_scale: 32.0
2024-09-24 01:31:30,668 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=390534.6666666667, ans=0.125
2024-09-24 01:31:52,993 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=10.10 vs. limit=22.5
2024-09-24 01:31:54,070 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.103e+02 1.256e+02 1.338e+02 1.460e+02 2.391e+02, threshold=2.675e+02, percent-clipped=0.0
2024-09-24 01:32:21,843 INFO [train.py:1198] (3/4) Epoch 22, batch 1900, loss[loss=0.1844, ctc_loss=0.1202, cr_loss=0.3212, over 17076.00 frames. ], tot_loss[loss=0.2118, ctc_loss=0.1405, cr_loss=0.3564, over 3353020.26 frames. ], batch size: 43, lr: 5.45e-03, grad_scale: 32.0
2024-09-24 01:32:52,714 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=390768.0, ans=0.125
2024-09-24 01:32:59,229 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=390768.0, ans=0.1
2024-09-24 01:33:29,328 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer_ff2.min_abs, batch_count=390861.3333333333, ans=0.1
2024-09-24 01:33:41,708 INFO [train.py:1198] (3/4) Epoch 22, batch 1950, loss[loss=0.1744, ctc_loss=0.1136, cr_loss=0.3038, over 16642.00 frames. ], tot_loss[loss=0.2112, ctc_loss=0.1401, cr_loss=0.3554, over 3351076.91 frames. ], batch size: 37, lr: 5.45e-03, grad_scale: 32.0
2024-09-24 01:33:45,019 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=390908.0, ans=0.2
2024-09-24 01:34:41,243 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.118e+02 1.293e+02 1.356e+02 1.523e+02 3.316e+02, threshold=2.712e+02, percent-clipped=1.0
2024-09-24 01:35:01,449 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=391094.6666666667, ans=0.125
2024-09-24 01:35:04,532 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.max_abs, batch_count=391094.6666666667, ans=10.0
2024-09-24 01:35:06,092 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=391094.6666666667, ans=0.1
2024-09-24 01:35:09,058 INFO [train.py:1198] (3/4) Epoch 22, batch 2000, loss[loss=0.2228, ctc_loss=0.1513, cr_loss=0.3578, over 17035.00 frames. ], tot_loss[loss=0.2115, ctc_loss=0.1403, cr_loss=0.3559, over 3343922.32 frames. ], batch size: 53, lr: 5.45e-03, grad_scale: 32.0
2024-09-24 01:35:10,927 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=391141.3333333333, ans=0.125
2024-09-24 01:35:15,742 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=391141.3333333333, ans=0.125
2024-09-24 01:35:21,023 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=14.94 vs. limit=22.5
2024-09-24 01:35:45,893 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=391234.6666666667, ans=0.04949747468305833
2024-09-24 01:35:50,740 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.const_attention_rate, batch_count=391234.6666666667, ans=0.025
2024-09-24 01:35:56,453 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=9.86 vs. limit=15.0
2024-09-24 01:36:11,610 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=391328.0, ans=0.125
2024-09-24 01:36:14,567 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=391328.0, ans=0.1
2024-09-24 01:36:16,768 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.86 vs. limit=22.5
2024-09-24 01:36:28,621 INFO [train.py:1198] (3/4) Epoch 22, batch 2050, loss[loss=0.2132, ctc_loss=0.1426, cr_loss=0.3532, over 17376.00 frames. ], tot_loss[loss=0.2109, ctc_loss=0.1398, cr_loss=0.3555, over 3343592.15 frames. ], batch size: 48, lr: 5.45e-03, grad_scale: 16.0
2024-09-24 01:36:30,809 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=11.51 vs. limit=22.5
2024-09-24 01:36:46,906 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.89 vs. limit=15.0
2024-09-24 01:37:09,024 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=5.50 vs. limit=15.0
2024-09-24 01:37:22,745 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=391514.6666666667, ans=0.0
2024-09-24 01:37:27,042 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.055e+02 1.255e+02 1.364e+02 1.443e+02 3.272e+02, threshold=2.728e+02, percent-clipped=1.0
2024-09-24 01:37:46,464 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=391561.3333333333, ans=0.0
2024-09-24 01:37:51,064 INFO [train.py:1198] (3/4) Epoch 22, batch 2100, loss[loss=0.2178, ctc_loss=0.1476, cr_loss=0.3513, over 17022.00 frames. ], tot_loss[loss=0.2106, ctc_loss=0.1395, cr_loss=0.3556, over 3354749.21 frames. ], batch size: 51, lr: 5.44e-03, grad_scale: 16.0
2024-09-24 01:38:07,254 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=391654.6666666667, ans=0.125
2024-09-24 01:38:31,412 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=391701.3333333333, ans=0.1
2024-09-24 01:38:56,745 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=2.97 vs. limit=6.0
2024-09-24 01:39:11,533 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=391794.6666666667, ans=0.0
2024-09-24 01:39:16,046 INFO [train.py:1198] (3/4) Epoch 22, batch 2150, loss[loss=0.21, ctc_loss=0.1349, cr_loss=0.3755, over 17064.00 frames. ], tot_loss[loss=0.211, ctc_loss=0.1398, cr_loss=0.356, over 3350592.84 frames. ], batch size: 46, lr: 5.44e-03, grad_scale: 16.0
2024-09-24 01:39:58,650 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=391934.6666666667, ans=0.0
2024-09-24 01:40:08,664 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-09-24 01:40:16,911 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.098e+02 1.257e+02 1.348e+02 1.506e+02 2.274e+02, threshold=2.696e+02, percent-clipped=0.0
2024-09-24 01:40:20,515 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=391981.3333333333, ans=0.125
2024-09-24 01:40:36,131 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.min_abs, batch_count=392028.0, ans=0.5
2024-09-24 01:40:40,618 INFO [train.py:1198] (3/4) Epoch 22, batch 2200, loss[loss=0.2336, ctc_loss=0.1567, cr_loss=0.3844, over 16551.00 frames. ], tot_loss[loss=0.2119, ctc_loss=0.1407, cr_loss=0.3561, over 3338268.39 frames. ], batch size: 66, lr: 5.44e-03, grad_scale: 16.0
2024-09-24 01:40:44,048 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=392074.6666666667, ans=0.2
2024-09-24 01:40:55,292 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=392121.3333333333, ans=0.125
2024-09-24 01:41:29,499 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=9.19 vs. limit=15.0
2024-09-24 01:41:30,668 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=392214.6666666667, ans=0.125
2024-09-24 01:41:37,123 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=392214.6666666667, ans=0.1
2024-09-24 01:41:40,415 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.min_positive, batch_count=392214.6666666667, ans=0.05
2024-09-24 01:41:43,843 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2024-09-24 01:41:57,466 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-24 01:41:58,849 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=392261.3333333333, ans=0.0
2024-09-24 01:42:02,073 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=392308.0, ans=0.0
2024-09-24 01:42:03,261 INFO [train.py:1198] (3/4) Epoch 22, batch 2250, loss[loss=0.21, ctc_loss=0.1373, cr_loss=0.3636, over 17041.00 frames. ], tot_loss[loss=0.2123, ctc_loss=0.1408, cr_loss=0.3572, over 3341646.92 frames. ], batch size: 56, lr: 5.44e-03, grad_scale: 16.0
2024-09-24 01:42:08,849 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.24 vs. limit=6.0
2024-09-24 01:42:19,481 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=392354.6666666667, ans=0.0
2024-09-24 01:42:30,722 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=392354.6666666667, ans=0.2
2024-09-24 01:42:50,103 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=392448.0, ans=0.0
2024-09-24 01:42:50,244 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=392448.0, ans=0.2
2024-09-24 01:42:59,353 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.083e+02 1.230e+02 1.321e+02 1.423e+02 2.235e+02, threshold=2.642e+02, percent-clipped=0.0
2024-09-24 01:43:22,779 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.02 vs. limit=10.0
2024-09-24 01:43:23,788 INFO [train.py:1198] (3/4) Epoch 22, batch 2300, loss[loss=0.2135, ctc_loss=0.1414, cr_loss=0.3606, over 17289.00 frames. ], tot_loss[loss=0.2122, ctc_loss=0.1408, cr_loss=0.3569, over 3344209.78 frames. ], batch size: 49, lr: 5.44e-03, grad_scale: 16.0
2024-09-24 01:43:27,521 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=392541.3333333333, ans=0.1
2024-09-24 01:43:30,654 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=392541.3333333333, ans=0.125
2024-09-24 01:44:47,787 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=392774.6666666667, ans=0.1
2024-09-24 01:44:51,575 INFO [train.py:1198] (3/4) Epoch 22, batch 2350, loss[loss=0.2036, ctc_loss=0.1312, cr_loss=0.3621, over 17159.00 frames. ], tot_loss[loss=0.2119, ctc_loss=0.1405, cr_loss=0.3569, over 3351487.51 frames. ], batch size: 45, lr: 5.44e-03, grad_scale: 16.0
2024-09-24 01:45:46,055 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=392914.6666666667, ans=0.0
2024-09-24 01:45:47,294 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.099e+02 1.237e+02 1.319e+02 1.405e+02 2.078e+02, threshold=2.638e+02, percent-clipped=0.0
2024-09-24 01:45:51,551 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=4.93 vs. limit=15.0
2024-09-24 01:46:05,464 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=392961.3333333333, ans=0.1
2024-09-24 01:46:11,758 INFO [train.py:1198] (3/4) Epoch 22, batch 2400, loss[loss=0.1941, ctc_loss=0.1274, cr_loss=0.3336, over 17223.00 frames. ], tot_loss[loss=0.2116, ctc_loss=0.1404, cr_loss=0.3563, over 3348636.74 frames. ], batch size: 50, lr: 5.43e-03, grad_scale: 32.0
2024-09-24 01:46:45,532 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=393101.3333333333, ans=0.0
2024-09-24 01:46:56,348 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=393101.3333333333, ans=0.125
2024-09-24 01:46:57,917 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=393101.3333333333, ans=0.1
2024-09-24 01:47:15,770 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=393148.0, ans=0.2
2024-09-24 01:47:34,555 INFO [train.py:1198] (3/4) Epoch 22, batch 2450, loss[loss=0.2278, ctc_loss=0.1503, cr_loss=0.3878, over 17171.00 frames. ], tot_loss[loss=0.2123, ctc_loss=0.1409, cr_loss=0.3572, over 3341293.53 frames. ], batch size: 45, lr: 5.43e-03, grad_scale: 32.0
2024-09-24 01:48:03,925 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=393288.0, ans=0.125
2024-09-24 01:48:12,098 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=6.07 vs. limit=15.0
2024-09-24 01:48:29,425 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=393381.3333333333, ans=0.125
2024-09-24 01:48:30,720 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.097e+02 1.273e+02 1.360e+02 1.558e+02 2.301e+02, threshold=2.721e+02, percent-clipped=0.0
2024-09-24 01:48:52,791 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=393428.0, ans=0.125
2024-09-24 01:48:57,401 INFO [train.py:1198] (3/4) Epoch 22, batch 2500, loss[loss=0.1698, ctc_loss=0.1088, cr_loss=0.3049, over 17029.00 frames. ], tot_loss[loss=0.2116, ctc_loss=0.1403, cr_loss=0.3566, over 3353028.49 frames. ], batch size: 39, lr: 5.43e-03, grad_scale: 32.0
2024-09-24 01:49:11,156 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=393474.6666666667, ans=0.0
2024-09-24 01:49:41,422 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=393568.0, ans=0.125
2024-09-24 01:49:43,060 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=393568.0, ans=0.125
2024-09-24 01:49:56,877 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=393614.6666666667, ans=0.0
2024-09-24 01:49:58,512 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=393614.6666666667, ans=0.125
2024-09-24 01:50:22,199 INFO [train.py:1198] (3/4) Epoch 22, batch 2550, loss[loss=0.2368, ctc_loss=0.1599, cr_loss=0.3845, over 16944.00 frames. ], tot_loss[loss=0.2131, ctc_loss=0.1414, cr_loss=0.3584, over 3344935.82 frames. ], batch size: 58, lr: 5.43e-03, grad_scale: 32.0
2024-09-24 01:50:24,237 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=393708.0, ans=0.2
2024-09-24 01:50:38,749 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=393754.6666666667, ans=0.125
2024-09-24 01:50:48,308 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=393754.6666666667, ans=10.0
2024-09-24 01:50:53,084 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=393801.3333333333, ans=0.125
2024-09-24 01:50:58,092 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=393801.3333333333, ans=0.125
2024-09-24 01:51:17,613 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=393848.0, ans=0.0
2024-09-24 01:51:18,511 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=4.96 vs. limit=10.0
2024-09-24 01:51:18,857 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.111e+02 1.258e+02 1.341e+02 1.475e+02 2.313e+02, threshold=2.682e+02, percent-clipped=0.0
2024-09-24 01:51:19,167 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=393848.0, ans=0.125
2024-09-24 01:51:28,849 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=393894.6666666667, ans=0.2
2024-09-24 01:51:43,238 INFO [train.py:1198] (3/4) Epoch 22, batch 2600, loss[loss=0.2503, ctc_loss=0.1685, cr_loss=0.409, over 14889.00 frames. ], tot_loss[loss=0.2139, ctc_loss=0.1421, cr_loss=0.3588, over 3335886.74 frames. ], batch size: 88, lr: 5.43e-03, grad_scale: 32.0
2024-09-24 01:51:52,535 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=393941.3333333333, ans=0.04949747468305833
2024-09-24 01:51:59,180 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=393941.3333333333, ans=0.09899494936611666
2024-09-24 01:52:02,326 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=393988.0, ans=0.125
2024-09-24 01:52:16,858 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=394034.6666666667, ans=0.0
2024-09-24 01:52:40,806 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=394081.3333333333, ans=0.0
2024-09-24 01:52:40,819 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=394081.3333333333, ans=0.1
2024-09-24 01:52:52,091 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=394128.0, ans=0.09899494936611666
2024-09-24 01:53:06,119 INFO [train.py:1198] (3/4) Epoch 22, batch 2650, loss[loss=0.1897, ctc_loss=0.1246, cr_loss=0.3259, over 17364.00 frames. ], tot_loss[loss=0.2139, ctc_loss=0.1421, cr_loss=0.359, over 3335957.15 frames. ], batch size: 48, lr: 5.43e-03, grad_scale: 32.0
2024-09-24 01:53:06,357 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=394174.6666666667, ans=0.0
2024-09-24 01:53:19,390 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=394174.6666666667, ans=0.0
2024-09-24 01:53:22,419 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=394221.3333333333, ans=0.125
2024-09-24 01:53:56,022 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=4.36 vs. limit=12.0
2024-09-24 01:53:57,203 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=394314.6666666667, ans=0.125
2024-09-24 01:54:00,996 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=2.73 vs. limit=15.0
2024-09-24 01:54:07,323 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.096e+02 1.255e+02 1.320e+02 1.434e+02 1.908e+02, threshold=2.640e+02, percent-clipped=0.0
2024-09-24 01:54:31,192 INFO [train.py:1198] (3/4) Epoch 22, batch 2700, loss[loss=0.2306, ctc_loss=0.16, cr_loss=0.353, over 17217.00 frames. ], tot_loss[loss=0.2136, ctc_loss=0.1418, cr_loss=0.3586, over 3346385.46 frames. ], batch size: 50, lr: 5.42e-03, grad_scale: 32.0
2024-09-24 01:54:36,398 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=394408.0, ans=0.125
2024-09-24 01:54:42,523 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=394408.0, ans=0.125
2024-09-24 01:55:08,957 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=394501.3333333333, ans=0.0
2024-09-24 01:55:10,533 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=394501.3333333333, ans=0.125
2024-09-24 01:55:29,699 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=394548.0, ans=0.0
2024-09-24 01:55:33,109 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=394548.0, ans=0.125
2024-09-24 01:55:45,672 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=394594.6666666667, ans=0.125
2024-09-24 01:55:50,714 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=4.81 vs. limit=12.0
2024-09-24 01:55:53,381 INFO [train.py:1198] (3/4) Epoch 22, batch 2750, loss[loss=0.2181, ctc_loss=0.1419, cr_loss=0.3806, over 17325.00 frames. ], tot_loss[loss=0.2134, ctc_loss=0.1417, cr_loss=0.3587, over 3348305.58 frames. ], batch size: 51, lr: 5.42e-03, grad_scale: 32.0
2024-09-24 01:56:04,830 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=394641.3333333333, ans=0.2
2024-09-24 01:56:25,822 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=394734.6666666667, ans=0.0
2024-09-24 01:56:49,062 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.const_attention_rate, batch_count=394781.3333333333, ans=0.025
2024-09-24 01:56:52,099 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.037e+02 1.297e+02 1.428e+02 1.577e+02 2.482e+02, threshold=2.855e+02, percent-clipped=0.0
2024-09-24 01:57:16,464 INFO [train.py:1198] (3/4) Epoch 22, batch 2800, loss[loss=0.2047, ctc_loss=0.1359, cr_loss=0.3439, over 16946.00 frames. ], tot_loss[loss=0.2131, ctc_loss=0.1415, cr_loss=0.3582, over 3351700.08 frames. ], batch size: 42, lr: 5.42e-03, grad_scale: 32.0
2024-09-24 01:57:54,756 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=394968.0, ans=0.125
2024-09-24 01:58:01,439 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=6.61 vs. limit=15.0
2024-09-24 01:58:18,830 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=7.67 vs. limit=15.0
2024-09-24 01:58:29,723 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=395061.3333333333, ans=0.0
2024-09-24 01:58:32,832 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=395061.3333333333, ans=0.05
2024-09-24 01:58:33,246 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.80 vs. limit=6.0
2024-09-24 01:58:38,326 INFO [train.py:1198] (3/4) Epoch 22, batch 2850, loss[loss=0.2005, ctc_loss=0.1301, cr_loss=0.352, over 17314.00 frames. ], tot_loss[loss=0.2128, ctc_loss=0.1412, cr_loss=0.3581, over 3360945.74 frames. ], batch size: 51, lr: 5.42e-03, grad_scale: 32.0
2024-09-24 01:58:45,077 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.93 vs. limit=15.0
2024-09-24 01:59:02,474 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-09-24 01:59:31,983 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=395248.0, ans=0.125
2024-09-24 01:59:36,607 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.086e+02 1.264e+02 1.351e+02 1.437e+02 2.135e+02, threshold=2.702e+02, percent-clipped=0.0
2024-09-24 02:00:03,219 INFO [train.py:1198] (3/4) Epoch 22, batch 2900, loss[loss=0.2044, ctc_loss=0.1352, cr_loss=0.3459, over 17223.00 frames. ], tot_loss[loss=0.2131, ctc_loss=0.1414, cr_loss=0.3583, over 3359950.01 frames. ], batch size: 47, lr: 5.42e-03, grad_scale: 32.0
2024-09-24 02:00:05,068 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=395341.3333333333, ans=0.125
2024-09-24 02:00:09,769 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=395341.3333333333, ans=0.0
2024-09-24 02:00:14,665 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=10.77 vs. limit=12.0
2024-09-24 02:00:36,704 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.const_attention_rate, batch_count=395434.6666666667, ans=0.025
2024-09-24 02:00:38,276 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=395434.6666666667, ans=0.125
2024-09-24 02:01:16,032 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten.whitening_limit, batch_count=395528.0, ans=15.0
2024-09-24 02:01:20,799 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=3.41 vs. limit=12.0
2024-09-24 02:01:23,079 INFO [train.py:1198] (3/4) Epoch 22, batch 2950, loss[loss=0.1854, ctc_loss=0.1196, cr_loss=0.3288, over 17254.00 frames. ], tot_loss[loss=0.2123, ctc_loss=0.1408, cr_loss=0.3575, over 3363519.66 frames. ], batch size: 42, lr: 5.42e-03, grad_scale: 32.0
2024-09-24 02:01:23,884 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=11.44 vs. limit=22.5
2024-09-24 02:01:31,436 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=395574.6666666667, ans=0.09899494936611666
2024-09-24 02:02:09,321 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys.whitening_limit, batch_count=395668.0, ans=6.0
2024-09-24 02:02:10,639 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.const_attention_rate, batch_count=395668.0, ans=0.025
2024-09-24 02:02:21,884 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.109e+02 1.266e+02 1.346e+02 1.444e+02 1.756e+02, threshold=2.692e+02, percent-clipped=0.0
2024-09-24 02:02:39,638 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=395761.3333333333, ans=0.2
2024-09-24 02:02:45,310 INFO [train.py:1198] (3/4) Epoch 22, batch 3000, loss[loss=0.2635, ctc_loss=0.1866, cr_loss=0.3843, over 11667.00 frames. ], tot_loss[loss=0.2118, ctc_loss=0.1404, cr_loss=0.3569, over 3363419.50 frames. ], batch size: 123, lr: 5.42e-03, grad_scale: 32.0
2024-09-24 02:02:45,310 INFO [train.py:1221] (3/4) Computing validation loss
2024-09-24 02:03:00,736 INFO [train.py:1230] (3/4) Epoch 22, validation: loss=0.03869, ctc_loss=0.03869, cr_loss=8.188e-15, over 944034.00 frames.
2024-09-24 02:03:00,736 INFO [train.py:1231] (3/4) Maximum memory allocated so far is 21238MB 2024-09-24 02:03:16,491 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=395854.6666666667, ans=0.1 2024-09-24 02:03:18,048 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=395854.6666666667, ans=0.1 2024-09-24 02:04:06,083 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=395994.6666666667, ans=0.125 2024-09-24 02:04:16,524 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=395994.6666666667, ans=0.0 2024-09-24 02:04:16,718 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=395994.6666666667, ans=0.125 2024-09-24 02:04:21,132 INFO [train.py:1198] (3/4) Epoch 22, batch 3050, loss[loss=0.2094, ctc_loss=0.1418, cr_loss=0.3378, over 17354.00 frames. ], tot_loss[loss=0.2118, ctc_loss=0.1405, cr_loss=0.3565, over 3354756.75 frames. ], batch size: 48, lr: 5.41e-03, grad_scale: 32.0 2024-09-24 02:04:24,538 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=396041.3333333333, ans=0.125 2024-09-24 02:04:40,237 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=396088.0, ans=0.0 2024-09-24 02:04:44,768 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=396088.0, ans=0.0 2024-09-24 02:04:47,098 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=7.80 vs. limit=15.0 2024-09-24 02:04:51,584 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=9.36 vs. limit=15.0 2024-09-24 02:04:54,158 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=396134.6666666667, ans=0.0 2024-09-24 02:05:10,955 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=396181.3333333333, ans=0.125 2024-09-24 02:05:15,411 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.106e+02 1.242e+02 1.330e+02 1.474e+02 2.506e+02, threshold=2.661e+02, percent-clipped=0.0 2024-09-24 02:05:39,645 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=396274.6666666667, ans=0.0 2024-09-24 02:05:41,008 INFO [train.py:1198] (3/4) Epoch 22, batch 3100, loss[loss=0.2163, ctc_loss=0.1412, cr_loss=0.3756, over 17306.00 frames. ], tot_loss[loss=0.2122, ctc_loss=0.1408, cr_loss=0.3573, over 3350768.09 frames. 
], batch size: 49, lr: 5.41e-03, grad_scale: 32.0 2024-09-24 02:05:44,494 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=396274.6666666667, ans=0.0 2024-09-24 02:05:50,667 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=396274.6666666667, ans=0.125 2024-09-24 02:06:01,704 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=396321.3333333333, ans=0.125 2024-09-24 02:06:26,589 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=396414.6666666667, ans=0.0 2024-09-24 02:07:01,222 INFO [train.py:1198] (3/4) Epoch 22, batch 3150, loss[loss=0.198, ctc_loss=0.1313, cr_loss=0.3335, over 17285.00 frames. ], tot_loss[loss=0.2105, ctc_loss=0.1395, cr_loss=0.355, over 3348284.79 frames. ], batch size: 46, lr: 5.41e-03, grad_scale: 16.0 2024-09-24 02:07:10,652 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=396508.0, ans=0.0 2024-09-24 02:07:21,589 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=396554.6666666667, ans=0.125 2024-09-24 02:07:57,788 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.117e+02 1.290e+02 1.399e+02 1.555e+02 2.844e+02, threshold=2.797e+02, percent-clipped=1.0 2024-09-24 02:08:15,913 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.20 vs. limit=10.0 2024-09-24 02:08:19,588 INFO [train.py:1198] (3/4) Epoch 22, batch 3200, loss[loss=0.2114, ctc_loss=0.1428, cr_loss=0.3427, over 17167.00 frames. ], tot_loss[loss=0.2102, ctc_loss=0.1393, cr_loss=0.3546, over 3362462.17 frames. ], batch size: 45, lr: 5.41e-03, grad_scale: 32.0 2024-09-24 02:09:37,799 INFO [train.py:1198] (3/4) Epoch 22, batch 3250, loss[loss=0.1886, ctc_loss=0.1211, cr_loss=0.3376, over 17005.00 frames. ], tot_loss[loss=0.2117, ctc_loss=0.1403, cr_loss=0.3569, over 3359258.59 frames. ], batch size: 44, lr: 5.41e-03, grad_scale: 32.0 2024-09-24 02:09:49,123 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=396974.6666666667, ans=0.2 2024-09-24 02:10:17,021 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=397068.0, ans=0.125 2024-09-24 02:10:33,683 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.117e+02 1.276e+02 1.353e+02 1.572e+02 3.957e+02, threshold=2.706e+02, percent-clipped=1.0 2024-09-24 02:10:44,382 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=397161.3333333333, ans=0.125 2024-09-24 02:10:47,563 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=397161.3333333333, ans=0.125 2024-09-24 02:10:55,248 INFO [train.py:1198] (3/4) Epoch 22, batch 3300, loss[loss=0.179, ctc_loss=0.1156, cr_loss=0.3167, over 17200.00 frames. ], tot_loss[loss=0.2114, ctc_loss=0.1401, cr_loss=0.3568, over 3362956.33 frames. 
], batch size: 47, lr: 5.41e-03, grad_scale: 32.0 2024-09-24 02:11:07,053 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=397208.0, ans=0.125 2024-09-24 02:11:14,817 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=397254.6666666667, ans=0.0 2024-09-24 02:11:27,071 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=397301.3333333333, ans=0.07 2024-09-24 02:11:44,207 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=397348.0, ans=0.1 2024-09-24 02:11:47,383 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=397348.0, ans=0.0 2024-09-24 02:11:56,730 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=397348.0, ans=0.0 2024-09-24 02:12:15,302 INFO [train.py:1198] (3/4) Epoch 22, batch 3350, loss[loss=0.1977, ctc_loss=0.1296, cr_loss=0.3402, over 17257.00 frames. ], tot_loss[loss=0.2113, ctc_loss=0.14, cr_loss=0.3564, over 3352981.98 frames. ], batch size: 44, lr: 5.40e-03, grad_scale: 32.0 2024-09-24 02:12:26,792 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.const_attention_rate, batch_count=397441.3333333333, ans=0.025 2024-09-24 02:12:39,286 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-24 02:13:02,335 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=397581.3333333333, ans=0.125 2024-09-24 02:13:11,463 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.103e+02 1.283e+02 1.436e+02 1.658e+02 2.229e+02, threshold=2.872e+02, percent-clipped=0.0 2024-09-24 02:13:13,478 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=397581.3333333333, ans=0.125 2024-09-24 02:13:24,210 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=397628.0, ans=0.2 2024-09-24 02:13:33,110 INFO [train.py:1198] (3/4) Epoch 22, batch 3400, loss[loss=0.2287, ctc_loss=0.1537, cr_loss=0.3747, over 17220.00 frames. ], tot_loss[loss=0.2112, ctc_loss=0.1398, cr_loss=0.3568, over 3364174.56 frames. 
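], batch size: 47, lr: 5.41e-03, grad_scale: 32.0

The optim.py WARNING lines report five order statistics of recently observed gradient norms (they read as min, 25%, median, 75% and max) together with a clipping threshold; throughout this log the threshold equals Clipping_scale = 2.0 times the reported median, e.g. 2.0 * 1.436e+02 = 2.872e+02 in the warning above. A sketch of median-based clipping in that spirit (the window length and update cadence are assumptions, not taken from optim.py):

    # Sketch of clipping gradients to clipping_scale * median of recent
    # gradient norms, reporting the same statistics as the WARNING lines.
    # The window length of 128 is an assumption.
    from collections import deque
    import torch

    CLIPPING_SCALE = 2.0
    recent_norms = deque(maxlen=128)

    def clip_step(parameters):
        params = [p for p in parameters if p.grad is not None]
        norm = torch.norm(torch.stack([p.grad.detach().norm() for p in params]))
        recent_norms.append(norm.item())
        q = torch.quantile(torch.tensor(list(recent_norms)),
                           torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0]))
        threshold = CLIPPING_SCALE * q[2].item()
        torch.nn.utils.clip_grad_norm_(params, max_norm=threshold)
        return q, threshold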
2024-09-24 02:13:33,334 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=397674.6666666667, ans=0.0 2024-09-24 02:13:35,093 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-09-24 02:13:41,197 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=397674.6666666667, ans=0.0 2024-09-24 02:13:41,335 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.const_attention_rate, batch_count=397674.6666666667, ans=0.025 2024-09-24 02:13:44,447 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=397674.6666666667, ans=0.1 2024-09-24 02:13:50,742 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=397721.3333333333, ans=0.0 2024-09-24 02:13:53,887 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.const_attention_rate, batch_count=397721.3333333333, ans=0.025 2024-09-24 02:14:00,126 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=397721.3333333333, ans=0.0 2024-09-24 02:14:07,801 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=397768.0, ans=0.125 2024-09-24 02:14:18,384 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=397814.6666666667, ans=0.125 2024-09-24 02:14:24,729 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=397814.6666666667, ans=0.2 2024-09-24 02:14:43,850 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=397861.3333333333, ans=0.125 2024-09-24 02:14:46,813 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.const_attention_rate, batch_count=397861.3333333333, ans=0.025 2024-09-24 02:14:53,443 INFO [train.py:1198] (3/4) Epoch 22, batch 3450, loss[loss=0.2023, ctc_loss=0.1324, cr_loss=0.3494, over 17112.00 frames. ], tot_loss[loss=0.2113, ctc_loss=0.1399, cr_loss=0.3571, over 3360319.30 frames. ], batch size: 49, lr: 5.40e-03, grad_scale: 32.0 2024-09-24 02:15:37,851 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=398001.3333333333, ans=0.1 2024-09-24 02:15:50,619 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=398048.0, ans=0.1 2024-09-24 02:15:51,974 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.091e+02 1.275e+02 1.378e+02 1.514e+02 2.011e+02, threshold=2.756e+02, percent-clipped=0.0 2024-09-24 02:15:55,405 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=398048.0, ans=0.125 2024-09-24 02:16:14,003 INFO [train.py:1198] (3/4) Epoch 22, batch 3500, loss[loss=0.2397, ctc_loss=0.1607, cr_loss=0.3952, over 16525.00 frames. ], tot_loss[loss=0.2116, ctc_loss=0.1401, cr_loss=0.3572, over 3359303.04 frames.
], batch size: 66, lr: 5.40e-03, grad_scale: 32.0 2024-09-24 02:16:14,556 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.64 vs. limit=10.0 2024-09-24 02:16:20,658 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=398141.3333333333, ans=0.0 2024-09-24 02:16:20,696 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=398141.3333333333, ans=0.125 2024-09-24 02:16:34,584 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=398188.0, ans=0.125 2024-09-24 02:16:37,901 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=398188.0, ans=0.2 2024-09-24 02:16:59,709 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=398281.3333333333, ans=0.1 2024-09-24 02:17:23,439 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=398328.0, ans=0.0 2024-09-24 02:17:33,022 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=398374.6666666667, ans=0.125 2024-09-24 02:17:34,215 INFO [train.py:1198] (3/4) Epoch 22, batch 3550, loss[loss=0.1779, ctc_loss=0.1149, cr_loss=0.3152, over 17011.00 frames. ], tot_loss[loss=0.2109, ctc_loss=0.1396, cr_loss=0.3567, over 3362023.55 frames. ], batch size: 44, lr: 5.40e-03, grad_scale: 32.0 2024-09-24 02:17:56,182 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=398421.3333333333, ans=0.0 2024-09-24 02:18:08,906 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=398468.0, ans=0.125 2024-09-24 02:18:11,895 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=398468.0, ans=10.0 2024-09-24 02:18:32,111 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.076e+02 1.257e+02 1.363e+02 1.494e+02 1.950e+02, threshold=2.726e+02, percent-clipped=0.0 2024-09-24 02:18:52,336 INFO [train.py:1198] (3/4) Epoch 22, batch 3600, loss[loss=0.244, ctc_loss=0.1719, cr_loss=0.3603, over 12222.00 frames. ], tot_loss[loss=0.2112, ctc_loss=0.1399, cr_loss=0.3566, over 3347030.93 frames. ], batch size: 123, lr: 5.40e-03, grad_scale: 32.0 2024-09-24 02:19:04,791 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=398608.0, ans=0.125 2024-09-24 02:19:22,693 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=4.37 vs. limit=12.0 2024-09-24 02:19:26,863 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=398701.3333333333, ans=0.125 2024-09-24 02:19:30,541 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.58 vs. 
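limit=22.5

The scaling.py Whitening lines, like the record completed just above, periodically report a decorrelation metric for a module's activations against the limit above which the whitening penalty would kick in. One standard metric of this kind (a plausible reconstruction, not necessarily the exact formula in scaling.py) is D * trace(C^2) / trace(C)^2 for the D x D covariance C of the activations: it equals 1 when C is a multiple of the identity, i.e. the features are fully white, and grows with the spread of C's eigenvalues:

    # Sketch of a whitening metric of the kind these records report.
    # metric >= 1 always, with equality iff the activation covariance is
    # isotropic; a reconstruction, not the exact scaling.py code.
    import torch

    def whitening_metric(x):
        # x: (num_frames, num_channels) activations for one whitening group
        x = x - x.mean(dim=0)
        cov = (x.T @ x) / x.shape[0]
        d = cov.shape[0]
        return (d * torch.trace(cov @ cov) / torch.trace(cov) ** 2).item()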
2024-09-24 02:19:39,195 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=398748.0, ans=0.125 2024-09-24 02:19:40,788 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=398748.0, ans=0.125 2024-09-24 02:19:42,420 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=398748.0, ans=0.125 2024-09-24 02:19:45,450 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=398748.0, ans=0.2 2024-09-24 02:19:51,555 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=398748.0, ans=0.0 2024-09-24 02:20:02,568 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=398794.6666666667, ans=0.0 2024-09-24 02:20:10,107 INFO [train.py:1198] (3/4) Epoch 22, batch 3650, loss[loss=0.1987, ctc_loss=0.1321, cr_loss=0.333, over 17081.00 frames. ], tot_loss[loss=0.2103, ctc_loss=0.1393, cr_loss=0.3553, over 3350840.04 frames. ], batch size: 46, lr: 5.39e-03, grad_scale: 32.0 2024-09-24 02:20:30,543 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=398888.0, ans=0.0 2024-09-24 02:20:49,709 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=398934.6666666667, ans=0.1 2024-09-24 02:20:56,463 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=13.09 vs. limit=22.5 2024-09-24 02:21:02,939 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=6.23 vs. limit=15.0 2024-09-24 02:21:09,557 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.105e+02 1.259e+02 1.359e+02 1.456e+02 2.035e+02, threshold=2.719e+02, percent-clipped=0.0 2024-09-24 02:21:13,593 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=399028.0, ans=0.125 2024-09-24 02:21:30,493 INFO [train.py:1198] (3/4) Epoch 22, batch 3700, loss[loss=0.1727, ctc_loss=0.1126, cr_loss=0.3007, over 17192.00 frames. ], tot_loss[loss=0.2103, ctc_loss=0.1392, cr_loss=0.3557, over 3360333.69 frames. ], batch size: 41, lr: 5.39e-03, grad_scale: 32.0 2024-09-24 02:21:41,816 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=399074.6666666667, ans=0.125 2024-09-24 02:22:04,802 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=399168.0, ans=0.125 2024-09-24 02:22:23,982 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=10.22 vs. limit=15.0 2024-09-24 02:22:26,606 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=399214.6666666667, ans=0.125 2024-09-24 02:22:27,173 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=9.85 vs.
limit=15.0 2024-09-24 02:22:30,785 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=14.27 vs. limit=15.0 2024-09-24 02:22:33,020 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.const_attention_rate, batch_count=399261.3333333333, ans=0.025 2024-09-24 02:22:48,077 INFO [train.py:1198] (3/4) Epoch 22, batch 3750, loss[loss=0.2141, ctc_loss=0.1388, cr_loss=0.3766, over 17007.00 frames. ], tot_loss[loss=0.2105, ctc_loss=0.1393, cr_loss=0.356, over 3358314.81 frames. ], batch size: 51, lr: 5.39e-03, grad_scale: 32.0 2024-09-24 02:22:52,979 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=399308.0, ans=0.0 2024-09-24 02:23:40,693 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=7.32 vs. limit=15.0 2024-09-24 02:23:46,154 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.055e+02 1.265e+02 1.370e+02 1.488e+02 1.870e+02, threshold=2.739e+02, percent-clipped=0.0 2024-09-24 02:23:46,452 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=399448.0, ans=0.125 2024-09-24 02:24:07,237 INFO [train.py:1198] (3/4) Epoch 22, batch 3800, loss[loss=0.2024, ctc_loss=0.1343, cr_loss=0.3402, over 17012.00 frames. ], tot_loss[loss=0.2128, ctc_loss=0.1411, cr_loss=0.3588, over 3327933.60 frames. ], batch size: 44, lr: 5.39e-03, grad_scale: 32.0 2024-09-24 02:24:07,621 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=399541.3333333333, ans=0.125 2024-09-24 02:24:08,475 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=11.25 vs. limit=22.5 2024-09-24 02:24:08,947 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=399541.3333333333, ans=0.0 2024-09-24 02:24:21,626 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.57 vs. limit=15.0 2024-09-24 02:24:35,071 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=4.43 vs. limit=12.0 2024-09-24 02:24:36,191 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=399634.6666666667, ans=0.125 2024-09-24 02:24:42,179 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=399634.6666666667, ans=0.0 2024-09-24 02:25:23,719 INFO [train.py:1198] (3/4) Epoch 22, batch 3850, loss[loss=0.2027, ctc_loss=0.1348, cr_loss=0.3398, over 17002.00 frames. ], tot_loss[loss=0.2161, ctc_loss=0.1438, cr_loss=0.3619, over 3289655.98 frames. ], batch size: 44, lr: 5.39e-03, grad_scale: 16.0 2024-09-24 02:25:37,002 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.31 vs. 
limit=6.0 2024-09-24 02:26:05,744 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=399868.0, ans=0.0 2024-09-24 02:26:16,503 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=399914.6666666667, ans=0.125 2024-09-24 02:26:22,073 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.135e+02 1.394e+02 1.521e+02 1.653e+02 2.855e+02, threshold=3.042e+02, percent-clipped=1.0 2024-09-24 02:26:23,990 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=399961.3333333333, ans=0.125 2024-09-24 02:27:26,220 INFO [train.py:1198] (3/4) Epoch 23, batch 0, loss[loss=0.2489, ctc_loss=0.176, cr_loss=0.3645, over 11589.00 frames. ], tot_loss[loss=0.2489, ctc_loss=0.176, cr_loss=0.3645, over 11589.00 frames. ], batch size: 124, lr: 5.27e-03, grad_scale: 32.0 2024-09-24 02:27:26,221 INFO [train.py:1221] (3/4) Computing validation loss 2024-09-24 02:27:34,230 INFO [zipformer.py:1858] (3/4) name=encoder.encoders.0.layers.1.self_attn_weights, attn_weights_entropy = tensor([5.5219, 4.7753, 5.3257, 5.1562], device='cuda:3') 2024-09-24 02:27:41,786 INFO [train.py:1230] (3/4) Epoch 23, validation: loss=0.03754, ctc_loss=0.03754, cr_loss=8.311e-15, over 944034.00 frames. 2024-09-24 02:27:41,787 INFO [train.py:1231] (3/4) Maximum memory allocated so far is 21238MB 2024-09-24 02:28:24,663 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=4.18 vs. limit=15.0 2024-09-24 02:28:34,526 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=8.47 vs. limit=15.0 2024-09-24 02:28:56,974 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=400180.6666666667, ans=0.125 2024-09-24 02:28:57,558 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys.whitening_limit, batch_count=400180.6666666667, ans=6.0 2024-09-24 02:29:06,171 INFO [train.py:1198] (3/4) Epoch 23, batch 50, loss[loss=0.208, ctc_loss=0.1377, cr_loss=0.3512, over 17288.00 frames. ], tot_loss[loss=0.2113, ctc_loss=0.14, cr_loss=0.3562, over 764406.09 frames. ], batch size: 51, lr: 5.26e-03, grad_scale: 32.0 2024-09-24 02:29:12,828 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=400227.3333333333, ans=0.125 2024-09-24 02:30:08,942 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-24 02:30:10,565 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=400414.0, ans=0.0 2024-09-24 02:30:11,721 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.083e+02 1.262e+02 1.335e+02 1.476e+02 2.366e+02, threshold=2.670e+02, percent-clipped=0.0 2024-09-24 02:30:12,159 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=400414.0, ans=0.0 2024-09-24 02:30:26,017 INFO [train.py:1198] (3/4) Epoch 23, batch 100, loss[loss=0.1946, ctc_loss=0.1254, cr_loss=0.3464, over 17163.00 frames. ], tot_loss[loss=0.2118, ctc_loss=0.1405, cr_loss=0.3565, over 1329898.74 frames. 
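], batch size: 45, lr: 5.26e-03, grad_scale: 32.0

The lr field printed by train.py steps down at every epoch boundary (5.39e-03 late in epoch 22, 5.27e-03 at epoch 23, batch 0 above) and also drifts down slowly within an epoch. That behaviour matches the Eden schedule of icefall's optim.py with this run's base_lr=0.045, lr_batches=7500 and lr_epochs=3.5; the sketch below gives the rule, though the exact batch and epoch arguments behind each printed value are not recoverable from the log:

    # Sketch of the Eden learning-rate rule (as in icefall's optim.py),
    # using this run's base_lr=0.045, lr_batches=7500, lr_epochs=3.5.
    def eden_lr(base_lr, batch, epoch, lr_batches=7500.0, lr_epochs=3.5):
        batch_factor = ((batch ** 2 + lr_batches ** 2) / lr_batches ** 2) ** -0.25
        epoch_factor = ((epoch ** 2 + lr_epochs ** 2) / lr_epochs ** 2) ** -0.25
        return base_lr * batch_factor * epoch_factor

The zipformer.py diagnostic above (attn_weights_entropy = tensor([5.5219, 4.7753, 5.3257, 5.1562])) is printed while computing the validation loss and gives one value per attention head; the four values line up with the four heads of the first encoder stack. One way such a per-head entropy can be computed (the tensor layout here is an assumption, not zipformer.py's actual shape):

    # Sketch of a per-head attention-entropy diagnostic. Assumes attention
    # weights of shape (num_heads, tgt_len, src_len) whose rows are softmax
    # distributions; returns one averaged entropy per head.
    import torch

    def attn_weights_entropy(attn):
        ent = -(attn * (attn + 1e-20).log()).sum(dim=-1)  # entropy of each row
        return ent.mean(dim=-1)                           # average over positions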
2024-09-24 02:30:56,579 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=400554.0, ans=0.125 2024-09-24 02:30:57,338 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=4.55 vs. limit=10.0 2024-09-24 02:31:25,069 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=400600.6666666667, ans=0.125 2024-09-24 02:31:33,462 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=6.70 vs. limit=12.0 2024-09-24 02:31:50,461 INFO [train.py:1198] (3/4) Epoch 23, batch 150, loss[loss=0.1711, ctc_loss=0.113, cr_loss=0.2906, over 17097.00 frames. ], tot_loss[loss=0.2116, ctc_loss=0.1402, cr_loss=0.3574, over 1780484.96 frames. ], batch size: 43, lr: 5.26e-03, grad_scale: 32.0 2024-09-24 02:31:57,275 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=400694.0, ans=0.0 2024-09-24 02:32:16,264 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=400740.6666666667, ans=0.125 2024-09-24 02:32:25,884 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=400787.3333333333, ans=0.125 2024-09-24 02:32:55,955 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.034e+02 1.238e+02 1.330e+02 1.442e+02 1.852e+02, threshold=2.660e+02, percent-clipped=0.0 2024-09-24 02:33:13,118 INFO [train.py:1198] (3/4) Epoch 23, batch 200, loss[loss=0.1769, ctc_loss=0.1165, cr_loss=0.3018, over 17272.00 frames. ], tot_loss[loss=0.213, ctc_loss=0.1412, cr_loss=0.3593, over 2129501.06 frames. ], batch size: 42, lr: 5.26e-03, grad_scale: 32.0 2024-09-24 02:33:21,174 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=400927.3333333333, ans=0.0 2024-09-24 02:34:07,236 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=401067.3333333333, ans=0.1 2024-09-24 02:34:26,332 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=401114.0, ans=0.1 2024-09-24 02:34:35,732 INFO [train.py:1198] (3/4) Epoch 23, batch 250, loss[loss=0.2122, ctc_loss=0.1411, cr_loss=0.3554, over 16778.00 frames. ], tot_loss[loss=0.2114, ctc_loss=0.1399, cr_loss=0.3577, over 2412409.42 frames. ], batch size: 61, lr: 5.26e-03, grad_scale: 32.0 2024-09-24 02:34:45,767 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=401160.6666666667, ans=0.2 2024-09-24 02:35:08,263 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.05 vs. limit=15.0 2024-09-24 02:35:27,358 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=14.34 vs.
limit=22.5 2024-09-24 02:35:30,194 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=401300.6666666667, ans=0.125 2024-09-24 02:35:30,354 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=401300.6666666667, ans=0.95 2024-09-24 02:35:32,354 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=3.42 vs. limit=12.0 2024-09-24 02:35:40,108 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=401347.3333333333, ans=0.1 2024-09-24 02:35:41,289 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.141e+02 1.253e+02 1.336e+02 1.468e+02 3.243e+02, threshold=2.673e+02, percent-clipped=1.0 2024-09-24 02:35:55,854 INFO [train.py:1198] (3/4) Epoch 23, batch 300, loss[loss=0.2093, ctc_loss=0.1403, cr_loss=0.3452, over 17302.00 frames. ], tot_loss[loss=0.2117, ctc_loss=0.14, cr_loss=0.3581, over 2627770.65 frames. ], batch size: 49, lr: 5.26e-03, grad_scale: 32.0 2024-09-24 02:36:04,146 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=401394.0, ans=0.09899494936611666 2024-09-24 02:36:15,329 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=401440.6666666667, ans=0.0 2024-09-24 02:36:18,715 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten.whitening_limit, batch_count=401440.6666666667, ans=22.5 2024-09-24 02:36:18,716 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=18.39 vs. limit=22.5 2024-09-24 02:36:29,679 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=401440.6666666667, ans=0.2 2024-09-24 02:37:00,069 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=401534.0, ans=0.125 2024-09-24 02:37:20,390 INFO [train.py:1198] (3/4) Epoch 23, batch 350, loss[loss=0.2028, ctc_loss=0.1357, cr_loss=0.3355, over 17016.00 frames. ], tot_loss[loss=0.2106, ctc_loss=0.1394, cr_loss=0.3564, over 2791949.12 frames. ], batch size: 44, lr: 5.26e-03, grad_scale: 32.0 2024-09-24 02:37:25,493 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=401627.3333333333, ans=0.2 2024-09-24 02:38:24,335 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=11.85 vs. limit=22.5 2024-09-24 02:38:28,351 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.080e+02 1.260e+02 1.355e+02 1.507e+02 3.262e+02, threshold=2.709e+02, percent-clipped=1.0 2024-09-24 02:38:42,872 INFO [train.py:1198] (3/4) Epoch 23, batch 400, loss[loss=0.1631, ctc_loss=0.1041, cr_loss=0.2952, over 16982.00 frames. ], tot_loss[loss=0.2105, ctc_loss=0.1394, cr_loss=0.3554, over 2904671.86 frames. 
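], batch size: 42, lr: 5.25e-03, grad_scale: 32.0

The ScheduledFloat records that dominate this log report module hyperparameters (dropout rates, skip rates, balancer probabilities, and, in lines like the whitening_limit one above, even the whitening limits themselves) as functions of batch_count rather than constants. The fractional batch_count values advance in multiples of roughly 4.67, consistent with the raw step index being rescaled by max_duration * world_size / ref_duration = 700 * 4 / 600. A sketch of a piecewise-linear schedule of this kind, with made-up schedule points (the real per-module schedules are defined in icefall's scaling.py):

    # Sketch of a piecewise-linear scheduled hyperparameter: interpolated
    # against batch_count and clamped at both ends. The (x, y) points below
    # are illustrative only.
    import bisect

    def scheduled_float(batch_count, points):
        xs = [x for x, _ in points]
        ys = [y for _, y in points]
        if batch_count <= xs[0]:
            return ys[0]
        if batch_count >= xs[-1]:
            return ys[-1]
        i = bisect.bisect_right(xs, batch_count)
        x0, x1, y0, y1 = xs[i - 1], xs[i], ys[i - 1], ys[i]
        return y0 + (y1 - y0) * (batch_count - x0) / (x1 - x0)

    # e.g. a dropout annealed from 0.3 down to 0.1 over the first 16k counts:
    assert scheduled_float(401907.33, [(0.0, 0.3), (16000.0, 0.1)]) == 0.1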
2024-09-24 02:39:01,926 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=401907.3333333333, ans=0.125 2024-09-24 02:39:06,914 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=10.35 vs. limit=22.5 2024-09-24 02:39:52,429 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=402047.3333333333, ans=0.0 2024-09-24 02:40:04,970 INFO [train.py:1198] (3/4) Epoch 23, batch 450, loss[loss=0.2042, ctc_loss=0.1371, cr_loss=0.3353, over 16486.00 frames. ], tot_loss[loss=0.2107, ctc_loss=0.1396, cr_loss=0.3556, over 3006852.94 frames. ], batch size: 66, lr: 5.25e-03, grad_scale: 32.0 2024-09-24 02:40:08,519 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=402094.0, ans=0.125 2024-09-24 02:40:24,742 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=6.33 vs. limit=15.0 2024-09-24 02:40:49,881 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=402187.3333333333, ans=0.07 2024-09-24 02:40:55,894 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.max_abs, batch_count=402234.0, ans=10.0 2024-09-24 02:41:02,523 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.const_attention_rate, batch_count=402234.0, ans=0.025 2024-09-24 02:41:03,007 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=3.59 vs. limit=15.0 2024-09-24 02:41:13,371 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.125e+02 1.257e+02 1.349e+02 1.505e+02 2.195e+02, threshold=2.697e+02, percent-clipped=0.0 2024-09-24 02:41:26,406 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=402327.3333333333, ans=0.125 2024-09-24 02:41:27,648 INFO [train.py:1198] (3/4) Epoch 23, batch 500, loss[loss=0.1944, ctc_loss=0.1266, cr_loss=0.3387, over 17141.00 frames. ], tot_loss[loss=0.2096, ctc_loss=0.1388, cr_loss=0.3544, over 3088381.98 frames. ], batch size: 40, lr: 5.25e-03, grad_scale: 32.0 2024-09-24 02:41:28,047 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=402327.3333333333, ans=0.0 2024-09-24 02:41:43,945 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=402374.0, ans=0.0 2024-09-24 02:41:50,048 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=402374.0, ans=0.125 2024-09-24 02:42:17,647 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.86 vs. limit=15.0 2024-09-24 02:42:20,604 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=5.94 vs.
limit=15.0 2024-09-24 02:42:50,030 INFO [train.py:1198] (3/4) Epoch 23, batch 550, loss[loss=0.1833, ctc_loss=0.1194, cr_loss=0.3195, over 16967.00 frames. ], tot_loss[loss=0.2093, ctc_loss=0.1385, cr_loss=0.3541, over 3143789.13 frames. ], batch size: 42, lr: 5.25e-03, grad_scale: 32.0 2024-09-24 02:42:59,810 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=402560.6666666667, ans=0.1 2024-09-24 02:43:08,640 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.32 vs. limit=6.0 2024-09-24 02:43:25,462 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=402654.0, ans=0.125 2024-09-24 02:43:41,077 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=402700.6666666667, ans=0.0 2024-09-24 02:43:53,687 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=402700.6666666667, ans=0.2 2024-09-24 02:43:57,899 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.095e+02 1.240e+02 1.347e+02 1.438e+02 1.839e+02, threshold=2.693e+02, percent-clipped=0.0 2024-09-24 02:44:12,302 INFO [train.py:1198] (3/4) Epoch 23, batch 600, loss[loss=0.2014, ctc_loss=0.132, cr_loss=0.3468, over 17226.00 frames. ], tot_loss[loss=0.2089, ctc_loss=0.1383, cr_loss=0.3532, over 3179873.26 frames. ], batch size: 47, lr: 5.25e-03, grad_scale: 32.0 2024-09-24 02:44:14,375 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten.whitening_limit, batch_count=402794.0, ans=15.0 2024-09-24 02:44:20,650 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=402794.0, ans=0.1 2024-09-24 02:44:28,628 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=402840.6666666667, ans=0.05 2024-09-24 02:44:52,552 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=402887.3333333333, ans=0.2 2024-09-24 02:45:12,592 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=5.69 vs. limit=15.0 2024-09-24 02:45:21,932 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.93 vs. limit=10.0 2024-09-24 02:45:32,527 INFO [train.py:1198] (3/4) Epoch 23, batch 650, loss[loss=0.1863, ctc_loss=0.1214, cr_loss=0.3245, over 17014.00 frames. ], tot_loss[loss=0.209, ctc_loss=0.1381, cr_loss=0.3544, over 3228093.15 frames. ], batch size: 44, lr: 5.25e-03, grad_scale: 32.0 2024-09-24 02:46:18,390 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=403120.6666666667, ans=0.1 2024-09-24 02:46:20,726 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=3.96 vs. 
limit=12.0 2024-09-24 02:46:43,831 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.037e+02 1.212e+02 1.314e+02 1.389e+02 1.753e+02, threshold=2.627e+02, percent-clipped=0.0 2024-09-24 02:46:57,958 INFO [train.py:1198] (3/4) Epoch 23, batch 700, loss[loss=0.2152, ctc_loss=0.1429, cr_loss=0.3615, over 17363.00 frames. ], tot_loss[loss=0.2088, ctc_loss=0.138, cr_loss=0.3542, over 3257350.45 frames. ], batch size: 48, lr: 5.24e-03, grad_scale: 32.0 2024-09-24 02:47:04,741 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=403260.6666666667, ans=0.2 2024-09-24 02:47:14,130 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=403307.3333333333, ans=0.0 2024-09-24 02:47:20,556 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=403307.3333333333, ans=0.125 2024-09-24 02:47:23,797 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=403307.3333333333, ans=0.0 2024-09-24 02:47:25,482 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=403307.3333333333, ans=0.125 2024-09-24 02:48:16,608 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=403447.3333333333, ans=0.0 2024-09-24 02:48:20,957 INFO [train.py:1198] (3/4) Epoch 23, batch 750, loss[loss=0.2245, ctc_loss=0.1484, cr_loss=0.3804, over 17211.00 frames. ], tot_loss[loss=0.2092, ctc_loss=0.1384, cr_loss=0.354, over 3274980.45 frames. ], batch size: 50, lr: 5.24e-03, grad_scale: 32.0 2024-09-24 02:48:29,235 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-24 02:48:38,757 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=3.84 vs. limit=10.0 2024-09-24 02:48:54,082 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=403587.3333333333, ans=0.125 2024-09-24 02:49:04,093 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=6.72 vs. 
limit=15.0 2024-09-24 02:49:11,270 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=403634.0, ans=0.125 2024-09-24 02:49:14,374 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=403634.0, ans=0.0 2024-09-24 02:49:19,271 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=403634.0, ans=0.125 2024-09-24 02:49:19,282 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=403634.0, ans=0.2 2024-09-24 02:49:19,414 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=403634.0, ans=0.2 2024-09-24 02:49:30,266 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.065e+02 1.305e+02 1.385e+02 1.543e+02 2.308e+02, threshold=2.771e+02, percent-clipped=0.0 2024-09-24 02:49:43,283 INFO [train.py:1198] (3/4) Epoch 23, batch 800, loss[loss=0.2026, ctc_loss=0.1351, cr_loss=0.3377, over 17129.00 frames. ], tot_loss[loss=0.2083, ctc_loss=0.1377, cr_loss=0.3532, over 3299271.89 frames. ], batch size: 48, lr: 5.24e-03, grad_scale: 32.0 2024-09-24 02:49:43,677 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.min_abs, batch_count=403727.3333333333, ans=0.5 2024-09-24 02:50:31,967 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=403867.3333333333, ans=0.2 2024-09-24 02:50:35,142 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=403867.3333333333, ans=0.0 2024-09-24 02:50:35,160 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=403867.3333333333, ans=0.125 2024-09-24 02:50:56,162 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=13.51 vs. limit=22.5 2024-09-24 02:51:08,894 INFO [train.py:1198] (3/4) Epoch 23, batch 850, loss[loss=0.163, ctc_loss=0.109, cr_loss=0.2701, over 17175.00 frames. ], tot_loss[loss=0.2086, ctc_loss=0.1377, cr_loss=0.3542, over 3318540.62 frames. 
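], batch size: 41, lr: 5.24e-03, grad_scale: 16.0

The grad_scale field is the mixed-precision loss scale (this run trains in float16 with AMP). The scale is halved after a step whose gradients overflow and grown back after a long enough run of finite steps, which is why it reads 32.0 at batch 800 above, 16.0 here, and 32.0 again by batch 1200. A sketch of the loop with torch.cuda.amp (model, optimizer and loss are placeholders, and init_scale is illustrative):

    # Sketch of the AMP loss-scaling loop behind the logged grad_scale.
    # GradScaler halves the scale when it sees non-finite gradients and
    # doubles it again after a run of successful steps.
    import torch

    scaler = torch.cuda.amp.GradScaler(init_scale=32.0)

    def train_step(model, optimizer, batch):
        optimizer.zero_grad()
        with torch.cuda.amp.autocast():
            loss = model(batch)
        scaler.scale(loss).backward()
        scaler.step(optimizer)   # skipped internally if grads overflowed
        scaler.update()          # adjusts the scale, e.g. 32.0 -> 16.0
        return scaler.get_scale()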
2024-09-24 02:51:33,538 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=404007.3333333333, ans=0.125 2024-09-24 02:51:36,772 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=404007.3333333333, ans=0.2 2024-09-24 02:51:36,804 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=404007.3333333333, ans=0.125 2024-09-24 02:51:39,665 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=404054.0, ans=0.1 2024-09-24 02:51:41,387 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=404054.0, ans=0.0 2024-09-24 02:51:55,553 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=404100.6666666667, ans=0.2 2024-09-24 02:52:00,285 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=404100.6666666667, ans=0.1 2024-09-24 02:52:10,696 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.55 vs. limit=6.0 2024-09-24 02:52:17,547 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.140e+02 1.256e+02 1.343e+02 1.446e+02 2.174e+02, threshold=2.686e+02, percent-clipped=0.0 2024-09-24 02:52:28,723 INFO [train.py:1198] (3/4) Epoch 23, batch 900, loss[loss=0.1878, ctc_loss=0.1231, cr_loss=0.3233, over 17090.00 frames. ], tot_loss[loss=0.2095, ctc_loss=0.1385, cr_loss=0.3551, over 3331252.53 frames. ], batch size: 40, lr: 5.24e-03, grad_scale: 16.0 2024-09-24 02:52:28,994 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=404194.0, ans=0.2 2024-09-24 02:52:44,944 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=404240.6666666667, ans=0.1 2024-09-24 02:53:01,891 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-24 02:53:12,901 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=404287.3333333333, ans=0.1 2024-09-24 02:53:28,946 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=404334.0, ans=0.125 2024-09-24 02:53:40,968 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=404380.6666666667, ans=0.0 2024-09-24 02:53:44,331 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=404380.6666666667, ans=0.1 2024-09-24 02:53:47,902 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.27 vs. limit=15.0 2024-09-24 02:53:53,645 INFO [train.py:1198] (3/4) Epoch 23, batch 950, loss[loss=0.2303, ctc_loss=0.155, cr_loss=0.3763, over 17231.00 frames. ], tot_loss[loss=0.2106, ctc_loss=0.1392, cr_loss=0.357, over 3345808.81 frames.
], batch size: 55, lr: 5.24e-03, grad_scale: 16.0 2024-09-24 02:53:55,518 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=404427.3333333333, ans=0.0 2024-09-24 02:54:03,400 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=404427.3333333333, ans=0.2 2024-09-24 02:54:26,291 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.const_attention_rate, batch_count=404520.6666666667, ans=0.025 2024-09-24 02:54:26,914 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=6.65 vs. limit=15.0 2024-09-24 02:54:45,401 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=404567.3333333333, ans=0.035 2024-09-24 02:54:48,541 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=404567.3333333333, ans=0.125 2024-09-24 02:54:48,563 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=404567.3333333333, ans=0.1 2024-09-24 02:54:54,937 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=404567.3333333333, ans=0.125 2024-09-24 02:55:02,710 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.116e+02 1.281e+02 1.400e+02 1.572e+02 2.113e+02, threshold=2.800e+02, percent-clipped=0.0 2024-09-24 02:55:05,159 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.34 vs. limit=10.0 2024-09-24 02:55:13,697 INFO [train.py:1198] (3/4) Epoch 23, batch 1000, loss[loss=0.2018, ctc_loss=0.1315, cr_loss=0.3515, over 17116.00 frames. ], tot_loss[loss=0.2105, ctc_loss=0.1392, cr_loss=0.3568, over 3352397.28 frames. ], batch size: 49, lr: 5.24e-03, grad_scale: 16.0 2024-09-24 02:55:14,863 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=11.47 vs. limit=22.5 2024-09-24 02:55:21,975 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=404660.6666666667, ans=0.125 2024-09-24 02:55:25,088 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=404660.6666666667, ans=0.125 2024-09-24 02:55:30,443 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=4.49 vs. limit=15.0 2024-09-24 02:55:33,236 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=404707.3333333333, ans=0.125 2024-09-24 02:56:07,034 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=4.34 vs. 
limit=12.0 2024-09-24 02:56:11,274 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=404800.6666666667, ans=0.0 2024-09-24 02:56:14,548 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=404800.6666666667, ans=0.125 2024-09-24 02:56:31,021 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=15.05 vs. limit=15.0 2024-09-24 02:56:36,221 INFO [train.py:1198] (3/4) Epoch 23, batch 1050, loss[loss=0.2174, ctc_loss=0.1414, cr_loss=0.3799, over 17225.00 frames. ], tot_loss[loss=0.2115, ctc_loss=0.1399, cr_loss=0.3578, over 3351249.64 frames. ], batch size: 47, lr: 5.23e-03, grad_scale: 16.0 2024-09-24 02:56:46,212 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=404894.0, ans=0.0 2024-09-24 02:57:00,305 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=404940.6666666667, ans=0.1 2024-09-24 02:57:12,986 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=404987.3333333333, ans=0.125 2024-09-24 02:57:13,453 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=3.48 vs. limit=15.0 2024-09-24 02:57:16,114 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=404987.3333333333, ans=0.0 2024-09-24 02:57:17,816 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=404987.3333333333, ans=0.2 2024-09-24 02:57:25,630 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=405034.0, ans=0.125 2024-09-24 02:57:46,812 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.138e+02 1.300e+02 1.378e+02 1.529e+02 2.270e+02, threshold=2.756e+02, percent-clipped=0.0 2024-09-24 02:57:57,830 INFO [train.py:1198] (3/4) Epoch 23, batch 1100, loss[loss=0.2249, ctc_loss=0.1483, cr_loss=0.383, over 17152.00 frames. ], tot_loss[loss=0.2115, ctc_loss=0.1399, cr_loss=0.3579, over 3357147.69 frames. ], batch size: 48, lr: 5.23e-03, grad_scale: 16.0 2024-09-24 02:58:11,390 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=10.44 vs. limit=15.0 2024-09-24 02:58:46,880 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=405267.3333333333, ans=0.125 2024-09-24 02:58:49,923 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=405267.3333333333, ans=0.125 2024-09-24 02:58:55,446 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.39 vs. 
limit=6.0 2024-09-24 02:59:01,291 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=405267.3333333333, ans=0.5 2024-09-24 02:59:20,148 INFO [train.py:1198] (3/4) Epoch 23, batch 1150, loss[loss=0.2323, ctc_loss=0.1543, cr_loss=0.3898, over 17028.00 frames. ], tot_loss[loss=0.2116, ctc_loss=0.1401, cr_loss=0.3575, over 3351623.10 frames. ], batch size: 56, lr: 5.23e-03, grad_scale: 16.0 2024-09-24 02:59:23,632 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=405360.6666666667, ans=0.04949747468305833 2024-09-24 02:59:42,922 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=405407.3333333333, ans=0.0 2024-09-24 02:59:55,909 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=405454.0, ans=0.125 2024-09-24 02:59:59,088 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=405454.0, ans=0.2 2024-09-24 03:00:10,634 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=12.50 vs. limit=22.5 2024-09-24 03:00:28,066 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.82 vs. limit=10.0 2024-09-24 03:00:29,099 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.089e+02 1.259e+02 1.339e+02 1.439e+02 1.652e+02, threshold=2.679e+02, percent-clipped=0.0 2024-09-24 03:00:40,376 INFO [train.py:1198] (3/4) Epoch 23, batch 1200, loss[loss=0.172, ctc_loss=0.1101, cr_loss=0.3095, over 17084.00 frames. ], tot_loss[loss=0.2108, ctc_loss=0.1396, cr_loss=0.3561, over 3352169.59 frames. ], batch size: 40, lr: 5.23e-03, grad_scale: 32.0 2024-09-24 03:00:42,262 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=405594.0, ans=0.2 2024-09-24 03:01:04,999 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-24 03:01:10,042 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.83 vs. limit=15.0 2024-09-24 03:01:25,734 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=405687.3333333333, ans=0.2 2024-09-24 03:01:38,549 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=405734.0, ans=0.125 2024-09-24 03:01:38,577 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=405734.0, ans=0.125 2024-09-24 03:02:05,558 INFO [train.py:1198] (3/4) Epoch 23, batch 1250, loss[loss=0.2272, ctc_loss=0.1534, cr_loss=0.3689, over 17023.00 frames. ], tot_loss[loss=0.2107, ctc_loss=0.1396, cr_loss=0.3556, over 3354462.21 frames. ], batch size: 56, lr: 5.23e-03, grad_scale: 32.0 2024-09-24 03:02:12,515 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=6.27 vs. 
limit=15.0 2024-09-24 03:02:21,903 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=405874.0, ans=0.125 2024-09-24 03:02:23,539 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=405874.0, ans=0.05 2024-09-24 03:02:26,779 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=405874.0, ans=0.0 2024-09-24 03:02:46,224 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.68 vs. limit=6.0 2024-09-24 03:02:46,962 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=405920.6666666667, ans=0.1 2024-09-24 03:02:49,969 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=405920.6666666667, ans=0.2 2024-09-24 03:02:53,337 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=405920.6666666667, ans=0.2 2024-09-24 03:02:57,449 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.02 vs. limit=6.0 2024-09-24 03:03:18,869 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.086e+02 1.257e+02 1.354e+02 1.461e+02 2.818e+02, threshold=2.708e+02, percent-clipped=1.0 2024-09-24 03:03:30,976 INFO [train.py:1198] (3/4) Epoch 23, batch 1300, loss[loss=0.2226, ctc_loss=0.1469, cr_loss=0.3782, over 17036.00 frames. ], tot_loss[loss=0.2115, ctc_loss=0.1402, cr_loss=0.3568, over 3362358.67 frames. ], batch size: 52, lr: 5.23e-03, grad_scale: 16.0 2024-09-24 03:03:58,090 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-24 03:04:11,228 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=406154.0, ans=0.0 2024-09-24 03:04:39,826 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=406247.3333333333, ans=0.2 2024-09-24 03:04:41,350 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=406247.3333333333, ans=0.0 2024-09-24 03:04:50,624 INFO [train.py:1198] (3/4) Epoch 23, batch 1350, loss[loss=0.2437, ctc_loss=0.1653, cr_loss=0.3916, over 16187.00 frames. ], tot_loss[loss=0.2111, ctc_loss=0.1398, cr_loss=0.3563, over 3362287.68 frames. ], batch size: 74, lr: 5.23e-03, grad_scale: 8.0 2024-09-24 03:05:09,972 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=406340.6666666667, ans=0.2 2024-09-24 03:05:23,225 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=6.06 vs. limit=10.0 2024-09-24 03:05:59,056 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.96 vs. 
limit=6.0 2024-09-24 03:06:07,398 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.071e+02 1.244e+02 1.338e+02 1.449e+02 2.733e+02, threshold=2.676e+02, percent-clipped=1.0 2024-09-24 03:06:09,312 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=406480.6666666667, ans=0.1 2024-09-24 03:06:15,517 INFO [train.py:1198] (3/4) Epoch 23, batch 1400, loss[loss=0.2279, ctc_loss=0.1517, cr_loss=0.381, over 17303.00 frames. ], tot_loss[loss=0.2116, ctc_loss=0.1403, cr_loss=0.3567, over 3356797.23 frames. ], batch size: 49, lr: 5.22e-03, grad_scale: 8.0 2024-09-24 03:06:49,306 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=406620.6666666667, ans=0.1 2024-09-24 03:07:16,449 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=406667.3333333333, ans=0.125 2024-09-24 03:07:27,678 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=406714.0, ans=0.0 2024-09-24 03:07:30,913 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=406714.0, ans=0.125 2024-09-24 03:07:35,431 INFO [train.py:1198] (3/4) Epoch 23, batch 1450, loss[loss=0.2303, ctc_loss=0.1506, cr_loss=0.3987, over 17004.00 frames. ], tot_loss[loss=0.2102, ctc_loss=0.1392, cr_loss=0.3551, over 3355726.24 frames. ], batch size: 56, lr: 5.22e-03, grad_scale: 8.0 2024-09-24 03:08:02,779 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=11.56 vs. limit=15.0 2024-09-24 03:08:10,888 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.73 vs. limit=6.0 2024-09-24 03:08:19,860 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=406854.0, ans=0.125 2024-09-24 03:08:41,854 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=406900.6666666667, ans=0.1 2024-09-24 03:08:51,210 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=406947.3333333333, ans=0.0 2024-09-24 03:08:52,581 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.081e+02 1.227e+02 1.307e+02 1.393e+02 1.809e+02, threshold=2.613e+02, percent-clipped=0.0 2024-09-24 03:09:00,496 INFO [train.py:1198] (3/4) Epoch 23, batch 1500, loss[loss=0.2317, ctc_loss=0.1588, cr_loss=0.3644, over 15072.00 frames. ], tot_loss[loss=0.2105, ctc_loss=0.1394, cr_loss=0.3553, over 3346687.14 frames. ], batch size: 88, lr: 5.22e-03, grad_scale: 8.0 2024-09-24 03:09:03,816 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=406994.0, ans=0.015 2024-09-24 03:09:17,188 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=4.96 vs. 
limit=15.0 2024-09-24 03:09:37,574 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.min_positive, batch_count=407087.3333333333, ans=0.025 2024-09-24 03:09:39,133 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=407087.3333333333, ans=0.125 2024-09-24 03:10:12,625 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=407180.6666666667, ans=0.2 2024-09-24 03:10:20,387 INFO [train.py:1198] (3/4) Epoch 23, batch 1550, loss[loss=0.1826, ctc_loss=0.1179, cr_loss=0.3234, over 17223.00 frames. ], tot_loss[loss=0.2096, ctc_loss=0.1388, cr_loss=0.3543, over 3359799.76 frames. ], batch size: 47, lr: 5.22e-03, grad_scale: 8.0 2024-09-24 03:10:35,007 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=407274.0, ans=0.125 2024-09-24 03:10:46,250 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=407274.0, ans=0.125 2024-09-24 03:11:12,071 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=407367.3333333333, ans=0.2 2024-09-24 03:11:37,268 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.086e+02 1.292e+02 1.390e+02 1.535e+02 2.050e+02, threshold=2.781e+02, percent-clipped=0.0 2024-09-24 03:11:45,286 INFO [train.py:1198] (3/4) Epoch 23, batch 1600, loss[loss=0.2405, ctc_loss=0.1613, cr_loss=0.3963, over 17370.00 frames. ], tot_loss[loss=0.2098, ctc_loss=0.1388, cr_loss=0.3547, over 3367787.85 frames. ], batch size: 48, lr: 5.22e-03, grad_scale: 16.0 2024-09-24 03:11:47,769 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=4.95 vs. limit=15.0 2024-09-24 03:11:48,732 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=407460.6666666667, ans=0.125 2024-09-24 03:12:11,500 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.13 vs. limit=15.0 2024-09-24 03:12:15,335 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=7.74 vs. limit=15.0 2024-09-24 03:12:19,319 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=407554.0, ans=0.0 2024-09-24 03:12:25,557 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=407554.0, ans=0.2 2024-09-24 03:12:38,279 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=407600.6666666667, ans=0.07 2024-09-24 03:12:55,381 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=407647.3333333333, ans=0.0 2024-09-24 03:13:08,010 INFO [train.py:1198] (3/4) Epoch 23, batch 1650, loss[loss=0.1927, ctc_loss=0.1252, cr_loss=0.3372, over 17216.00 frames. ], tot_loss[loss=0.2107, ctc_loss=0.1394, cr_loss=0.3564, over 3373861.79 frames. 
], batch size: 41, lr: 5.22e-03, grad_scale: 16.0 2024-09-24 03:13:09,879 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=407694.0, ans=0.2 2024-09-24 03:13:36,020 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=407740.6666666667, ans=0.125 2024-09-24 03:13:38,252 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=5.64 vs. limit=15.0 2024-09-24 03:13:39,150 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=407740.6666666667, ans=0.2 2024-09-24 03:13:40,710 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=407787.3333333333, ans=0.04949747468305833 2024-09-24 03:14:21,925 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.102e+02 1.251e+02 1.320e+02 1.451e+02 2.604e+02, threshold=2.640e+02, percent-clipped=0.0 2024-09-24 03:14:29,903 INFO [train.py:1198] (3/4) Epoch 23, batch 1700, loss[loss=0.2452, ctc_loss=0.1646, cr_loss=0.4027, over 17009.00 frames. ], tot_loss[loss=0.2103, ctc_loss=0.1391, cr_loss=0.356, over 3375782.80 frames. ], batch size: 53, lr: 5.22e-03, grad_scale: 16.0 2024-09-24 03:15:17,970 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=408067.3333333333, ans=0.0 2024-09-24 03:15:34,064 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=408114.0, ans=0.125 2024-09-24 03:15:46,996 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.const_attention_rate, batch_count=408114.0, ans=0.025 2024-09-24 03:15:52,358 INFO [train.py:1198] (3/4) Epoch 23, batch 1750, loss[loss=0.1935, ctc_loss=0.1244, cr_loss=0.3455, over 17089.00 frames. ], tot_loss[loss=0.2098, ctc_loss=0.1387, cr_loss=0.3554, over 3371351.84 frames. ], batch size: 43, lr: 5.21e-03, grad_scale: 16.0 2024-09-24 03:16:27,498 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=12.80 vs. limit=22.5 2024-09-24 03:16:46,047 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=408300.6666666667, ans=0.1 2024-09-24 03:16:57,422 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=408347.3333333333, ans=0.125 2024-09-24 03:17:06,659 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.077e+02 1.263e+02 1.353e+02 1.472e+02 2.567e+02, threshold=2.706e+02, percent-clipped=0.0 2024-09-24 03:17:14,534 INFO [train.py:1198] (3/4) Epoch 23, batch 1800, loss[loss=0.2293, ctc_loss=0.1549, cr_loss=0.3725, over 17098.00 frames. ], tot_loss[loss=0.2105, ctc_loss=0.1392, cr_loss=0.3564, over 3375545.05 frames. 
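[Note on the WARNING [optim.py] lines: the reported threshold is consistently Clipping_scale times the median (third value) of the grad-norm quartiles, e.g. 2.0 x 1.320e+02 = 2.640e+02 in the warning just above, and percent-clipped reports how often recent updates exceeded it. A sketch of that bookkeeping follows; the behaviour is inferred from the logged numbers, not taken from the actual optim.py code.]

import torch

# Sketch of median-based gradient clipping as suggested by the log
# (assumed behaviour; the real optim.py logic may differ in detail).
def clip_by_median(params, recent_norms, clipping_scale=2.0):
    grads = [p.grad.detach().flatten() for p in params if p.grad is not None]
    norm = torch.cat(grads).norm()
    recent_norms.append(norm.item())
    recent_norms[:] = recent_norms[-200:]          # sliding window of norms
    q = torch.quantile(torch.tensor(recent_norms),
                       torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0]))
    threshold = clipping_scale * q[2].item()       # scale * median, as logged
    if norm > threshold:                           # counted in percent-clipped
        for p in params:
            if p.grad is not None:
                p.grad.mul_(threshold / norm)
    return q, threshold                            # the logged quartiles/threshold
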
], batch size: 49, lr: 5.21e-03, grad_scale: 16.0 2024-09-24 03:17:14,967 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-09-24 03:17:24,601 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=408394.0, ans=0.125 2024-09-24 03:17:29,333 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=408440.6666666667, ans=0.125 2024-09-24 03:17:49,335 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=408487.3333333333, ans=0.0 2024-09-24 03:18:39,950 INFO [train.py:1198] (3/4) Epoch 23, batch 1850, loss[loss=0.2051, ctc_loss=0.1338, cr_loss=0.3562, over 17044.00 frames. ], tot_loss[loss=0.211, ctc_loss=0.1396, cr_loss=0.357, over 3366190.79 frames. ], batch size: 39, lr: 5.21e-03, grad_scale: 16.0 2024-09-24 03:19:32,898 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=408767.3333333333, ans=0.0 2024-09-24 03:19:44,256 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=408814.0, ans=0.125 2024-09-24 03:19:51,998 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.049e+02 1.253e+02 1.335e+02 1.430e+02 2.025e+02, threshold=2.670e+02, percent-clipped=0.0 2024-09-24 03:19:55,543 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=408814.0, ans=0.125 2024-09-24 03:19:55,549 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=408814.0, ans=0.125 2024-09-24 03:19:59,999 INFO [train.py:1198] (3/4) Epoch 23, batch 1900, loss[loss=0.1886, ctc_loss=0.122, cr_loss=0.3331, over 17226.00 frames. ], tot_loss[loss=0.212, ctc_loss=0.1404, cr_loss=0.3578, over 3349568.20 frames. ], batch size: 47, lr: 5.21e-03, grad_scale: 16.0 2024-09-24 03:20:19,913 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=4.45 vs. limit=12.0 2024-09-24 03:20:54,895 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=7.02 vs. limit=15.0 2024-09-24 03:21:25,582 INFO [train.py:1198] (3/4) Epoch 23, batch 1950, loss[loss=0.2383, ctc_loss=0.1586, cr_loss=0.3983, over 16497.00 frames. ], tot_loss[loss=0.2114, ctc_loss=0.14, cr_loss=0.357, over 3353567.24 frames. ], batch size: 66, lr: 5.21e-03, grad_scale: 16.0 2024-09-24 03:21:40,201 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=409140.6666666667, ans=0.0 2024-09-24 03:22:23,836 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=409234.0, ans=0.125 2024-09-24 03:22:40,199 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.114e+02 1.272e+02 1.356e+02 1.502e+02 2.746e+02, threshold=2.712e+02, percent-clipped=1.0 2024-09-24 03:22:48,042 INFO [train.py:1198] (3/4) Epoch 23, batch 2000, loss[loss=0.1702, ctc_loss=0.1091, cr_loss=0.3054, over 16920.00 frames. ], tot_loss[loss=0.2111, ctc_loss=0.1398, cr_loss=0.3565, over 3355464.80 frames. 
], batch size: 42, lr: 5.21e-03, grad_scale: 32.0 2024-09-24 03:23:14,032 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=409374.0, ans=0.07 2024-09-24 03:23:45,037 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=409467.3333333333, ans=0.125 2024-09-24 03:24:06,096 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.const_attention_rate, batch_count=409514.0, ans=0.025 2024-09-24 03:24:10,502 INFO [train.py:1198] (3/4) Epoch 23, batch 2050, loss[loss=0.2112, ctc_loss=0.1454, cr_loss=0.3292, over 16722.00 frames. ], tot_loss[loss=0.2111, ctc_loss=0.1399, cr_loss=0.3564, over 3353894.30 frames. ], batch size: 61, lr: 5.20e-03, grad_scale: 16.0 2024-09-24 03:24:57,574 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=409700.6666666667, ans=0.07 2024-09-24 03:25:00,838 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=409700.6666666667, ans=0.2 2024-09-24 03:25:24,758 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.130e+02 1.284e+02 1.371e+02 1.476e+02 1.863e+02, threshold=2.742e+02, percent-clipped=0.0 2024-09-24 03:25:31,152 INFO [train.py:1198] (3/4) Epoch 23, batch 2100, loss[loss=0.2315, ctc_loss=0.1557, cr_loss=0.3791, over 17056.00 frames. ], tot_loss[loss=0.2113, ctc_loss=0.14, cr_loss=0.3565, over 3336703.07 frames. ], batch size: 52, lr: 5.20e-03, grad_scale: 16.0 2024-09-24 03:25:37,405 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=4.53 vs. limit=15.0 2024-09-24 03:25:51,284 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=409840.6666666667, ans=0.1 2024-09-24 03:25:53,027 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=409840.6666666667, ans=0.125 2024-09-24 03:26:08,810 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.94 vs. limit=15.0 2024-09-24 03:26:33,929 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=409934.0, ans=0.125 2024-09-24 03:26:56,011 INFO [train.py:1198] (3/4) Epoch 23, batch 2150, loss[loss=0.2124, ctc_loss=0.1432, cr_loss=0.3462, over 17110.00 frames. ], tot_loss[loss=0.2109, ctc_loss=0.1396, cr_loss=0.3565, over 3343751.92 frames. 
], batch size: 49, lr: 5.20e-03, grad_scale: 16.0 2024-09-24 03:27:17,317 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=410074.0, ans=0.0 2024-09-24 03:27:20,513 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=410074.0, ans=0.125 2024-09-24 03:27:25,170 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=410074.0, ans=0.2 2024-09-24 03:27:34,432 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer_ff3.min_abs, batch_count=410120.6666666667, ans=0.2 2024-09-24 03:27:46,074 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=3.96 vs. limit=10.0 2024-09-24 03:27:49,517 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=5.07 vs. limit=15.0 2024-09-24 03:27:58,869 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=410167.3333333333, ans=0.0 2024-09-24 03:28:00,264 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=410167.3333333333, ans=0.0 2024-09-24 03:28:15,214 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.089e+02 1.254e+02 1.321e+02 1.428e+02 2.805e+02, threshold=2.642e+02, percent-clipped=1.0 2024-09-24 03:28:21,745 INFO [train.py:1198] (3/4) Epoch 23, batch 2200, loss[loss=0.2212, ctc_loss=0.1446, cr_loss=0.3826, over 17305.00 frames. ], tot_loss[loss=0.2114, ctc_loss=0.14, cr_loss=0.3567, over 3335125.53 frames. ], batch size: 46, lr: 5.20e-03, grad_scale: 16.0 2024-09-24 03:28:25,171 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=410260.6666666667, ans=0.0 2024-09-24 03:28:42,686 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=410307.3333333333, ans=0.0 2024-09-24 03:29:07,322 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.70 vs. limit=15.0 2024-09-24 03:29:41,641 INFO [train.py:1198] (3/4) Epoch 23, batch 2250, loss[loss=0.2099, ctc_loss=0.1386, cr_loss=0.3566, over 17299.00 frames. ], tot_loss[loss=0.2095, ctc_loss=0.1387, cr_loss=0.3543, over 3338499.50 frames. ], batch size: 46, lr: 5.20e-03, grad_scale: 16.0 2024-09-24 03:29:49,946 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=410494.0, ans=0.125 2024-09-24 03:30:15,398 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=410587.3333333333, ans=0.125 2024-09-24 03:30:16,055 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=7.51 vs. limit=15.0 2024-09-24 03:30:19,522 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=5.84 vs. 
limit=15.0 2024-09-24 03:30:30,925 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=410634.0, ans=0.0 2024-09-24 03:30:47,531 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.min_positive, batch_count=410634.0, ans=0.025 2024-09-24 03:31:02,825 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.096e+02 1.239e+02 1.305e+02 1.382e+02 2.270e+02, threshold=2.609e+02, percent-clipped=0.0 2024-09-24 03:31:09,444 INFO [train.py:1198] (3/4) Epoch 23, batch 2300, loss[loss=0.2079, ctc_loss=0.1368, cr_loss=0.3555, over 17306.00 frames. ], tot_loss[loss=0.2097, ctc_loss=0.1388, cr_loss=0.3544, over 3351891.98 frames. ], batch size: 49, lr: 5.20e-03, grad_scale: 16.0 2024-09-24 03:31:21,874 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=7.60 vs. limit=15.0 2024-09-24 03:31:45,580 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=10.41 vs. limit=15.0 2024-09-24 03:31:46,604 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=410820.6666666667, ans=0.1 2024-09-24 03:31:48,235 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=410820.6666666667, ans=0.025 2024-09-24 03:32:08,881 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=410867.3333333333, ans=0.125 2024-09-24 03:32:31,674 INFO [train.py:1198] (3/4) Epoch 23, batch 2350, loss[loss=0.1976, ctc_loss=0.1306, cr_loss=0.3347, over 17225.00 frames. ], tot_loss[loss=0.2092, ctc_loss=0.1385, cr_loss=0.3537, over 3345545.12 frames. ], batch size: 47, lr: 5.20e-03, grad_scale: 16.0 2024-09-24 03:32:33,797 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=410960.6666666667, ans=0.1 2024-09-24 03:32:48,274 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=411007.3333333333, ans=0.1 2024-09-24 03:33:09,101 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=11.44 vs. 
limit=22.5 2024-09-24 03:33:15,005 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=411054.0, ans=0.125 2024-09-24 03:33:25,866 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=411100.6666666667, ans=0.1 2024-09-24 03:33:30,704 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.max_abs, batch_count=411100.6666666667, ans=10.0 2024-09-24 03:33:35,434 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=411100.6666666667, ans=0.0 2024-09-24 03:33:47,828 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.079e+02 1.259e+02 1.351e+02 1.489e+02 2.005e+02, threshold=2.702e+02, percent-clipped=0.0 2024-09-24 03:33:54,327 INFO [train.py:1198] (3/4) Epoch 23, batch 2400, loss[loss=0.217, ctc_loss=0.1473, cr_loss=0.3483, over 16520.00 frames. ], tot_loss[loss=0.2091, ctc_loss=0.1382, cr_loss=0.3541, over 3346117.86 frames. ], batch size: 66, lr: 5.19e-03, grad_scale: 32.0 2024-09-24 03:33:58,347 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.10 vs. limit=15.0 2024-09-24 03:34:17,393 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=411240.6666666667, ans=0.0 2024-09-24 03:34:23,794 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-09-24 03:34:47,596 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.min_positive, batch_count=411334.0, ans=0.05 2024-09-24 03:35:10,637 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=7.10 vs. limit=15.0 2024-09-24 03:35:14,352 INFO [train.py:1198] (3/4) Epoch 23, batch 2450, loss[loss=0.2112, ctc_loss=0.1397, cr_loss=0.3575, over 17352.00 frames. ], tot_loss[loss=0.2099, ctc_loss=0.1388, cr_loss=0.3553, over 3356955.42 frames. 
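[Note on the Whitening entries: each logs a covariance-based metric against a limit (e.g. metric=11.44 vs. limit=22.5 just above); while the metric stays under the limit the entry is informational only. A rough sketch of one plausible whitening metric follows; the formulation here is an assumption for illustration, not the scaling.py definition.]

import torch

# Sketch: a "whitening" metric measuring how anisotropic the channel
# covariance of an activation is (1.0 = perfectly white / isotropic).
def whitening_metric(x: torch.Tensor, num_groups: int = 1) -> float:
    # x: (frames, channels); channels split into groups as in the log lines
    metrics = []
    for c in x.chunk(num_groups, dim=-1):
        c = c - c.mean(dim=0, keepdim=True)
        cov = (c.T @ c) / c.shape[0]
        eigs = torch.linalg.eigvalsh(cov)
        # ratio of mean squared eigenvalue to squared mean eigenvalue:
        metrics.append((eigs.pow(2).mean() / eigs.mean().pow(2)).item())
    return max(metrics)  # compared against the logged limit

x = torch.randn(1000, 256)
print(whitening_metric(x, num_groups=4))  # close to 1.0 for white noise
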
], batch size: 48, lr: 5.19e-03, grad_scale: 16.0 2024-09-24 03:35:16,305 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=411427.3333333333, ans=0.1 2024-09-24 03:35:17,790 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=411427.3333333333, ans=0.125 2024-09-24 03:35:44,158 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=411474.0, ans=0.125 2024-09-24 03:35:52,308 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=411520.6666666667, ans=0.1 2024-09-24 03:35:57,967 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=411520.6666666667, ans=0.2 2024-09-24 03:36:17,150 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=411567.3333333333, ans=0.2 2024-09-24 03:36:20,777 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=411567.3333333333, ans=0.125 2024-09-24 03:36:22,587 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=411614.0, ans=0.2 2024-09-24 03:36:35,083 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.134e+02 1.268e+02 1.337e+02 1.491e+02 2.179e+02, threshold=2.674e+02, percent-clipped=0.0 2024-09-24 03:36:39,831 INFO [train.py:1198] (3/4) Epoch 23, batch 2500, loss[loss=0.1904, ctc_loss=0.1235, cr_loss=0.3343, over 17310.00 frames. ], tot_loss[loss=0.2093, ctc_loss=0.1383, cr_loss=0.355, over 3361621.23 frames. ], batch size: 46, lr: 5.19e-03, grad_scale: 16.0 2024-09-24 03:36:44,884 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=411660.6666666667, ans=0.125 2024-09-24 03:36:46,986 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=12.53 vs. 
limit=12.0 2024-09-24 03:36:54,824 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=411707.3333333333, ans=0.125 2024-09-24 03:37:01,332 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=411707.3333333333, ans=0.1 2024-09-24 03:37:07,910 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=411707.3333333333, ans=0.07 2024-09-24 03:37:15,940 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=411754.0, ans=0.09899494936611666 2024-09-24 03:37:33,045 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=411800.6666666667, ans=0.125 2024-09-24 03:37:50,338 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=411847.3333333333, ans=0.07 2024-09-24 03:37:55,253 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=411847.3333333333, ans=0.2 2024-09-24 03:37:59,901 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=411847.3333333333, ans=0.0 2024-09-24 03:38:02,887 INFO [train.py:1198] (3/4) Epoch 23, batch 2550, loss[loss=0.228, ctc_loss=0.1526, cr_loss=0.377, over 17204.00 frames. ], tot_loss[loss=0.2091, ctc_loss=0.1382, cr_loss=0.3546, over 3368196.84 frames. ], batch size: 55, lr: 5.19e-03, grad_scale: 16.0 2024-09-24 03:38:03,220 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=411894.0, ans=0.125 2024-09-24 03:38:53,548 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=412034.0, ans=0.0 2024-09-24 03:39:12,693 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=412080.6666666667, ans=0.1 2024-09-24 03:39:12,739 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=412080.6666666667, ans=0.2 2024-09-24 03:39:20,141 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.118e+02 1.256e+02 1.367e+02 1.488e+02 2.148e+02, threshold=2.734e+02, percent-clipped=0.0 2024-09-24 03:39:24,811 INFO [train.py:1198] (3/4) Epoch 23, batch 2600, loss[loss=0.2078, ctc_loss=0.1353, cr_loss=0.3624, over 17212.00 frames. ], tot_loss[loss=0.2093, ctc_loss=0.1383, cr_loss=0.355, over 3370250.09 frames. ], batch size: 47, lr: 5.19e-03, grad_scale: 16.0 2024-09-24 03:39:50,416 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-09-24 03:40:03,169 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=412220.6666666667, ans=0.125 2024-09-24 03:40:23,129 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=412267.3333333333, ans=0.125 2024-09-24 03:40:27,153 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.99 vs. 
limit=10.0 2024-09-24 03:40:32,727 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=412314.0, ans=0.1 2024-09-24 03:40:36,136 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=412314.0, ans=0.0 2024-09-24 03:40:49,869 INFO [train.py:1198] (3/4) Epoch 23, batch 2650, loss[loss=0.1916, ctc_loss=0.1274, cr_loss=0.3205, over 17212.00 frames. ], tot_loss[loss=0.2092, ctc_loss=0.1382, cr_loss=0.3553, over 3366124.27 frames. ], batch size: 50, lr: 5.19e-03, grad_scale: 16.0 2024-09-24 03:40:54,784 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=412360.6666666667, ans=0.125 2024-09-24 03:41:28,798 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=412454.0, ans=0.0 2024-09-24 03:41:40,166 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=412500.6666666667, ans=0.0 2024-09-24 03:42:05,850 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.096e+02 1.276e+02 1.370e+02 1.490e+02 2.063e+02, threshold=2.739e+02, percent-clipped=0.0 2024-09-24 03:42:10,852 INFO [train.py:1198] (3/4) Epoch 23, batch 2700, loss[loss=0.2221, ctc_loss=0.1487, cr_loss=0.3669, over 16272.00 frames. ], tot_loss[loss=0.2099, ctc_loss=0.1388, cr_loss=0.3555, over 3359305.39 frames. ], batch size: 74, lr: 5.19e-03, grad_scale: 16.0 2024-09-24 03:42:25,000 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=412594.0, ans=0.09899494936611666 2024-09-24 03:42:37,921 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=412640.6666666667, ans=0.07 2024-09-24 03:42:38,445 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.13 vs. limit=6.0 2024-09-24 03:42:54,079 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=412687.3333333333, ans=0.1 2024-09-24 03:43:04,805 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer_ff2.min_abs, batch_count=412734.0, ans=0.1 2024-09-24 03:43:08,560 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.96 vs. limit=22.5 2024-09-24 03:43:37,011 INFO [train.py:1198] (3/4) Epoch 23, batch 2750, loss[loss=0.1776, ctc_loss=0.1151, cr_loss=0.3127, over 16839.00 frames. ], tot_loss[loss=0.2092, ctc_loss=0.1382, cr_loss=0.3547, over 3364586.98 frames. ], batch size: 37, lr: 5.18e-03, grad_scale: 16.0 2024-09-24 03:43:59,070 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=8.04 vs. 
limit=22.5 2024-09-24 03:44:20,441 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=412920.6666666667, ans=0.2 2024-09-24 03:44:28,528 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=412967.3333333333, ans=0.125 2024-09-24 03:44:46,183 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.min_positive, batch_count=413014.0, ans=0.05 2024-09-24 03:44:49,606 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.19 vs. limit=15.0 2024-09-24 03:44:52,047 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.133e+02 1.288e+02 1.407e+02 1.537e+02 4.593e+02, threshold=2.814e+02, percent-clipped=2.0 2024-09-24 03:44:56,677 INFO [train.py:1198] (3/4) Epoch 23, batch 2800, loss[loss=0.2207, ctc_loss=0.148, cr_loss=0.3635, over 17211.00 frames. ], tot_loss[loss=0.2094, ctc_loss=0.1385, cr_loss=0.355, over 3359257.02 frames. ], batch size: 47, lr: 5.18e-03, grad_scale: 32.0 2024-09-24 03:45:09,977 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=11.55 vs. limit=15.0 2024-09-24 03:45:36,645 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=6.10 vs. limit=15.0 2024-09-24 03:45:42,766 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=413154.0, ans=0.0 2024-09-24 03:46:02,990 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=413200.6666666667, ans=0.0 2024-09-24 03:46:21,928 INFO [train.py:1198] (3/4) Epoch 23, batch 2850, loss[loss=0.2187, ctc_loss=0.1453, cr_loss=0.3669, over 17307.00 frames. ], tot_loss[loss=0.2095, ctc_loss=0.1386, cr_loss=0.3546, over 3354965.12 frames. ], batch size: 51, lr: 5.18e-03, grad_scale: 32.0 2024-09-24 03:46:35,131 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=413294.0, ans=0.1 2024-09-24 03:46:35,570 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=5.45 vs. limit=15.0 2024-09-24 03:46:36,613 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=413340.6666666667, ans=0.125 2024-09-24 03:47:18,221 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=413434.0, ans=0.09899494936611666 2024-09-24 03:47:30,503 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=413480.6666666667, ans=0.125 2024-09-24 03:47:39,673 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.145e+02 1.282e+02 1.396e+02 1.534e+02 2.289e+02, threshold=2.792e+02, percent-clipped=0.0 2024-09-24 03:47:44,652 INFO [train.py:1198] (3/4) Epoch 23, batch 2900, loss[loss=0.1898, ctc_loss=0.1227, cr_loss=0.3353, over 17175.00 frames. ], tot_loss[loss=0.209, ctc_loss=0.1381, cr_loss=0.3544, over 3354161.91 frames. 
], batch size: 45, lr: 5.18e-03, grad_scale: 32.0 2024-09-24 03:48:03,804 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=413574.0, ans=0.2 2024-09-24 03:48:24,579 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=413620.6666666667, ans=0.0 2024-09-24 03:48:43,129 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=5.00 vs. limit=10.0 2024-09-24 03:48:45,668 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=413667.3333333333, ans=0.2 2024-09-24 03:48:46,069 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=2.95 vs. limit=15.0 2024-09-24 03:48:55,031 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.max_abs, batch_count=413714.0, ans=10.0 2024-09-24 03:49:06,066 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=413760.6666666667, ans=0.0 2024-09-24 03:49:07,440 INFO [train.py:1198] (3/4) Epoch 23, batch 2950, loss[loss=0.2104, ctc_loss=0.1364, cr_loss=0.37, over 17222.00 frames. ], tot_loss[loss=0.2074, ctc_loss=0.1369, cr_loss=0.3524, over 3356105.36 frames. ], batch size: 50, lr: 5.18e-03, grad_scale: 32.0 2024-09-24 03:49:16,076 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=413760.6666666667, ans=0.125 2024-09-24 03:49:17,577 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=413760.6666666667, ans=0.125 2024-09-24 03:49:33,945 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-24 03:49:42,394 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=14.49 vs. limit=22.5 2024-09-24 03:49:43,469 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=413854.0, ans=0.125 2024-09-24 03:49:54,651 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.64 vs. limit=15.0 2024-09-24 03:50:02,048 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=413900.6666666667, ans=0.1 2024-09-24 03:50:05,102 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=413900.6666666667, ans=0.1 2024-09-24 03:50:22,356 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.031e+02 1.232e+02 1.296e+02 1.402e+02 3.065e+02, threshold=2.592e+02, percent-clipped=1.0 2024-09-24 03:50:27,187 INFO [train.py:1198] (3/4) Epoch 23, batch 3000, loss[loss=0.2366, ctc_loss=0.1597, cr_loss=0.3844, over 15086.00 frames. ], tot_loss[loss=0.2082, ctc_loss=0.1376, cr_loss=0.3531, over 3357478.33 frames. 
], batch size: 89, lr: 5.18e-03, grad_scale: 32.0 2024-09-24 03:50:27,187 INFO [train.py:1221] (3/4) Computing validation loss 2024-09-24 03:50:35,193 INFO [zipformer.py:1858] (3/4) name=encoder.encoders.4.encoder.layers.2.self_attn_weights, attn_weights_entropy = tensor([3.7099, 2.9663, 3.3197, 3.3959], device='cuda:3') 2024-09-24 03:50:39,266 INFO [zipformer.py:1858] (3/4) name=encoder.encoders.3.encoder.layers.3.self_attn_weights, attn_weights_entropy = tensor([1.8762, 2.0070, 2.3418, 2.1690, 2.4761, 2.2800, 2.5009, 1.9513], device='cuda:3') 2024-09-24 03:50:42,632 INFO [train.py:1230] (3/4) Epoch 23, validation: loss=0.03816, ctc_loss=0.03816, cr_loss=8.083e-15, over 944034.00 frames. 2024-09-24 03:50:42,633 INFO [train.py:1231] (3/4) Maximum memory allocated so far is 21238MB 2024-09-24 03:51:29,139 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=414087.3333333333, ans=0.125 2024-09-24 03:51:43,491 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=414134.0, ans=0.125 2024-09-24 03:51:48,191 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=414180.6666666667, ans=0.2 2024-09-24 03:52:03,782 INFO [train.py:1198] (3/4) Epoch 23, batch 3050, loss[loss=0.1664, ctc_loss=0.1076, cr_loss=0.2938, over 16973.00 frames. ], tot_loss[loss=0.2087, ctc_loss=0.138, cr_loss=0.3538, over 3358136.17 frames. ], batch size: 42, lr: 5.18e-03, grad_scale: 16.0 2024-09-24 03:52:26,197 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=414274.0, ans=0.2 2024-09-24 03:53:05,289 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=414414.0, ans=0.1 2024-09-24 03:53:13,080 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.const_attention_rate, batch_count=414414.0, ans=0.025 2024-09-24 03:53:19,032 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.083e+02 1.284e+02 1.405e+02 1.521e+02 2.229e+02, threshold=2.810e+02, percent-clipped=0.0 2024-09-24 03:53:22,207 INFO [train.py:1198] (3/4) Epoch 23, batch 3100, loss[loss=0.2129, ctc_loss=0.1411, cr_loss=0.3587, over 15967.00 frames. ], tot_loss[loss=0.2084, ctc_loss=0.1377, cr_loss=0.3535, over 3362684.69 frames. 
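[Note on the loss columns: the per-batch totals are consistent with the CTC term plus 0.2 times the consistency-regularization term, e.g. 0.1377 + 0.2 x 0.3535 = 0.2084 in the tot_loss just above; the validation pass reports cr_loss on the order of 1e-15 because no pair of differently-augmented views is compared there. A hedged sketch of such a CR-CTC combination follows; the 0.2 scale is inferred from the logged totals and the function bodies are placeholders, not the train.py code.]

import torch
import torch.nn.functional as F

# Sketch of a CR-CTC style objective: CTC on two differently-masked views
# of the same utterances plus a consistency term between their posteriors.
CTC_SCALE, CR_SCALE = 1.0, 0.2  # CR scale inferred from the logged totals

def cr_ctc_loss(log_probs_a, log_probs_b, targets, in_lens, tgt_lens):
    # log_probs_*: (T, N, vocab) log-softmax outputs from two augmented views
    ctc = 0.5 * (F.ctc_loss(log_probs_a, targets, in_lens, tgt_lens)
                 + F.ctc_loss(log_probs_b, targets, in_lens, tgt_lens))
    # symmetric KL between the two posterior sequences as the CR term
    cr = 0.5 * (F.kl_div(log_probs_a, log_probs_b, log_target=True,
                         reduction='batchmean')
                + F.kl_div(log_probs_b, log_probs_a, log_target=True,
                           reduction='batchmean'))
    return CTC_SCALE * ctc + CR_SCALE * cr  # matches tot_loss = ctc + 0.2*cr
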
], batch size: 74, lr: 5.17e-03, grad_scale: 16.0 2024-09-24 03:53:22,453 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=414460.6666666667, ans=0.0 2024-09-24 03:53:24,068 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=414460.6666666667, ans=0.125 2024-09-24 03:53:35,028 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=414460.6666666667, ans=0.0 2024-09-24 03:53:39,834 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=414507.3333333333, ans=0.125 2024-09-24 03:53:41,131 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=414507.3333333333, ans=0.1 2024-09-24 03:53:50,754 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=414507.3333333333, ans=0.1 2024-09-24 03:53:59,971 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=414554.0, ans=0.125 2024-09-24 03:54:06,177 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=414554.0, ans=0.0 2024-09-24 03:54:42,870 INFO [train.py:1198] (3/4) Epoch 23, batch 3150, loss[loss=0.2026, ctc_loss=0.1355, cr_loss=0.3356, over 17297.00 frames. ], tot_loss[loss=0.2086, ctc_loss=0.1379, cr_loss=0.3534, over 3358151.33 frames. ], batch size: 49, lr: 5.17e-03, grad_scale: 16.0 2024-09-24 03:54:52,941 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=4.43 vs. limit=15.0 2024-09-24 03:54:53,143 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.80 vs. limit=15.0 2024-09-24 03:54:54,186 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=414694.0, ans=0.125 2024-09-24 03:55:24,559 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.79 vs. limit=6.0 2024-09-24 03:55:32,373 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=11.60 vs. limit=22.5 2024-09-24 03:55:42,440 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=414834.0, ans=0.2 2024-09-24 03:55:59,846 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.084e+02 1.246e+02 1.369e+02 1.500e+02 3.916e+02, threshold=2.738e+02, percent-clipped=1.0 2024-09-24 03:56:03,019 INFO [train.py:1198] (3/4) Epoch 23, batch 3200, loss[loss=0.2111, ctc_loss=0.1378, cr_loss=0.3667, over 17140.00 frames. ], tot_loss[loss=0.208, ctc_loss=0.1374, cr_loss=0.3527, over 3365774.90 frames. ], batch size: 48, lr: 5.17e-03, grad_scale: 32.0 2024-09-24 03:56:20,812 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=4.57 vs. 
limit=12.0 2024-09-24 03:57:21,131 INFO [train.py:1198] (3/4) Epoch 23, batch 3250, loss[loss=0.2321, ctc_loss=0.1522, cr_loss=0.3992, over 17067.00 frames. ], tot_loss[loss=0.2079, ctc_loss=0.1374, cr_loss=0.3524, over 3369279.59 frames. ], batch size: 46, lr: 5.17e-03, grad_scale: 32.0 2024-09-24 03:57:30,779 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=415160.6666666667, ans=0.125 2024-09-24 03:57:40,625 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.29 vs. limit=6.0 2024-09-24 03:57:57,551 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=4.67 vs. limit=10.0 2024-09-24 03:58:12,206 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=415300.6666666667, ans=0.0 2024-09-24 03:58:14,215 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=9.03 vs. limit=15.0 2024-09-24 03:58:34,700 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=415347.3333333333, ans=0.0 2024-09-24 03:58:36,029 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.138e+02 1.250e+02 1.334e+02 1.506e+02 2.556e+02, threshold=2.668e+02, percent-clipped=0.0 2024-09-24 03:58:37,818 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=415394.0, ans=0.125 2024-09-24 03:58:39,221 INFO [train.py:1198] (3/4) Epoch 23, batch 3300, loss[loss=0.1832, ctc_loss=0.1179, cr_loss=0.3261, over 16331.00 frames. ], tot_loss[loss=0.2072, ctc_loss=0.1368, cr_loss=0.3516, over 3376891.36 frames. ], batch size: 36, lr: 5.17e-03, grad_scale: 32.0 2024-09-24 03:58:39,891 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.94 vs. limit=15.0 2024-09-24 03:58:44,269 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.const_attention_rate, batch_count=415394.0, ans=0.025 2024-09-24 03:58:47,563 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=415394.0, ans=0.125 2024-09-24 03:58:50,512 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=415394.0, ans=0.0 2024-09-24 03:58:50,528 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=415394.0, ans=0.1 2024-09-24 03:58:53,687 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-24 03:59:01,510 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=415440.6666666667, ans=0.0 2024-09-24 03:59:09,219 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=415487.3333333333, ans=0.125 2024-09-24 03:59:30,147 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=11.42 vs. 
limit=22.5 2024-09-24 03:59:42,142 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=415580.6666666667, ans=0.2 2024-09-24 03:59:54,424 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=415580.6666666667, ans=0.125 2024-09-24 03:59:55,941 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer_ff2.min_abs, batch_count=415627.3333333333, ans=0.1 2024-09-24 03:59:55,995 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=415627.3333333333, ans=0.0 2024-09-24 03:59:57,330 INFO [train.py:1198] (3/4) Epoch 23, batch 3350, loss[loss=0.2275, ctc_loss=0.1505, cr_loss=0.3853, over 16896.00 frames. ], tot_loss[loss=0.2089, ctc_loss=0.1382, cr_loss=0.3538, over 3369667.65 frames. ], batch size: 58, lr: 5.17e-03, grad_scale: 32.0 2024-09-24 04:00:13,393 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=415674.0, ans=0.125 2024-09-24 04:00:53,064 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=415767.3333333333, ans=0.0 2024-09-24 04:00:54,795 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=415767.3333333333, ans=0.1 2024-09-24 04:01:14,622 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.154e+02 1.296e+02 1.415e+02 1.543e+02 2.020e+02, threshold=2.829e+02, percent-clipped=0.0 2024-09-24 04:01:16,761 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=5.01 vs. limit=15.0 2024-09-24 04:01:17,791 INFO [train.py:1198] (3/4) Epoch 23, batch 3400, loss[loss=0.2044, ctc_loss=0.1346, cr_loss=0.3486, over 17144.00 frames. ], tot_loss[loss=0.2102, ctc_loss=0.1391, cr_loss=0.3557, over 3358947.01 frames. ], batch size: 48, lr: 5.17e-03, grad_scale: 32.0 2024-09-24 04:01:24,232 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=415860.6666666667, ans=0.125 2024-09-24 04:01:31,944 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=415907.3333333333, ans=0.0 2024-09-24 04:01:44,873 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=415907.3333333333, ans=0.0 2024-09-24 04:02:32,636 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.37 vs. limit=6.0 2024-09-24 04:02:35,834 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.88 vs. limit=6.0 2024-09-24 04:02:38,114 INFO [train.py:1198] (3/4) Epoch 23, batch 3450, loss[loss=0.2348, ctc_loss=0.1557, cr_loss=0.3957, over 16991.00 frames. ], tot_loss[loss=0.2096, ctc_loss=0.1387, cr_loss=0.3545, over 3358720.67 frames. ], batch size: 53, lr: 5.16e-03, grad_scale: 32.0 2024-09-24 04:02:42,476 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=5.50 vs. 
limit=15.0 2024-09-24 04:02:43,269 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=416094.0, ans=0.0 2024-09-24 04:02:54,353 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=416140.6666666667, ans=0.125 2024-09-24 04:03:14,801 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=416187.3333333333, ans=0.05 2024-09-24 04:03:39,270 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.87 vs. limit=15.0 2024-09-24 04:03:41,709 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=416280.6666666667, ans=0.125 2024-09-24 04:03:53,801 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.070e+02 1.237e+02 1.333e+02 1.427e+02 1.642e+02, threshold=2.667e+02, percent-clipped=0.0 2024-09-24 04:03:57,049 INFO [train.py:1198] (3/4) Epoch 23, batch 3500, loss[loss=0.2381, ctc_loss=0.1611, cr_loss=0.3849, over 16758.00 frames. ], tot_loss[loss=0.2095, ctc_loss=0.1386, cr_loss=0.3543, over 3363846.54 frames. ], batch size: 61, lr: 5.16e-03, grad_scale: 32.0 2024-09-24 04:04:55,883 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=416467.3333333333, ans=0.125 2024-09-24 04:04:58,868 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=416467.3333333333, ans=0.125 2024-09-24 04:05:05,276 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=416514.0, ans=0.1 2024-09-24 04:05:13,201 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=10.72 vs. limit=22.5 2024-09-24 04:05:16,088 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=416560.6666666667, ans=0.125 2024-09-24 04:05:17,347 INFO [train.py:1198] (3/4) Epoch 23, batch 3550, loss[loss=0.1972, ctc_loss=0.1284, cr_loss=0.3439, over 16904.00 frames. ], tot_loss[loss=0.2101, ctc_loss=0.139, cr_loss=0.3555, over 3357245.69 frames. ], batch size: 58, lr: 5.16e-03, grad_scale: 32.0 2024-09-24 04:05:29,774 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=17.69 vs. 
limit=22.5 2024-09-24 04:05:38,099 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=416607.3333333333, ans=0.125 2024-09-24 04:05:41,449 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=416607.3333333333, ans=0.125 2024-09-24 04:05:47,687 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=416607.3333333333, ans=0.07 2024-09-24 04:05:50,718 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=416654.0, ans=0.125 2024-09-24 04:06:08,098 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=416700.6666666667, ans=0.125 2024-09-24 04:06:08,729 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=4.21 vs. limit=12.0 2024-09-24 04:06:26,822 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=416747.3333333333, ans=0.0 2024-09-24 04:06:31,467 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=416747.3333333333, ans=0.2 2024-09-24 04:06:34,283 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.083e+02 1.260e+02 1.366e+02 1.469e+02 1.975e+02, threshold=2.732e+02, percent-clipped=0.0 2024-09-24 04:06:37,349 INFO [train.py:1198] (3/4) Epoch 23, batch 3600, loss[loss=0.2048, ctc_loss=0.138, cr_loss=0.334, over 17031.00 frames. ], tot_loss[loss=0.2095, ctc_loss=0.1386, cr_loss=0.3548, over 3359673.40 frames. ], batch size: 56, lr: 5.16e-03, grad_scale: 32.0 2024-09-24 04:06:37,698 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=416794.0, ans=0.0 2024-09-24 04:06:40,671 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=416794.0, ans=0.125 2024-09-24 04:07:00,851 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=416840.6666666667, ans=0.07 2024-09-24 04:07:04,550 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=6.39 vs. limit=15.0 2024-09-24 04:07:16,441 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=416887.3333333333, ans=0.1 2024-09-24 04:07:21,471 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.70 vs. limit=15.0 2024-09-24 04:07:24,305 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-24 04:07:27,573 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=416934.0, ans=0.1 2024-09-24 04:07:32,455 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=11.67 vs. 
limit=22.5 2024-09-24 04:07:55,451 INFO [train.py:1198] (3/4) Epoch 23, batch 3650, loss[loss=0.1724, ctc_loss=0.1096, cr_loss=0.3137, over 17083.00 frames. ], tot_loss[loss=0.2103, ctc_loss=0.139, cr_loss=0.3561, over 3364081.39 frames. ], batch size: 43, lr: 5.16e-03, grad_scale: 32.0 2024-09-24 04:07:56,020 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=6.26 vs. limit=15.0 2024-09-24 04:08:01,970 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=417027.3333333333, ans=0.125 2024-09-24 04:08:56,984 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.83 vs. limit=15.0 2024-09-24 04:09:11,667 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.081e+02 1.268e+02 1.365e+02 1.526e+02 2.043e+02, threshold=2.731e+02, percent-clipped=0.0 2024-09-24 04:09:12,339 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=7.67 vs. limit=12.0 2024-09-24 04:09:14,880 INFO [train.py:1198] (3/4) Epoch 23, batch 3700, loss[loss=0.1921, ctc_loss=0.1251, cr_loss=0.335, over 17001.00 frames. ], tot_loss[loss=0.2094, ctc_loss=0.1384, cr_loss=0.3548, over 3357876.33 frames. ], batch size: 51, lr: 5.16e-03, grad_scale: 32.0 2024-09-24 04:09:29,368 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.07 vs. limit=6.0 2024-09-24 04:09:33,658 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=417307.3333333333, ans=0.2 2024-09-24 04:09:34,186 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=2.91 vs. limit=6.0 2024-09-24 04:09:35,011 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=417307.3333333333, ans=0.125 2024-09-24 04:09:55,847 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=15.78 vs. limit=15.0 2024-09-24 04:10:33,641 INFO [train.py:1198] (3/4) Epoch 23, batch 3750, loss[loss=0.1925, ctc_loss=0.1295, cr_loss=0.3149, over 15841.00 frames. ], tot_loss[loss=0.21, ctc_loss=0.1389, cr_loss=0.3558, over 3341922.04 frames. 
], batch size: 74, lr: 5.16e-03, grad_scale: 32.0 2024-09-24 04:10:34,025 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=417494.0, ans=0.125 2024-09-24 04:10:41,718 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.const_attention_rate, batch_count=417494.0, ans=0.025 2024-09-24 04:11:11,202 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.min_positive, batch_count=417587.3333333333, ans=0.05 2024-09-24 04:11:12,871 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=417587.3333333333, ans=0.2 2024-09-24 04:11:14,481 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.min_positive, batch_count=417587.3333333333, ans=0.025 2024-09-24 04:11:18,517 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=13.27 vs. limit=22.5 2024-09-24 04:11:22,372 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=417634.0, ans=0.125 2024-09-24 04:11:28,779 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-24 04:11:30,409 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=417634.0, ans=0.2 2024-09-24 04:11:35,025 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=417634.0, ans=0.125 2024-09-24 04:11:50,432 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.050e+02 1.304e+02 1.374e+02 1.461e+02 1.993e+02, threshold=2.748e+02, percent-clipped=0.0 2024-09-24 04:11:53,569 INFO [train.py:1198] (3/4) Epoch 23, batch 3800, loss[loss=0.2042, ctc_loss=0.1351, cr_loss=0.3452, over 17029.00 frames. ], tot_loss[loss=0.2113, ctc_loss=0.14, cr_loss=0.3568, over 3316068.10 frames. ], batch size: 51, lr: 5.15e-03, grad_scale: 32.0 2024-09-24 04:11:53,777 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=417727.3333333333, ans=0.2 2024-09-24 04:12:03,237 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=417727.3333333333, ans=0.125 2024-09-24 04:12:07,903 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=417774.0, ans=0.035 2024-09-24 04:12:11,819 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=8.48 vs. limit=12.0 2024-09-24 04:12:13,495 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=13.97 vs. limit=22.5 2024-09-24 04:12:54,328 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=5.18 vs. limit=15.0 2024-09-24 04:13:12,147 INFO [train.py:1198] (3/4) Epoch 23, batch 3850, loss[loss=0.2389, ctc_loss=0.166, cr_loss=0.3647, over 11714.00 frames. ], tot_loss[loss=0.2125, ctc_loss=0.141, cr_loss=0.3573, over 3277700.82 frames. 
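
The Whitening records from scaling.py:1024 compare a whiteness statistic of a module's output covariance (metric) against a scheduled bound (limit); when the metric exceeds the limit, training nudges the activations back toward an isotropic covariance. One plausible way to compute such a statistic is sketched below; the exact metric icefall uses may differ, so treat this as an illustration under that assumption.

import torch

def whiteness_metric(x: torch.Tensor) -> float:
    # x: (num_frames, num_channels) activations.
    # One possible whiteness statistic: the ratio of the mean squared
    # eigenvalue of the covariance to the squared mean eigenvalue. It equals
    # 1.0 for a perfectly isotropic covariance and grows as variance
    # concentrates in a few directions.
    x = x - x.mean(dim=0, keepdim=True)
    cov = (x.T @ x) / x.shape[0]
    eigs = torch.linalg.eigvalsh(cov)
    return float((eigs ** 2).mean() / eigs.mean() ** 2)

x = torch.randn(1000, 384)  # near-white activations: metric modestly above 1
print(whiteness_metric(x))
x[:, 0] *= 20.0             # concentrate variance in one channel:
print(whiteness_metric(x))  # metric jumps far above a limit like 15.0
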
], batch size: 123, lr: 5.15e-03, grad_scale: 32.0 2024-09-24 04:13:16,913 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=417960.6666666667, ans=0.125 2024-09-24 04:13:22,930 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=417960.6666666667, ans=0.0 2024-09-24 04:13:24,755 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=4.57 vs. limit=15.0 2024-09-24 04:13:34,314 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten.whitening_limit, batch_count=418007.3333333333, ans=15.0 2024-09-24 04:13:53,471 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=418054.0, ans=0.125 2024-09-24 04:14:04,444 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-09-24 04:14:05,757 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=418100.6666666667, ans=0.0 2024-09-24 04:14:15,442 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=418147.3333333333, ans=0.1 2024-09-24 04:15:15,651 INFO [train.py:1198] (3/4) Epoch 24, batch 0, loss[loss=0.1692, ctc_loss=0.1102, cr_loss=0.2946, over 17039.00 frames. ], tot_loss[loss=0.1692, ctc_loss=0.1102, cr_loss=0.2946, over 17039.00 frames. ], batch size: 39, lr: 5.04e-03, grad_scale: 32.0 2024-09-24 04:15:15,651 INFO [train.py:1221] (3/4) Computing validation loss 2024-09-24 04:15:33,340 INFO [train.py:1230] (3/4) Epoch 24, validation: loss=0.03789, ctc_loss=0.03789, cr_loss=8.011e-15, over 944034.00 frames. 2024-09-24 04:15:33,341 INFO [train.py:1231] (3/4) Maximum memory allocated so far is 21238MB 2024-09-24 04:15:36,594 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.212e+02 1.389e+02 1.528e+02 1.642e+02 3.495e+02, threshold=3.056e+02, percent-clipped=0.0 2024-09-24 04:15:40,275 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=418175.3333333333, ans=0.0 2024-09-24 04:15:52,884 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=418222.0, ans=0.09899494936611666 2024-09-24 04:16:24,789 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=418315.3333333333, ans=0.125 2024-09-24 04:16:44,046 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=418362.0, ans=0.0 2024-09-24 04:16:49,043 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=418362.0, ans=0.125 2024-09-24 04:16:53,367 INFO [train.py:1198] (3/4) Epoch 24, batch 50, loss[loss=0.2109, ctc_loss=0.1383, cr_loss=0.3628, over 17232.00 frames. ], tot_loss[loss=0.2106, ctc_loss=0.1394, cr_loss=0.3561, over 748177.29 frames. 
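
Two things are worth noting in the records above. First, the per-batch losses are consistent with loss = ctc_loss + 0.2 * cr_loss; for the epoch 24, batch 0 record, 0.1102 + 0.2 * 0.2946 = 0.1691, matching the logged 0.1692 to rounding. Second, the epoch 24 validation pass reports cr_loss on the order of 1e-15: the consistency term only produces a signal between two differently-augmented forward passes of the same batch, which the validation loop does not perform. The quick check below uses the 0.2 weight as an inference from the logged numbers, not a quoted configuration value.

# Sanity-check the loss composition implied by the records:
# loss ~= ctc_loss + cr_loss_scale * cr_loss, with cr_loss_scale inferred
# to be 0.2 from the logged values.
records = [
    # (loss, ctc_loss, cr_loss)
    (0.1692, 0.1102, 0.2946),  # epoch 24, batch 0
    (0.2095, 0.1386, 0.3548),  # epoch 23, batch 3600 (tot_loss)
    (0.2106, 0.1394, 0.3561),  # epoch 24, batch 50 (tot_loss)
]
for loss, ctc, cr in records:
    recomputed = ctc + 0.2 * cr
    assert abs(recomputed - loss) < 5e-4, (loss, recomputed)
    print(f"{loss:.4f} ~= {ctc:.4f} + 0.2 * {cr:.4f} = {recomputed:.4f}")
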
], batch size: 47, lr: 5.04e-03, grad_scale: 32.0 2024-09-24 04:17:00,304 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=418408.6666666667, ans=0.1 2024-09-24 04:17:00,307 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=418408.6666666667, ans=0.0 2024-09-24 04:17:07,397 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=8.47 vs. limit=12.0 2024-09-24 04:17:26,113 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=418502.0, ans=0.0 2024-09-24 04:17:30,852 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.min_positive, batch_count=418502.0, ans=0.025 2024-09-24 04:18:09,669 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=418595.3333333333, ans=0.125 2024-09-24 04:18:15,657 INFO [train.py:1198] (3/4) Epoch 24, batch 100, loss[loss=0.212, ctc_loss=0.1414, cr_loss=0.3532, over 16964.00 frames. ], tot_loss[loss=0.2074, ctc_loss=0.1369, cr_loss=0.3523, over 1325158.71 frames. ], batch size: 42, lr: 5.04e-03, grad_scale: 32.0 2024-09-24 04:18:18,829 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.073e+02 1.264e+02 1.318e+02 1.445e+02 2.140e+02, threshold=2.636e+02, percent-clipped=1.0 2024-09-24 04:18:33,605 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=418688.6666666667, ans=0.0 2024-09-24 04:18:36,842 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=418688.6666666667, ans=0.1 2024-09-24 04:19:10,308 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=418782.0, ans=0.1 2024-09-24 04:19:29,398 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=418828.6666666667, ans=0.1 2024-09-24 04:19:37,601 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=13.04 vs. limit=22.5 2024-09-24 04:19:38,293 INFO [train.py:1198] (3/4) Epoch 24, batch 150, loss[loss=0.2222, ctc_loss=0.1505, cr_loss=0.3585, over 16114.00 frames. ], tot_loss[loss=0.2061, ctc_loss=0.136, cr_loss=0.3508, over 1780232.93 frames. 
], batch size: 74, lr: 5.04e-03, grad_scale: 32.0 2024-09-24 04:19:41,723 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=418875.3333333333, ans=0.1 2024-09-24 04:19:57,739 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=418922.0, ans=0.1 2024-09-24 04:20:02,558 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.min_positive, batch_count=418922.0, ans=0.05 2024-09-24 04:20:05,627 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=418922.0, ans=0.0 2024-09-24 04:20:11,304 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=13.04 vs. limit=22.5 2024-09-24 04:20:12,065 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=418968.6666666667, ans=0.04949747468305833 2024-09-24 04:20:54,663 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=419062.0, ans=0.125 2024-09-24 04:20:56,670 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.17 vs. limit=15.0 2024-09-24 04:21:03,917 INFO [train.py:1198] (3/4) Epoch 24, batch 200, loss[loss=0.2017, ctc_loss=0.1333, cr_loss=0.3419, over 16720.00 frames. ], tot_loss[loss=0.2069, ctc_loss=0.1365, cr_loss=0.3521, over 2123188.38 frames. ], batch size: 61, lr: 5.04e-03, grad_scale: 32.0 2024-09-24 04:21:07,018 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.102e+02 1.247e+02 1.339e+02 1.438e+02 1.930e+02, threshold=2.677e+02, percent-clipped=0.0 2024-09-24 04:21:20,907 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.66 vs. limit=6.0 2024-09-24 04:21:34,985 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=419202.0, ans=0.0 2024-09-24 04:21:36,949 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.63 vs. limit=6.0 2024-09-24 04:21:45,923 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=419202.0, ans=0.125 2024-09-24 04:22:06,433 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=419248.6666666667, ans=0.125 2024-09-24 04:22:26,695 INFO [train.py:1198] (3/4) Epoch 24, batch 250, loss[loss=0.2343, ctc_loss=0.1578, cr_loss=0.3825, over 16627.00 frames. ], tot_loss[loss=0.2075, ctc_loss=0.1368, cr_loss=0.3532, over 2401524.42 frames. 
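
The optim.py:487 WARNING lines summarize the recent distribution of gradient norms as five quantiles (min, Q1, median, Q3, max) together with the active clipping threshold and the fraction of batches clipped. Across the records above the threshold tracks twice the logged median (for example 2 * 1.339e+02 = 2.678e+02 against the reported 2.677e+02), consistent with Clipping_scale=2.0. A rough sketch of that bookkeeping follows; the median-based rule is read off the log, not taken from icefall's optimizer code.

import torch

def clipping_report(grad_norms: torch.Tensor, clipping_scale: float = 2.0):
    # Summarize recent gradient norms and derive a clipping threshold as
    # clipping_scale times the median, mirroring the relationship between
    # the quartiles and thresholds in the WARNING lines above.
    q = torch.quantile(grad_norms, torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0]))
    threshold = clipping_scale * q[2]
    # fraction of recent norms above the threshold, as a percentage
    clipped = (grad_norms > threshold).float().mean() * 100.0
    print("grad-norm quartiles "
          + " ".join(f"{v:.3e}" for v in q.tolist())
          + f", threshold={threshold:.3e}, percent-clipped={clipped:.1f}")
    return threshold

norms = 130.0 + 15.0 * torch.randn(1000).abs()  # synthetic gradient norms
clipping_report(norms)
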
], batch size: 66, lr: 5.03e-03, grad_scale: 32.0 2024-09-24 04:22:41,389 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=419388.6666666667, ans=0.1 2024-09-24 04:23:05,182 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=419435.3333333333, ans=0.125 2024-09-24 04:23:16,301 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=419482.0, ans=0.125 2024-09-24 04:23:17,815 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=419482.0, ans=0.125 2024-09-24 04:23:40,091 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=419528.6666666667, ans=0.09899494936611666 2024-09-24 04:23:46,171 INFO [train.py:1198] (3/4) Epoch 24, batch 300, loss[loss=0.2325, ctc_loss=0.155, cr_loss=0.3877, over 16989.00 frames. ], tot_loss[loss=0.2096, ctc_loss=0.1385, cr_loss=0.3557, over 2599902.69 frames. ], batch size: 53, lr: 5.03e-03, grad_scale: 32.0 2024-09-24 04:23:49,253 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.095e+02 1.252e+02 1.369e+02 1.490e+02 2.368e+02, threshold=2.737e+02, percent-clipped=0.0 2024-09-24 04:24:00,493 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=419622.0, ans=0.125 2024-09-24 04:24:11,914 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=419622.0, ans=0.04949747468305833 2024-09-24 04:24:21,405 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=419668.6666666667, ans=0.0 2024-09-24 04:24:43,322 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=419715.3333333333, ans=0.1 2024-09-24 04:24:57,802 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.min_positive, batch_count=419762.0, ans=0.05 2024-09-24 04:25:08,437 INFO [train.py:1198] (3/4) Epoch 24, batch 350, loss[loss=0.2088, ctc_loss=0.1407, cr_loss=0.3406, over 17010.00 frames. ], tot_loss[loss=0.2099, ctc_loss=0.1388, cr_loss=0.355, over 2761144.06 frames. ], batch size: 51, lr: 5.03e-03, grad_scale: 32.0 2024-09-24 04:25:32,107 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.54 vs. limit=15.0 2024-09-24 04:25:39,349 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=419855.3333333333, ans=0.05 2024-09-24 04:25:51,079 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=10.13 vs. 
limit=15.0 2024-09-24 04:25:57,154 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=419902.0, ans=0.125 2024-09-24 04:26:00,328 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=419948.6666666667, ans=0.125 2024-09-24 04:26:34,309 INFO [train.py:1198] (3/4) Epoch 24, batch 400, loss[loss=0.2062, ctc_loss=0.1325, cr_loss=0.3685, over 17069.00 frames. ], tot_loss[loss=0.2101, ctc_loss=0.1389, cr_loss=0.3558, over 2890658.08 frames. ], batch size: 39, lr: 5.03e-03, grad_scale: 32.0 2024-09-24 04:26:37,590 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.111e+02 1.274e+02 1.350e+02 1.482e+02 1.874e+02, threshold=2.700e+02, percent-clipped=0.0 2024-09-24 04:27:09,821 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=420135.3333333333, ans=0.125 2024-09-24 04:27:13,044 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=420135.3333333333, ans=0.0 2024-09-24 04:27:27,389 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=420182.0, ans=0.1 2024-09-24 04:27:41,939 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=420228.6666666667, ans=0.125 2024-09-24 04:27:53,531 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=15.11 vs. limit=22.5 2024-09-24 04:27:57,668 INFO [train.py:1198] (3/4) Epoch 24, batch 450, loss[loss=0.2166, ctc_loss=0.1451, cr_loss=0.3574, over 17023.00 frames. ], tot_loss[loss=0.2102, ctc_loss=0.1391, cr_loss=0.3556, over 2984701.88 frames. ], batch size: 52, lr: 5.03e-03, grad_scale: 32.0 2024-09-24 04:28:09,195 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=420275.3333333333, ans=0.125 2024-09-24 04:28:26,997 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=420322.0, ans=0.0 2024-09-24 04:28:46,507 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-09-24 04:28:48,779 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=13.42 vs. limit=22.5 2024-09-24 04:28:50,061 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=12.78 vs. limit=22.5 2024-09-24 04:28:59,182 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=420415.3333333333, ans=0.125 2024-09-24 04:29:10,082 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=420462.0, ans=0.07 2024-09-24 04:29:17,791 INFO [train.py:1198] (3/4) Epoch 24, batch 500, loss[loss=0.209, ctc_loss=0.1385, cr_loss=0.3527, over 16755.00 frames. ], tot_loss[loss=0.2109, ctc_loss=0.1396, cr_loss=0.3569, over 3059470.63 frames. 
], batch size: 61, lr: 5.03e-03, grad_scale: 32.0 2024-09-24 04:29:21,053 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.026e+02 1.236e+02 1.308e+02 1.375e+02 2.594e+02, threshold=2.616e+02, percent-clipped=0.0 2024-09-24 04:29:40,354 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=420555.3333333333, ans=0.5 2024-09-24 04:29:51,509 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=420602.0, ans=0.09899494936611666 2024-09-24 04:30:33,051 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=420695.3333333333, ans=0.125 2024-09-24 04:30:34,625 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=420695.3333333333, ans=0.125 2024-09-24 04:30:45,506 INFO [train.py:1198] (3/4) Epoch 24, batch 550, loss[loss=0.2601, ctc_loss=0.1768, cr_loss=0.4167, over 15184.00 frames. ], tot_loss[loss=0.2101, ctc_loss=0.1389, cr_loss=0.3564, over 3128678.28 frames. ], batch size: 89, lr: 5.03e-03, grad_scale: 16.0 2024-09-24 04:30:46,549 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=5.69 vs. limit=15.0 2024-09-24 04:31:40,022 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=420882.0, ans=0.125 2024-09-24 04:31:57,836 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=420928.6666666667, ans=0.125 2024-09-24 04:32:08,522 INFO [train.py:1198] (3/4) Epoch 24, batch 600, loss[loss=0.1686, ctc_loss=0.109, cr_loss=0.298, over 16753.00 frames. ], tot_loss[loss=0.2092, ctc_loss=0.1382, cr_loss=0.355, over 3179035.83 frames. ], batch size: 37, lr: 5.02e-03, grad_scale: 16.0 2024-09-24 04:32:13,194 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.029e+02 1.249e+02 1.314e+02 1.426e+02 3.030e+02, threshold=2.628e+02, percent-clipped=1.0 2024-09-24 04:32:13,578 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=420975.3333333333, ans=0.125 2024-09-24 04:32:24,749 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=421022.0, ans=0.125 2024-09-24 04:32:26,671 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=3.91 vs. limit=10.0 2024-09-24 04:32:27,741 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=421022.0, ans=0.125 2024-09-24 04:32:31,861 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=9.42 vs. 
limit=15.0 2024-09-24 04:32:43,739 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=421068.6666666667, ans=0.125 2024-09-24 04:32:59,742 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=421115.3333333333, ans=0.0 2024-09-24 04:33:04,620 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=421115.3333333333, ans=0.0 2024-09-24 04:33:14,084 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=421162.0, ans=0.125 2024-09-24 04:33:23,870 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=421162.0, ans=0.015 2024-09-24 04:33:28,575 INFO [train.py:1198] (3/4) Epoch 24, batch 650, loss[loss=0.1753, ctc_loss=0.1133, cr_loss=0.3097, over 16773.00 frames. ], tot_loss[loss=0.2081, ctc_loss=0.1374, cr_loss=0.3532, over 3216909.91 frames. ], batch size: 37, lr: 5.02e-03, grad_scale: 16.0 2024-09-24 04:33:31,998 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=421208.6666666667, ans=0.125 2024-09-24 04:33:34,104 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=6.66 vs. limit=15.0 2024-09-24 04:33:43,938 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=5.89 vs. limit=15.0 2024-09-24 04:33:48,181 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=421255.3333333333, ans=0.125 2024-09-24 04:33:54,415 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=421255.3333333333, ans=0.125 2024-09-24 04:34:10,460 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=421302.0, ans=0.0 2024-09-24 04:34:20,182 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=421348.6666666667, ans=0.125 2024-09-24 04:34:48,966 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten.whitening_limit, batch_count=421395.3333333333, ans=15.0 2024-09-24 04:34:51,515 INFO [train.py:1198] (3/4) Epoch 24, batch 700, loss[loss=0.2255, ctc_loss=0.1508, cr_loss=0.3736, over 17235.00 frames. ], tot_loss[loss=0.2092, ctc_loss=0.1383, cr_loss=0.3547, over 3245244.27 frames. ], batch size: 47, lr: 5.02e-03, grad_scale: 16.0 2024-09-24 04:34:56,334 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.123e+02 1.315e+02 1.415e+02 1.543e+02 2.275e+02, threshold=2.830e+02, percent-clipped=0.0 2024-09-24 04:35:03,000 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=421442.0, ans=0.2 2024-09-24 04:35:27,205 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=3.79 vs. limit=12.0 2024-09-24 04:35:35,259 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.13 vs. 
limit=15.0 2024-09-24 04:35:39,311 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=421535.3333333333, ans=0.125 2024-09-24 04:35:49,383 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=10.48 vs. limit=15.0 2024-09-24 04:35:57,446 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.36 vs. limit=15.0 2024-09-24 04:36:03,391 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.const_attention_rate, batch_count=421628.6666666667, ans=0.025 2024-09-24 04:36:14,460 INFO [train.py:1198] (3/4) Epoch 24, batch 750, loss[loss=0.1962, ctc_loss=0.1289, cr_loss=0.3362, over 17077.00 frames. ], tot_loss[loss=0.2106, ctc_loss=0.1393, cr_loss=0.3566, over 3265999.85 frames. ], batch size: 49, lr: 5.02e-03, grad_scale: 16.0 2024-09-24 04:36:27,446 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=421675.3333333333, ans=0.0 2024-09-24 04:37:04,367 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=11.12 vs. limit=15.0 2024-09-24 04:37:34,411 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=421862.0, ans=0.0 2024-09-24 04:37:35,940 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=421908.6666666667, ans=0.0 2024-09-24 04:37:37,164 INFO [train.py:1198] (3/4) Epoch 24, batch 800, loss[loss=0.2118, ctc_loss=0.141, cr_loss=0.354, over 17032.00 frames. ], tot_loss[loss=0.2105, ctc_loss=0.1392, cr_loss=0.3563, over 3288633.49 frames. ], batch size: 51, lr: 5.02e-03, grad_scale: 32.0 2024-09-24 04:37:41,990 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.122e+02 1.296e+02 1.355e+02 1.500e+02 2.153e+02, threshold=2.711e+02, percent-clipped=0.0 2024-09-24 04:37:47,069 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.const_attention_rate, batch_count=421908.6666666667, ans=0.025 2024-09-24 04:38:24,140 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=422048.6666666667, ans=0.125 2024-09-24 04:38:55,909 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=422142.0, ans=0.125 2024-09-24 04:38:57,343 INFO [train.py:1198] (3/4) Epoch 24, batch 850, loss[loss=0.2058, ctc_loss=0.1376, cr_loss=0.3409, over 17291.00 frames. ], tot_loss[loss=0.2095, ctc_loss=0.1384, cr_loss=0.3554, over 3309325.66 frames. 
], batch size: 46, lr: 5.02e-03, grad_scale: 32.0 2024-09-24 04:39:30,338 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=422235.3333333333, ans=0.1 2024-09-24 04:39:51,065 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=422282.0, ans=0.07 2024-09-24 04:40:00,736 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=422282.0, ans=0.125 2024-09-24 04:40:16,427 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=3.64 vs. limit=15.0 2024-09-24 04:40:25,046 INFO [train.py:1198] (3/4) Epoch 24, batch 900, loss[loss=0.2225, ctc_loss=0.1439, cr_loss=0.393, over 17316.00 frames. ], tot_loss[loss=0.2088, ctc_loss=0.138, cr_loss=0.3541, over 3316887.69 frames. ], batch size: 49, lr: 5.02e-03, grad_scale: 32.0 2024-09-24 04:40:29,775 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.142e+02 1.271e+02 1.382e+02 1.510e+02 2.333e+02, threshold=2.763e+02, percent-clipped=0.0 2024-09-24 04:40:42,786 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=422422.0, ans=0.125 2024-09-24 04:40:59,370 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=4.76 vs. limit=15.0 2024-09-24 04:41:17,932 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=422515.3333333333, ans=0.0 2024-09-24 04:41:20,041 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=4.31 vs. limit=10.0 2024-09-24 04:41:41,070 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=11.99 vs. limit=15.0 2024-09-24 04:41:42,258 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=422562.0, ans=0.05 2024-09-24 04:41:45,284 INFO [train.py:1198] (3/4) Epoch 24, batch 950, loss[loss=0.1809, ctc_loss=0.1166, cr_loss=0.3213, over 17296.00 frames. ], tot_loss[loss=0.2078, ctc_loss=0.1372, cr_loss=0.353, over 3323635.37 frames. ], batch size: 46, lr: 5.01e-03, grad_scale: 32.0 2024-09-24 04:41:48,819 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=422608.6666666667, ans=0.0 2024-09-24 04:43:07,333 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.const_attention_rate, batch_count=422842.0, ans=0.025 2024-09-24 04:43:07,341 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=422842.0, ans=0.0 2024-09-24 04:43:08,571 INFO [train.py:1198] (3/4) Epoch 24, batch 1000, loss[loss=0.1659, ctc_loss=0.1054, cr_loss=0.3022, over 17010.00 frames. ], tot_loss[loss=0.2083, ctc_loss=0.1376, cr_loss=0.3535, over 3322060.03 frames. 
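
The cr_loss term present in every training record is CR-CTC's consistency regularization: each batch is forwarded twice under different time-masking, and the two branches' CTC posteriors are pulled toward one another. A common way to implement this is a symmetric KL divergence with stop-gradient targets, sketched below; the exact form and the tensor shapes are assumptions about the recipe, not a copy of it.

import torch
import torch.nn.functional as F

def cr_loss(log_probs_a: torch.Tensor, log_probs_b: torch.Tensor) -> torch.Tensor:
    # log_probs_*: (T, N, V) log-posteriors over tokens from two
    # differently-masked forward passes of the same batch. Each branch is
    # regularized toward the other's detached distribution; the result is
    # averaged over the leading (time) dimension by "batchmean".
    kl_ab = F.kl_div(log_probs_a, log_probs_b.detach(),
                     log_target=True, reduction="batchmean")
    kl_ba = F.kl_div(log_probs_b, log_probs_a.detach(),
                     log_target=True, reduction="batchmean")
    return 0.5 * (kl_ab + kl_ba)

a = F.log_softmax(torch.randn(100, 4, 500), dim=-1)
b = F.log_softmax(torch.randn(100, 4, 500), dim=-1)
print(cr_loss(a, b))  # scalar consistency penalty
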
], batch size: 39, lr: 5.01e-03, grad_scale: 32.0 2024-09-24 04:43:13,213 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.084e+02 1.311e+02 1.412e+02 1.541e+02 1.926e+02, threshold=2.824e+02, percent-clipped=0.0 2024-09-24 04:43:19,721 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=422842.0, ans=0.125 2024-09-24 04:44:30,828 INFO [train.py:1198] (3/4) Epoch 24, batch 1050, loss[loss=0.2157, ctc_loss=0.141, cr_loss=0.3733, over 17245.00 frames. ], tot_loss[loss=0.2085, ctc_loss=0.1377, cr_loss=0.3539, over 3334447.15 frames. ], batch size: 55, lr: 5.01e-03, grad_scale: 32.0 2024-09-24 04:44:53,911 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=423122.0, ans=0.0 2024-09-24 04:45:34,988 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=5.84 vs. limit=15.0 2024-09-24 04:45:56,523 INFO [train.py:1198] (3/4) Epoch 24, batch 1100, loss[loss=0.2336, ctc_loss=0.1586, cr_loss=0.3751, over 16768.00 frames. ], tot_loss[loss=0.2091, ctc_loss=0.1382, cr_loss=0.3549, over 3339951.54 frames. ], batch size: 61, lr: 5.01e-03, grad_scale: 32.0 2024-09-24 04:46:01,273 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 9.919e+01 1.239e+02 1.343e+02 1.468e+02 1.769e+02, threshold=2.686e+02, percent-clipped=0.0 2024-09-24 04:46:06,427 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=423308.6666666667, ans=0.125 2024-09-24 04:46:13,249 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=4.18 vs. limit=10.0 2024-09-24 04:46:14,391 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=423355.3333333333, ans=0.125 2024-09-24 04:46:25,546 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.const_attention_rate, batch_count=423355.3333333333, ans=0.025 2024-09-24 04:46:30,293 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=423402.0, ans=0.125 2024-09-24 04:46:39,868 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=423402.0, ans=0.1 2024-09-24 04:46:41,267 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=423402.0, ans=0.1 2024-09-24 04:47:05,370 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=7.49 vs. limit=15.0 2024-09-24 04:47:06,336 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=423495.3333333333, ans=0.025 2024-09-24 04:47:18,700 INFO [train.py:1198] (3/4) Epoch 24, batch 1150, loss[loss=0.2371, ctc_loss=0.1574, cr_loss=0.3985, over 16923.00 frames. ], tot_loss[loss=0.2096, ctc_loss=0.1385, cr_loss=0.3558, over 3349364.46 frames. 
], batch size: 58, lr: 5.01e-03, grad_scale: 32.0 2024-09-24 04:47:38,300 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=423588.6666666667, ans=0.2 2024-09-24 04:47:38,314 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=423588.6666666667, ans=0.1 2024-09-24 04:47:38,722 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=6.17 vs. limit=15.0 2024-09-24 04:48:13,855 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=423682.0, ans=0.05 2024-09-24 04:48:20,382 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=8.05 vs. limit=15.0 2024-09-24 04:48:35,720 INFO [scaling.py:1024] (3/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=7.23 vs. limit=8.0 2024-09-24 04:48:39,273 INFO [train.py:1198] (3/4) Epoch 24, batch 1200, loss[loss=0.236, ctc_loss=0.1567, cr_loss=0.3967, over 16986.00 frames. ], tot_loss[loss=0.21, ctc_loss=0.1387, cr_loss=0.3563, over 3352179.83 frames. ], batch size: 53, lr: 5.01e-03, grad_scale: 32.0 2024-09-24 04:48:41,069 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=423775.3333333333, ans=0.0 2024-09-24 04:48:44,026 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.099e+02 1.248e+02 1.325e+02 1.416e+02 2.562e+02, threshold=2.650e+02, percent-clipped=0.0 2024-09-24 04:48:55,087 INFO [scaling.py:1024] (3/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.64 vs. limit=5.0 2024-09-24 04:48:55,697 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=423822.0, ans=0.2 2024-09-24 04:49:49,043 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=423962.0, ans=0.2 2024-09-24 04:50:03,933 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=11.51 vs. limit=15.0 2024-09-24 04:50:06,372 INFO [train.py:1198] (3/4) Epoch 24, batch 1250, loss[loss=0.2081, ctc_loss=0.1371, cr_loss=0.3551, over 17302.00 frames. ], tot_loss[loss=0.2104, ctc_loss=0.1391, cr_loss=0.3568, over 3352434.76 frames. ], batch size: 46, lr: 5.01e-03, grad_scale: 32.0 2024-09-24 04:50:13,168 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=424008.6666666667, ans=0.125 2024-09-24 04:50:48,949 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=4.68 vs. limit=10.0 2024-09-24 04:50:58,081 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=424148.6666666667, ans=0.125 2024-09-24 04:51:26,794 INFO [train.py:1198] (3/4) Epoch 24, batch 1300, loss[loss=0.1794, ctc_loss=0.1145, cr_loss=0.3244, over 17043.00 frames. ], tot_loss[loss=0.2094, ctc_loss=0.1383, cr_loss=0.3556, over 3349671.37 frames. 
], batch size: 39, lr: 5.00e-03, grad_scale: 32.0 2024-09-24 04:51:31,563 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.033e+02 1.254e+02 1.331e+02 1.449e+02 1.850e+02, threshold=2.662e+02, percent-clipped=0.0 2024-09-24 04:52:04,790 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=424335.3333333333, ans=0.125 2024-09-24 04:52:32,123 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=424428.6666666667, ans=0.125 2024-09-24 04:52:39,188 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=4.04 vs. limit=15.0 2024-09-24 04:52:49,353 INFO [train.py:1198] (3/4) Epoch 24, batch 1350, loss[loss=0.1908, ctc_loss=0.1249, cr_loss=0.3297, over 17207.00 frames. ], tot_loss[loss=0.2093, ctc_loss=0.1382, cr_loss=0.3556, over 3357634.15 frames. ], batch size: 50, lr: 5.00e-03, grad_scale: 32.0 2024-09-24 04:53:23,030 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=424568.6666666667, ans=0.125 2024-09-24 04:53:55,796 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=6.26 vs. limit=10.0 2024-09-24 04:54:00,108 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=424662.0, ans=0.125 2024-09-24 04:54:10,690 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=424708.6666666667, ans=0.1 2024-09-24 04:54:11,971 INFO [train.py:1198] (3/4) Epoch 24, batch 1400, loss[loss=0.1799, ctc_loss=0.1146, cr_loss=0.3267, over 17290.00 frames. ], tot_loss[loss=0.2091, ctc_loss=0.138, cr_loss=0.3553, over 3358774.73 frames. ], batch size: 42, lr: 5.00e-03, grad_scale: 32.0 2024-09-24 04:54:15,995 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten.whitening_limit, batch_count=424708.6666666667, ans=15.0 2024-09-24 04:54:16,815 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.131e+02 1.257e+02 1.358e+02 1.495e+02 1.831e+02, threshold=2.716e+02, percent-clipped=0.0 2024-09-24 04:54:20,368 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=424708.6666666667, ans=0.125 2024-09-24 04:54:22,424 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=9.79 vs. limit=15.0 2024-09-24 04:54:33,183 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=424755.3333333333, ans=0.0 2024-09-24 04:55:35,182 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=6.79 vs. limit=15.0 2024-09-24 04:55:37,532 INFO [train.py:1198] (3/4) Epoch 24, batch 1450, loss[loss=0.2005, ctc_loss=0.1302, cr_loss=0.3516, over 17250.00 frames. ], tot_loss[loss=0.2097, ctc_loss=0.1385, cr_loss=0.3561, over 3364812.01 frames. 
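
grad_scale in each record is the dynamic loss scale used for float16 training: it is halved when a step produces inf/nan gradients and grown back after a run of clean steps, which is visible above as the drop from 32.0 to 16.0 around epoch 24, batch 550 and the return to 32.0 by batch 800. With stock PyTorch AMP machinery (assumed here; the recipe may wrap it differently) the loop looks roughly like:

import torch

scaler = torch.cuda.amp.GradScaler(init_scale=32.0, growth_interval=2000)

# Inside the training loop (model, optimizer, and compute_loss assumed):
#   with torch.cuda.amp.autocast(dtype=torch.float16):
#       loss = compute_loss(model, batch)
#   optimizer.zero_grad()
#   scaler.scale(loss).backward()
#   scaler.step(optimizer)   # skipped if inf/nan gradients are detected
#   scaler.update()          # halves the scale on overflow; doubles it
#                            # again after growth_interval clean steps
print(scaler.get_scale())    # 32.0 on a CUDA machine
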
], batch size: 44, lr: 5.00e-03, grad_scale: 32.0 2024-09-24 04:55:38,265 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=13.31 vs. limit=22.5 2024-09-24 04:55:42,734 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=424942.0, ans=0.2 2024-09-24 04:55:47,764 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.74 vs. limit=15.0 2024-09-24 04:55:48,010 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=4.66 vs. limit=15.0 2024-09-24 04:55:49,009 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-24 04:55:49,031 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=424942.0, ans=0.125 2024-09-24 04:56:00,236 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=424988.6666666667, ans=0.09899494936611666 2024-09-24 04:56:06,784 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=424988.6666666667, ans=0.0 2024-09-24 04:56:19,510 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=425035.3333333333, ans=0.2 2024-09-24 04:56:20,988 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=425035.3333333333, ans=0.125 2024-09-24 04:56:37,074 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=425082.0, ans=0.0 2024-09-24 04:56:44,781 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=425128.6666666667, ans=0.125 2024-09-24 04:56:59,699 INFO [train.py:1198] (3/4) Epoch 24, batch 1500, loss[loss=0.1849, ctc_loss=0.119, cr_loss=0.3292, over 16688.00 frames. ], tot_loss[loss=0.2095, ctc_loss=0.1383, cr_loss=0.3558, over 3363714.52 frames. ], batch size: 37, lr: 5.00e-03, grad_scale: 32.0 2024-09-24 04:57:04,518 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.065e+02 1.246e+02 1.334e+02 1.449e+02 2.075e+02, threshold=2.667e+02, percent-clipped=0.0 2024-09-24 04:57:11,165 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=425175.3333333333, ans=0.125 2024-09-24 04:57:12,838 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.const_attention_rate, batch_count=425175.3333333333, ans=0.025 2024-09-24 04:57:35,234 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=425268.6666666667, ans=0.05 2024-09-24 04:57:51,995 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=6.64 vs. limit=15.0 2024-09-24 04:58:04,043 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=425362.0, ans=0.125 2024-09-24 04:58:19,810 INFO [train.py:1198] (3/4) Epoch 24, batch 1550, loss[loss=0.2344, ctc_loss=0.1566, cr_loss=0.3891, over 17317.00 frames. 
], tot_loss[loss=0.208, ctc_loss=0.1372, cr_loss=0.3537, over 3366099.11 frames. ], batch size: 49, lr: 5.00e-03, grad_scale: 32.0 2024-09-24 04:58:39,394 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=425455.3333333333, ans=0.05 2024-09-24 04:58:42,659 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-24 04:58:49,154 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=425455.3333333333, ans=0.125 2024-09-24 04:58:50,676 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=425502.0, ans=0.125 2024-09-24 04:58:51,555 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=10.18 vs. limit=22.5 2024-09-24 04:59:10,354 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=425548.6666666667, ans=0.0 2024-09-24 04:59:20,041 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=425548.6666666667, ans=0.0 2024-09-24 04:59:42,278 INFO [train.py:1198] (3/4) Epoch 24, batch 1600, loss[loss=0.2006, ctc_loss=0.1323, cr_loss=0.3417, over 17060.00 frames. ], tot_loss[loss=0.2085, ctc_loss=0.1377, cr_loss=0.3542, over 3363283.99 frames. ], batch size: 46, lr: 5.00e-03, grad_scale: 32.0 2024-09-24 04:59:47,003 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.103e+02 1.230e+02 1.386e+02 1.499e+02 2.034e+02, threshold=2.773e+02, percent-clipped=0.0 2024-09-24 05:00:06,991 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=425688.6666666667, ans=0.05 2024-09-24 05:00:30,691 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=425735.3333333333, ans=0.0 2024-09-24 05:00:43,340 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=425782.0, ans=0.0 2024-09-24 05:00:48,654 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=11.66 vs. limit=15.0 2024-09-24 05:00:52,944 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=425828.6666666667, ans=0.0 2024-09-24 05:00:53,225 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.77 vs. limit=10.0 2024-09-24 05:01:02,546 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=425828.6666666667, ans=0.2 2024-09-24 05:01:07,192 INFO [train.py:1198] (3/4) Epoch 24, batch 1650, loss[loss=0.1692, ctc_loss=0.1113, cr_loss=0.2894, over 17026.00 frames. ], tot_loss[loss=0.2083, ctc_loss=0.1374, cr_loss=0.3544, over 3369604.52 frames. 
], batch size: 39, lr: 5.00e-03, grad_scale: 32.0 2024-09-24 05:01:07,615 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=425875.3333333333, ans=0.125 2024-09-24 05:01:56,205 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=426015.3333333333, ans=0.2 2024-09-24 05:02:04,350 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=426015.3333333333, ans=0.09899494936611666 2024-09-24 05:02:10,779 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=426015.3333333333, ans=0.2 2024-09-24 05:02:15,937 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=10.27 vs. limit=15.0 2024-09-24 05:02:29,581 INFO [train.py:1198] (3/4) Epoch 24, batch 1700, loss[loss=0.1563, ctc_loss=0.1002, cr_loss=0.2803, over 17094.00 frames. ], tot_loss[loss=0.2075, ctc_loss=0.1368, cr_loss=0.3533, over 3370149.88 frames. ], batch size: 43, lr: 4.99e-03, grad_scale: 32.0 2024-09-24 05:02:34,432 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.124e+02 1.248e+02 1.319e+02 1.421e+02 3.276e+02, threshold=2.637e+02, percent-clipped=2.0 2024-09-24 05:02:43,623 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.61 vs. limit=6.0 2024-09-24 05:02:58,818 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=426155.3333333333, ans=0.0 2024-09-24 05:03:45,124 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=5.45 vs. limit=15.0 2024-09-24 05:03:49,379 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=426342.0, ans=0.95 2024-09-24 05:03:49,956 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.33 vs. limit=15.0 2024-09-24 05:03:50,636 INFO [train.py:1198] (3/4) Epoch 24, batch 1750, loss[loss=0.1987, ctc_loss=0.1295, cr_loss=0.346, over 17098.00 frames. ], tot_loss[loss=0.2083, ctc_loss=0.1375, cr_loss=0.3542, over 3363587.33 frames. ], batch size: 43, lr: 4.99e-03, grad_scale: 32.0 2024-09-24 05:03:58,852 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=426342.0, ans=0.125 2024-09-24 05:04:04,126 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=6.90 vs. 
limit=12.0 2024-09-24 05:04:25,604 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=426435.3333333333, ans=0.0 2024-09-24 05:04:31,928 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=426435.3333333333, ans=0.125 2024-09-24 05:05:00,518 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=426528.6666666667, ans=0.0 2024-09-24 05:05:02,011 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=426528.6666666667, ans=0.2 2024-09-24 05:05:10,622 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.29 vs. limit=15.0 2024-09-24 05:05:11,805 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.const_attention_rate, batch_count=426528.6666666667, ans=0.025 2024-09-24 05:05:17,717 INFO [train.py:1198] (3/4) Epoch 24, batch 1800, loss[loss=0.1923, ctc_loss=0.1259, cr_loss=0.3322, over 17243.00 frames. ], tot_loss[loss=0.2071, ctc_loss=0.1366, cr_loss=0.3524, over 3368512.36 frames. ], batch size: 42, lr: 4.99e-03, grad_scale: 32.0 2024-09-24 05:05:22,469 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.066e+02 1.235e+02 1.322e+02 1.422e+02 1.827e+02, threshold=2.643e+02, percent-clipped=0.0 2024-09-24 05:05:29,220 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=426575.3333333333, ans=0.0 2024-09-24 05:05:59,609 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=426668.6666666667, ans=0.1 2024-09-24 05:06:02,670 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=426668.6666666667, ans=0.0 2024-09-24 05:06:35,222 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=3.49 vs. limit=15.0 2024-09-24 05:06:37,703 INFO [train.py:1198] (3/4) Epoch 24, batch 1850, loss[loss=0.243, ctc_loss=0.1644, cr_loss=0.3933, over 15084.00 frames. ], tot_loss[loss=0.2064, ctc_loss=0.1362, cr_loss=0.3512, over 3361752.54 frames. ], batch size: 89, lr: 4.99e-03, grad_scale: 32.0 2024-09-24 05:06:50,682 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=426808.6666666667, ans=0.0 2024-09-24 05:06:55,617 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=426855.3333333333, ans=0.125 2024-09-24 05:07:12,412 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=426902.0, ans=0.125 2024-09-24 05:07:22,904 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.84 vs. limit=6.0 2024-09-24 05:07:49,403 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=426995.3333333333, ans=0.125 2024-09-24 05:08:00,018 INFO [train.py:1198] (3/4) Epoch 24, batch 1900, loss[loss=0.1858, ctc_loss=0.1187, cr_loss=0.3358, over 17101.00 frames. 
], tot_loss[loss=0.2065, ctc_loss=0.1362, cr_loss=0.3517, over 3364992.86 frames. ], batch size: 40, lr: 4.99e-03, grad_scale: 16.0 2024-09-24 05:08:06,244 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.067e+02 1.237e+02 1.306e+02 1.387e+02 1.778e+02, threshold=2.611e+02, percent-clipped=0.0 2024-09-24 05:08:42,356 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.04 vs. limit=15.0 2024-09-24 05:08:45,190 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=427135.3333333333, ans=0.1 2024-09-24 05:08:50,414 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.42 vs. limit=6.0 2024-09-24 05:09:19,869 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=427228.6666666667, ans=0.125 2024-09-24 05:09:22,707 INFO [train.py:1198] (3/4) Epoch 24, batch 1950, loss[loss=0.2594, ctc_loss=0.1809, cr_loss=0.3925, over 11690.00 frames. ], tot_loss[loss=0.2077, ctc_loss=0.1371, cr_loss=0.3528, over 3360375.34 frames. ], batch size: 124, lr: 4.99e-03, grad_scale: 16.0 2024-09-24 05:10:03,811 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=17.91 vs. limit=22.5 2024-09-24 05:10:20,940 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=427415.3333333333, ans=0.125 2024-09-24 05:10:24,307 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=12.11 vs. limit=22.5 2024-09-24 05:10:33,885 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=427462.0, ans=0.0 2024-09-24 05:10:40,284 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=427462.0, ans=0.1 2024-09-24 05:10:46,872 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=12.80 vs. limit=22.5 2024-09-24 05:10:47,725 INFO [train.py:1198] (3/4) Epoch 24, batch 2000, loss[loss=0.2157, ctc_loss=0.1436, cr_loss=0.3602, over 17102.00 frames. ], tot_loss[loss=0.2098, ctc_loss=0.1388, cr_loss=0.3554, over 3352940.10 frames. ], batch size: 49, lr: 4.99e-03, grad_scale: 16.0 2024-09-24 05:10:55,710 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.029e+02 1.252e+02 1.338e+02 1.434e+02 1.849e+02, threshold=2.676e+02, percent-clipped=0.0 2024-09-24 05:10:57,749 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=427508.6666666667, ans=0.0 2024-09-24 05:11:50,066 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=427695.3333333333, ans=0.1 2024-09-24 05:12:09,511 INFO [train.py:1198] (3/4) Epoch 24, batch 2050, loss[loss=0.2334, ctc_loss=0.1569, cr_loss=0.3823, over 15080.00 frames. ], tot_loss[loss=0.2106, ctc_loss=0.1394, cr_loss=0.3562, over 3336829.42 frames. 
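
tot_loss in these records is not a plain epoch average: the "over N frames" totals climb from about 7.5e5 at batch 50 to roughly 3.3e6 and then hold there, the signature of an exponentially-decayed, frame-weighted accumulator with a decay of about 1 - 1/200. The sketch below reproduces those frame totals under that inferred decay; the constant is read off the log, not quoted from train.py.

class DecayedLoss:
    # Exponentially-decayed, frame-weighted loss accumulator. The decay of
    # 1 - 1/200 is an inference from the logged frame totals (about 7.5e5
    # after 50 batches, saturating near 3.3e6), not a quoted config value.
    def __init__(self, reset_interval: int = 200):
        self.decay = 1.0 - 1.0 / reset_interval
        self.loss_sum = 0.0
        self.frames = 0.0

    def update(self, batch_loss: float, batch_frames: float):
        self.loss_sum = self.loss_sum * self.decay + batch_loss * batch_frames
        self.frames = self.frames * self.decay + batch_frames

    @property
    def tot_loss(self) -> float:
        return self.loss_sum / max(self.frames, 1.0)

avg = DecayedLoss()
for _ in range(50):  # 50 batches of ~17k frames each
    avg.update(0.21, 17000.0)
print(f"{avg.tot_loss:.4f} over {avg.frames:.1f} frames")  # ~0.21 over ~7.5e5
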
], batch size: 89, lr: 4.98e-03, grad_scale: 16.0 2024-09-24 05:12:20,906 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=427742.0, ans=0.125 2024-09-24 05:12:21,175 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=10.58 vs. limit=15.0 2024-09-24 05:12:35,351 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=427788.6666666667, ans=0.0 2024-09-24 05:12:47,319 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.07 vs. limit=22.5 2024-09-24 05:12:57,673 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.const_attention_rate, batch_count=427882.0, ans=0.025 2024-09-24 05:12:57,706 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=427882.0, ans=0.1 2024-09-24 05:13:08,852 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=427882.0, ans=0.2 2024-09-24 05:13:29,443 INFO [train.py:1198] (3/4) Epoch 24, batch 2100, loss[loss=0.2124, ctc_loss=0.1383, cr_loss=0.3703, over 17315.00 frames. ], tot_loss[loss=0.2084, ctc_loss=0.1377, cr_loss=0.3535, over 3346841.66 frames. ], batch size: 51, lr: 4.98e-03, grad_scale: 16.0 2024-09-24 05:13:31,275 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=427975.3333333333, ans=0.0 2024-09-24 05:13:37,476 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.077e+02 1.278e+02 1.335e+02 1.481e+02 2.167e+02, threshold=2.670e+02, percent-clipped=0.0 2024-09-24 05:13:45,807 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=428022.0, ans=0.125 2024-09-24 05:14:22,036 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=428115.3333333333, ans=0.2 2024-09-24 05:14:28,273 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=428115.3333333333, ans=0.125 2024-09-24 05:14:31,471 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=428115.3333333333, ans=0.0 2024-09-24 05:14:34,643 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=428162.0, ans=0.125 2024-09-24 05:14:54,811 INFO [train.py:1198] (3/4) Epoch 24, batch 2150, loss[loss=0.2266, ctc_loss=0.1491, cr_loss=0.3873, over 16986.00 frames. ], tot_loss[loss=0.2075, ctc_loss=0.1369, cr_loss=0.3531, over 3361234.86 frames. 
], batch size: 53, lr: 4.98e-03, grad_scale: 16.0 2024-09-24 05:15:01,165 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=428208.6666666667, ans=0.125 2024-09-24 05:15:08,857 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=428208.6666666667, ans=0.04949747468305833 2024-09-24 05:15:12,098 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=428255.3333333333, ans=0.125 2024-09-24 05:15:47,377 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=428348.6666666667, ans=0.125 2024-09-24 05:15:49,078 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.min_positive, batch_count=428348.6666666667, ans=0.025 2024-09-24 05:15:55,211 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=428348.6666666667, ans=0.125 2024-09-24 05:15:55,260 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=428348.6666666667, ans=0.1 2024-09-24 05:15:59,282 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.78 vs. limit=10.0 2024-09-24 05:16:08,344 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=428395.3333333333, ans=0.0 2024-09-24 05:16:17,771 INFO [train.py:1198] (3/4) Epoch 24, batch 2200, loss[loss=0.2151, ctc_loss=0.1432, cr_loss=0.3595, over 16808.00 frames. ], tot_loss[loss=0.2064, ctc_loss=0.1361, cr_loss=0.3518, over 3366414.77 frames. ], batch size: 61, lr: 4.98e-03, grad_scale: 16.0 2024-09-24 05:16:25,769 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.082e+02 1.245e+02 1.329e+02 1.411e+02 2.102e+02, threshold=2.657e+02, percent-clipped=0.0 2024-09-24 05:16:27,905 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=428442.0, ans=0.125 2024-09-24 05:16:42,626 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=428488.6666666667, ans=0.125 2024-09-24 05:17:02,692 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=428535.3333333333, ans=0.0 2024-09-24 05:17:14,013 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.const_attention_rate, batch_count=428582.0, ans=0.025 2024-09-24 05:17:27,248 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten.whitening_limit, batch_count=428628.6666666667, ans=15.0 2024-09-24 05:17:33,217 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.const_attention_rate, batch_count=428628.6666666667, ans=0.025 2024-09-24 05:17:40,962 INFO [train.py:1198] (3/4) Epoch 24, batch 2250, loss[loss=0.2024, ctc_loss=0.1341, cr_loss=0.3416, over 15916.00 frames. ], tot_loss[loss=0.2067, ctc_loss=0.1363, cr_loss=0.3523, over 3360977.39 frames. 
], batch size: 74, lr: 4.98e-03, grad_scale: 16.0 2024-09-24 05:18:27,742 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=428815.3333333333, ans=0.1 2024-09-24 05:19:01,242 INFO [train.py:1198] (3/4) Epoch 24, batch 2300, loss[loss=0.2104, ctc_loss=0.137, cr_loss=0.3667, over 17241.00 frames. ], tot_loss[loss=0.2079, ctc_loss=0.1372, cr_loss=0.3533, over 3346963.44 frames. ], batch size: 47, lr: 4.98e-03, grad_scale: 16.0 2024-09-24 05:19:11,030 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.17 vs. limit=6.0 2024-09-24 05:19:11,767 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.105e+02 1.244e+02 1.345e+02 1.429e+02 2.009e+02, threshold=2.690e+02, percent-clipped=0.0 2024-09-24 05:19:41,335 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=11.90 vs. limit=22.5 2024-09-24 05:19:43,022 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=8.97 vs. limit=15.0 2024-09-24 05:19:49,938 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=429002.0, ans=0.125 2024-09-24 05:20:09,086 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.31 vs. limit=15.0 2024-09-24 05:20:10,553 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=5.67 vs. limit=15.0 2024-09-24 05:20:19,683 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=429095.3333333333, ans=0.0 2024-09-24 05:20:24,433 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=429095.3333333333, ans=0.05 2024-09-24 05:20:28,879 INFO [train.py:1198] (3/4) Epoch 24, batch 2350, loss[loss=0.2204, ctc_loss=0.1408, cr_loss=0.3975, over 17214.00 frames. ], tot_loss[loss=0.2073, ctc_loss=0.1367, cr_loss=0.353, over 3354421.52 frames. ], batch size: 47, lr: 4.98e-03, grad_scale: 16.0 2024-09-24 05:21:00,070 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=429235.3333333333, ans=0.1 2024-09-24 05:21:08,223 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=429235.3333333333, ans=0.0 2024-09-24 05:21:17,622 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.const_attention_rate, batch_count=429282.0, ans=0.025 2024-09-24 05:21:54,455 INFO [train.py:1198] (3/4) Epoch 24, batch 2400, loss[loss=0.2091, ctc_loss=0.1438, cr_loss=0.3264, over 17013.00 frames. ], tot_loss[loss=0.2082, ctc_loss=0.1374, cr_loss=0.3541, over 3351134.54 frames. ], batch size: 53, lr: 4.97e-03, grad_scale: 32.0 2024-09-24 05:22:02,377 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.148e+02 1.270e+02 1.408e+02 1.564e+02 2.432e+02, threshold=2.816e+02, percent-clipped=0.0 2024-09-24 05:22:04,834 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=14.67 vs. 
limit=22.5 2024-09-24 05:22:41,546 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=5.22 vs. limit=15.0 2024-09-24 05:22:50,679 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=429515.3333333333, ans=0.125 2024-09-24 05:22:53,040 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=4.49 vs. limit=15.0 2024-09-24 05:23:05,278 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=429562.0, ans=0.125 2024-09-24 05:23:14,641 INFO [train.py:1198] (3/4) Epoch 24, batch 2450, loss[loss=0.2338, ctc_loss=0.154, cr_loss=0.3988, over 17033.00 frames. ], tot_loss[loss=0.2072, ctc_loss=0.1365, cr_loss=0.3534, over 3355187.08 frames. ], batch size: 56, lr: 4.97e-03, grad_scale: 32.0 2024-09-24 05:23:18,065 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=429608.6666666667, ans=0.125 2024-09-24 05:23:18,726 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=5.80 vs. limit=15.0 2024-09-24 05:24:23,472 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=429795.3333333333, ans=0.0 2024-09-24 05:24:37,354 INFO [train.py:1198] (3/4) Epoch 24, batch 2500, loss[loss=0.2602, ctc_loss=0.1761, cr_loss=0.4205, over 14887.00 frames. ], tot_loss[loss=0.2081, ctc_loss=0.1373, cr_loss=0.3542, over 3351703.30 frames. ], batch size: 89, lr: 4.97e-03, grad_scale: 32.0 2024-09-24 05:24:48,051 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.078e+02 1.250e+02 1.362e+02 1.457e+02 2.366e+02, threshold=2.724e+02, percent-clipped=0.0 2024-09-24 05:25:07,074 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=429888.6666666667, ans=0.0 2024-09-24 05:25:07,129 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=429888.6666666667, ans=0.0 2024-09-24 05:25:15,234 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=429935.3333333333, ans=0.1 2024-09-24 05:25:29,811 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=429982.0, ans=0.125 2024-09-24 05:25:48,876 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=430028.6666666667, ans=0.1 2024-09-24 05:26:02,840 INFO [train.py:1198] (3/4) Epoch 24, batch 2550, loss[loss=0.2547, ctc_loss=0.1767, cr_loss=0.3904, over 11720.00 frames. ], tot_loss[loss=0.2084, ctc_loss=0.1375, cr_loss=0.3544, over 3332997.98 frames. ], batch size: 123, lr: 4.97e-03, grad_scale: 32.0 2024-09-24 05:26:26,974 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=430122.0, ans=0.125 2024-09-24 05:27:25,869 INFO [train.py:1198] (3/4) Epoch 24, batch 2600, loss[loss=0.1866, ctc_loss=0.1244, cr_loss=0.3109, over 17257.00 frames. ], tot_loss[loss=0.2078, ctc_loss=0.1371, cr_loss=0.3535, over 3329917.26 frames. 
], batch size: 44, lr: 4.97e-03, grad_scale: 32.0 2024-09-24 05:27:27,858 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=430308.6666666667, ans=0.1 2024-09-24 05:27:33,811 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.151e+02 1.253e+02 1.345e+02 1.495e+02 2.149e+02, threshold=2.690e+02, percent-clipped=0.0 2024-09-24 05:27:34,262 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=430308.6666666667, ans=0.2 2024-09-24 05:27:35,734 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=430308.6666666667, ans=0.125 2024-09-24 05:27:40,604 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-09-24 05:27:43,762 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=430355.3333333333, ans=0.2 2024-09-24 05:28:44,975 INFO [train.py:1198] (3/4) Epoch 24, batch 2650, loss[loss=0.2006, ctc_loss=0.1308, cr_loss=0.3493, over 17368.00 frames. ], tot_loss[loss=0.2077, ctc_loss=0.137, cr_loss=0.3536, over 3337899.52 frames. ], batch size: 48, lr: 4.97e-03, grad_scale: 32.0 2024-09-24 05:28:49,290 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.61 vs. limit=6.0 2024-09-24 05:29:03,926 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=430588.6666666667, ans=0.1 2024-09-24 05:29:07,518 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=9.14 vs. limit=15.0 2024-09-24 05:29:21,342 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=430635.3333333333, ans=0.1 2024-09-24 05:29:41,797 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=430682.0, ans=0.125 2024-09-24 05:29:50,575 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=430682.0, ans=0.2 2024-09-24 05:29:56,905 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=430728.6666666667, ans=0.125 2024-09-24 05:30:12,485 INFO [train.py:1198] (3/4) Epoch 24, batch 2700, loss[loss=0.2258, ctc_loss=0.1498, cr_loss=0.38, over 17298.00 frames. ], tot_loss[loss=0.2086, ctc_loss=0.1376, cr_loss=0.3549, over 3348411.78 frames. 
], batch size: 49, lr: 4.97e-03, grad_scale: 32.0 2024-09-24 05:30:14,406 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=430775.3333333333, ans=0.1 2024-09-24 05:30:20,466 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.140e+02 1.268e+02 1.356e+02 1.521e+02 3.128e+02, threshold=2.712e+02, percent-clipped=1.0 2024-09-24 05:30:35,000 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=430822.0, ans=0.125 2024-09-24 05:30:49,874 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=8.82 vs. limit=22.5 2024-09-24 05:31:32,249 INFO [train.py:1198] (3/4) Epoch 24, batch 2750, loss[loss=0.1778, ctc_loss=0.1124, cr_loss=0.3269, over 17043.00 frames. ], tot_loss[loss=0.2085, ctc_loss=0.1374, cr_loss=0.3553, over 3347712.77 frames. ], batch size: 39, lr: 4.97e-03, grad_scale: 32.0 2024-09-24 05:31:32,671 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=431008.6666666667, ans=0.0 2024-09-24 05:32:08,974 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=431102.0, ans=0.2 2024-09-24 05:32:28,667 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=431148.6666666667, ans=0.0 2024-09-24 05:32:30,078 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=431148.6666666667, ans=0.0 2024-09-24 05:32:49,845 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=14.06 vs. limit=15.0 2024-09-24 05:32:55,660 INFO [train.py:1198] (3/4) Epoch 24, batch 2800, loss[loss=0.1959, ctc_loss=0.1303, cr_loss=0.3279, over 17254.00 frames. ], tot_loss[loss=0.2083, ctc_loss=0.1374, cr_loss=0.3545, over 3349611.07 frames. ], batch size: 44, lr: 4.96e-03, grad_scale: 32.0 2024-09-24 05:33:03,460 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.094e+02 1.249e+02 1.392e+02 1.537e+02 2.011e+02, threshold=2.784e+02, percent-clipped=0.0 2024-09-24 05:33:08,866 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=11.99 vs. limit=22.5 2024-09-24 05:33:09,942 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=431288.6666666667, ans=0.125 2024-09-24 05:33:41,038 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=7.97 vs. limit=15.0 2024-09-24 05:33:41,368 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.83 vs. limit=6.0 2024-09-24 05:33:43,812 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=431382.0, ans=0.125 2024-09-24 05:34:04,605 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=3.92 vs. 
limit=10.0 2024-09-24 05:34:18,064 INFO [train.py:1198] (3/4) Epoch 24, batch 2850, loss[loss=0.2505, ctc_loss=0.1785, cr_loss=0.3599, over 11701.00 frames. ], tot_loss[loss=0.2071, ctc_loss=0.1366, cr_loss=0.3525, over 3344006.79 frames. ], batch size: 123, lr: 4.96e-03, grad_scale: 32.0 2024-09-24 05:34:20,365 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=3.58 vs. limit=15.0 2024-09-24 05:34:26,389 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=431475.3333333333, ans=0.125 2024-09-24 05:34:58,870 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.max_abs, batch_count=431568.6666666667, ans=10.0 2024-09-24 05:35:14,970 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.const_attention_rate, batch_count=431615.3333333333, ans=0.025 2024-09-24 05:35:15,172 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=8.32 vs. limit=15.0 2024-09-24 05:35:43,358 INFO [train.py:1198] (3/4) Epoch 24, batch 2900, loss[loss=0.226, ctc_loss=0.1522, cr_loss=0.3692, over 16738.00 frames. ], tot_loss[loss=0.2083, ctc_loss=0.1374, cr_loss=0.3542, over 3350522.32 frames. ], batch size: 61, lr: 4.96e-03, grad_scale: 16.0 2024-09-24 05:35:47,054 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=431708.6666666667, ans=0.0 2024-09-24 05:35:52,967 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.062e+02 1.236e+02 1.333e+02 1.474e+02 3.420e+02, threshold=2.666e+02, percent-clipped=1.0 2024-09-24 05:35:59,814 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=431755.3333333333, ans=0.07 2024-09-24 05:36:30,663 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=4.25 vs. limit=15.0 2024-09-24 05:36:41,724 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=431848.6666666667, ans=0.0 2024-09-24 05:36:55,487 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=431895.3333333333, ans=0.0 2024-09-24 05:37:02,086 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=431895.3333333333, ans=0.125 2024-09-24 05:37:06,445 INFO [train.py:1198] (3/4) Epoch 24, batch 2950, loss[loss=0.1839, ctc_loss=0.1197, cr_loss=0.321, over 16943.00 frames. ], tot_loss[loss=0.2078, ctc_loss=0.1371, cr_loss=0.3539, over 3354211.32 frames. 
], batch size: 42, lr: 4.96e-03, grad_scale: 16.0 2024-09-24 05:38:04,585 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=432082.0, ans=0.125 2024-09-24 05:38:18,659 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=432128.6666666667, ans=0.09899494936611666 2024-09-24 05:38:21,764 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=432128.6666666667, ans=0.0 2024-09-24 05:38:26,285 INFO [train.py:1198] (3/4) Epoch 24, batch 3000, loss[loss=0.2085, ctc_loss=0.1345, cr_loss=0.3699, over 17153.00 frames. ], tot_loss[loss=0.2071, ctc_loss=0.1365, cr_loss=0.3535, over 3359822.88 frames. ], batch size: 45, lr: 4.96e-03, grad_scale: 16.0 2024-09-24 05:38:26,285 INFO [train.py:1221] (3/4) Computing validation loss 2024-09-24 05:38:42,488 INFO [train.py:1230] (3/4) Epoch 24, validation: loss=0.03786, ctc_loss=0.03786, cr_loss=8.617e-15, over 944034.00 frames. 2024-09-24 05:38:42,488 INFO [train.py:1231] (3/4) Maximum memory allocated so far is 21238MB 2024-09-24 05:38:46,236 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=10.94 vs. limit=12.0 2024-09-24 05:38:51,368 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.94 vs. limit=22.5 2024-09-24 05:38:51,890 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.040e+02 1.243e+02 1.342e+02 1.455e+02 1.995e+02, threshold=2.684e+02, percent-clipped=0.0 2024-09-24 05:39:27,929 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=432268.6666666667, ans=10.0 2024-09-24 05:39:28,096 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=20.00 vs. limit=22.5 2024-09-24 05:39:36,487 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=12.94 vs. limit=22.5 2024-09-24 05:39:39,246 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=6.85 vs. limit=15.0 2024-09-24 05:39:51,434 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=432362.0, ans=0.2 2024-09-24 05:39:53,020 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=432362.0, ans=0.125 2024-09-24 05:40:03,772 INFO [train.py:1198] (3/4) Epoch 24, batch 3050, loss[loss=0.2114, ctc_loss=0.1439, cr_loss=0.3371, over 15988.00 frames. ], tot_loss[loss=0.2071, ctc_loss=0.1364, cr_loss=0.3535, over 3362624.31 frames. ], batch size: 74, lr: 4.96e-03, grad_scale: 16.0 2024-09-24 05:40:14,111 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=5.38 vs. limit=15.0 2024-09-24 05:40:43,154 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=432502.0, ans=0.125 2024-09-24 05:41:02,109 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=12.13 vs. 
limit=15.0 2024-09-24 05:41:26,721 INFO [train.py:1198] (3/4) Epoch 24, batch 3100, loss[loss=0.1998, ctc_loss=0.1304, cr_loss=0.3473, over 17279.00 frames. ], tot_loss[loss=0.208, ctc_loss=0.1371, cr_loss=0.3548, over 3356043.70 frames. ], batch size: 51, lr: 4.96e-03, grad_scale: 16.0 2024-09-24 05:41:35,818 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.101e+02 1.243e+02 1.328e+02 1.464e+02 2.073e+02, threshold=2.656e+02, percent-clipped=0.0 2024-09-24 05:41:48,839 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=432688.6666666667, ans=0.5 2024-09-24 05:42:26,378 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=432782.0, ans=0.04949747468305833 2024-09-24 05:42:34,146 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=432828.6666666667, ans=0.2 2024-09-24 05:42:44,647 INFO [train.py:1198] (3/4) Epoch 24, batch 3150, loss[loss=0.2243, ctc_loss=0.1476, cr_loss=0.3832, over 17142.00 frames. ], tot_loss[loss=0.2075, ctc_loss=0.1368, cr_loss=0.3536, over 3350904.89 frames. ], batch size: 48, lr: 4.95e-03, grad_scale: 16.0 2024-09-24 05:42:57,422 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=432875.3333333333, ans=0.0 2024-09-24 05:43:24,098 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=432968.6666666667, ans=0.125 2024-09-24 05:43:39,670 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=433015.3333333333, ans=0.125 2024-09-24 05:44:02,936 INFO [train.py:1198] (3/4) Epoch 24, batch 3200, loss[loss=0.2653, ctc_loss=0.1857, cr_loss=0.3977, over 11511.00 frames. ], tot_loss[loss=0.207, ctc_loss=0.1365, cr_loss=0.3523, over 3350676.16 frames. ], batch size: 123, lr: 4.95e-03, grad_scale: 32.0 2024-09-24 05:44:10,834 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=433108.6666666667, ans=0.125 2024-09-24 05:44:12,035 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.108e+02 1.252e+02 1.364e+02 1.478e+02 2.406e+02, threshold=2.728e+02, percent-clipped=0.0 2024-09-24 05:44:24,670 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=433155.3333333333, ans=0.0 2024-09-24 05:44:31,104 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-24 05:44:43,564 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=433202.0, ans=0.125 2024-09-24 05:44:52,056 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.36 vs. limit=12.0 2024-09-24 05:45:11,796 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.74 vs. limit=6.0 2024-09-24 05:45:20,628 INFO [train.py:1198] (3/4) Epoch 24, batch 3250, loss[loss=0.2086, ctc_loss=0.1369, cr_loss=0.3587, over 17225.00 frames. ], tot_loss[loss=0.2075, ctc_loss=0.1369, cr_loss=0.353, over 3337608.62 frames. 
], batch size: 50, lr: 4.95e-03, grad_scale: 32.0 2024-09-24 05:45:25,636 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=433342.0, ans=0.0 2024-09-24 05:45:48,625 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.min_abs, batch_count=433388.6666666667, ans=0.5 2024-09-24 05:45:48,911 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=11.60 vs. limit=15.0 2024-09-24 05:45:52,295 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=15.33 vs. limit=22.5 2024-09-24 05:45:59,472 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-24 05:46:28,157 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=8.35 vs. limit=15.0 2024-09-24 05:46:36,399 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=433528.6666666667, ans=0.125 2024-09-24 05:46:40,883 INFO [train.py:1198] (3/4) Epoch 24, batch 3300, loss[loss=0.2226, ctc_loss=0.1474, cr_loss=0.376, over 17094.00 frames. ], tot_loss[loss=0.207, ctc_loss=0.1364, cr_loss=0.3527, over 3344634.32 frames. ], batch size: 49, lr: 4.95e-03, grad_scale: 32.0 2024-09-24 05:46:47,434 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=433575.3333333333, ans=0.95 2024-09-24 05:46:49,095 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=433575.3333333333, ans=0.2 2024-09-24 05:46:50,402 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.110e+02 1.267e+02 1.337e+02 1.528e+02 2.027e+02, threshold=2.674e+02, percent-clipped=0.0 2024-09-24 05:46:56,780 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=433622.0, ans=0.125 2024-09-24 05:47:08,426 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=6.90 vs. limit=15.0 2024-09-24 05:47:09,300 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.25 vs. 
limit=15.0 2024-09-24 05:47:34,036 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=433715.3333333333, ans=0.125 2024-09-24 05:47:44,820 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.const_attention_rate, batch_count=433762.0, ans=0.025 2024-09-24 05:47:44,906 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=433762.0, ans=0.125 2024-09-24 05:47:49,445 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=433762.0, ans=0.0 2024-09-24 05:47:55,762 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=433762.0, ans=0.1 2024-09-24 05:47:58,604 INFO [train.py:1198] (3/4) Epoch 24, batch 3350, loss[loss=0.2302, ctc_loss=0.1521, cr_loss=0.3906, over 17034.00 frames. ], tot_loss[loss=0.2081, ctc_loss=0.1373, cr_loss=0.3541, over 3343084.98 frames. ], batch size: 44, lr: 4.95e-03, grad_scale: 32.0 2024-09-24 05:48:00,422 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=433808.6666666667, ans=0.1 2024-09-24 05:48:11,942 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=4.45 vs. limit=15.0 2024-09-24 05:48:50,423 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=433948.6666666667, ans=0.035 2024-09-24 05:48:55,781 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=4.04 vs. limit=12.0 2024-09-24 05:48:59,787 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=433995.3333333333, ans=0.1 2024-09-24 05:49:15,637 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=434042.0, ans=0.1 2024-09-24 05:49:16,191 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=5.16 vs. limit=15.0 2024-09-24 05:49:16,885 INFO [train.py:1198] (3/4) Epoch 24, batch 3400, loss[loss=0.2109, ctc_loss=0.138, cr_loss=0.3646, over 17165.00 frames. ], tot_loss[loss=0.208, ctc_loss=0.1372, cr_loss=0.3539, over 3340890.90 frames. 
], batch size: 45, lr: 4.95e-03, grad_scale: 32.0 2024-09-24 05:49:26,483 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.071e+02 1.267e+02 1.361e+02 1.503e+02 2.338e+02, threshold=2.721e+02, percent-clipped=0.0 2024-09-24 05:49:31,637 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.const_attention_rate, batch_count=434088.6666666667, ans=0.025 2024-09-24 05:49:46,055 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.const_attention_rate, batch_count=434088.6666666667, ans=0.025 2024-09-24 05:49:49,140 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=434135.3333333333, ans=0.125 2024-09-24 05:49:49,144 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=434135.3333333333, ans=0.125 2024-09-24 05:49:55,376 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=434135.3333333333, ans=0.125 2024-09-24 05:50:12,163 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.const_attention_rate, batch_count=434182.0, ans=0.025 2024-09-24 05:50:36,808 INFO [train.py:1198] (3/4) Epoch 24, batch 3450, loss[loss=0.2606, ctc_loss=0.177, cr_loss=0.4179, over 15065.00 frames. ], tot_loss[loss=0.2087, ctc_loss=0.1378, cr_loss=0.3547, over 3331197.52 frames. ], batch size: 89, lr: 4.95e-03, grad_scale: 32.0 2024-09-24 05:51:00,925 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=434322.0, ans=0.0 2024-09-24 05:51:04,931 INFO [scaling.py:1024] (3/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=7.11 vs. limit=8.0 2024-09-24 05:51:05,468 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=434322.0, ans=0.125 2024-09-24 05:51:40,659 INFO [scaling.py:1024] (3/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=7.29 vs. limit=8.0 2024-09-24 05:51:58,535 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=434508.6666666667, ans=0.125 2024-09-24 05:51:59,831 INFO [train.py:1198] (3/4) Epoch 24, batch 3500, loss[loss=0.1674, ctc_loss=0.1072, cr_loss=0.3012, over 17078.00 frames. ], tot_loss[loss=0.209, ctc_loss=0.1379, cr_loss=0.3554, over 3339875.89 frames. ], batch size: 40, lr: 4.95e-03, grad_scale: 32.0 2024-09-24 05:52:06,934 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=3.87 vs. limit=10.0 2024-09-24 05:52:10,829 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.042e+02 1.254e+02 1.358e+02 1.511e+02 3.142e+02, threshold=2.715e+02, percent-clipped=1.0 2024-09-24 05:52:12,683 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=434508.6666666667, ans=0.0 2024-09-24 05:52:15,269 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=9.98 vs. 
limit=10.0 2024-09-24 05:53:04,565 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=434695.3333333333, ans=0.0 2024-09-24 05:53:18,585 INFO [train.py:1198] (3/4) Epoch 24, batch 3550, loss[loss=0.2393, ctc_loss=0.1603, cr_loss=0.3952, over 16099.00 frames. ], tot_loss[loss=0.2086, ctc_loss=0.1376, cr_loss=0.355, over 3353684.64 frames. ], batch size: 74, lr: 4.94e-03, grad_scale: 16.0 2024-09-24 05:53:34,349 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=434788.6666666667, ans=0.07 2024-09-24 05:53:52,956 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=434835.3333333333, ans=0.125 2024-09-24 05:54:16,727 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=434882.0, ans=0.2 2024-09-24 05:54:36,760 INFO [train.py:1198] (3/4) Epoch 24, batch 3600, loss[loss=0.178, ctc_loss=0.1162, cr_loss=0.3091, over 17295.00 frames. ], tot_loss[loss=0.2078, ctc_loss=0.137, cr_loss=0.3539, over 3352618.45 frames. ], batch size: 46, lr: 4.94e-03, grad_scale: 32.0 2024-09-24 05:54:41,758 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=434975.3333333333, ans=0.1 2024-09-24 05:54:47,632 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.041e+02 1.266e+02 1.361e+02 1.484e+02 1.804e+02, threshold=2.723e+02, percent-clipped=0.0 2024-09-24 05:55:11,673 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=435068.6666666667, ans=0.2 2024-09-24 05:55:33,711 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=435115.3333333333, ans=0.125 2024-09-24 05:55:57,324 INFO [train.py:1198] (3/4) Epoch 24, batch 3650, loss[loss=0.2284, ctc_loss=0.1529, cr_loss=0.3775, over 16840.00 frames. ], tot_loss[loss=0.2081, ctc_loss=0.1373, cr_loss=0.3539, over 3352650.02 frames. ], batch size: 58, lr: 4.94e-03, grad_scale: 32.0 2024-09-24 05:55:59,934 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=6.33 vs. limit=15.0 2024-09-24 05:56:52,615 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.30 vs. limit=15.0 2024-09-24 05:57:16,737 INFO [train.py:1198] (3/4) Epoch 24, batch 3700, loss[loss=0.2039, ctc_loss=0.1334, cr_loss=0.3525, over 16889.00 frames. ], tot_loss[loss=0.2082, ctc_loss=0.1373, cr_loss=0.3547, over 3349850.36 frames. ], batch size: 58, lr: 4.94e-03, grad_scale: 16.0 2024-09-24 05:57:17,038 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=435442.0, ans=0.1 2024-09-24 05:57:19,124 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=6.00 vs. 
limit=15.0 2024-09-24 05:57:29,294 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.108e+02 1.270e+02 1.350e+02 1.462e+02 1.892e+02, threshold=2.701e+02, percent-clipped=0.0 2024-09-24 05:57:35,816 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.const_attention_rate, batch_count=435488.6666666667, ans=0.025 2024-09-24 05:58:29,238 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=435628.6666666667, ans=0.125 2024-09-24 05:58:35,106 INFO [train.py:1198] (3/4) Epoch 24, batch 3750, loss[loss=0.2446, ctc_loss=0.1698, cr_loss=0.3741, over 11737.00 frames. ], tot_loss[loss=0.2082, ctc_loss=0.1373, cr_loss=0.3548, over 3346917.18 frames. ], batch size: 123, lr: 4.94e-03, grad_scale: 16.0 2024-09-24 05:59:43,610 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=435862.0, ans=0.125 2024-09-24 05:59:48,305 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=435862.0, ans=0.125 2024-09-24 05:59:54,386 INFO [train.py:1198] (3/4) Epoch 24, batch 3800, loss[loss=0.2659, ctc_loss=0.1835, cr_loss=0.4124, over 11474.00 frames. ], tot_loss[loss=0.2084, ctc_loss=0.1375, cr_loss=0.3547, over 3330039.41 frames. ], batch size: 123, lr: 4.94e-03, grad_scale: 16.0 2024-09-24 06:00:01,080 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=435908.6666666667, ans=0.125 2024-09-24 06:00:02,733 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=435908.6666666667, ans=0.125 2024-09-24 06:00:05,919 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=435908.6666666667, ans=0.1 2024-09-24 06:00:07,113 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.097e+02 1.258e+02 1.341e+02 1.479e+02 2.397e+02, threshold=2.682e+02, percent-clipped=0.0 2024-09-24 06:00:12,004 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=435955.3333333333, ans=0.125 2024-09-24 06:00:19,771 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=435955.3333333333, ans=10.0 2024-09-24 06:00:29,912 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=436002.0, ans=0.125 2024-09-24 06:00:35,929 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=436002.0, ans=0.04949747468305833 2024-09-24 06:01:03,545 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=436095.3333333333, ans=0.07 2024-09-24 06:01:04,978 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.const_attention_rate, batch_count=436095.3333333333, ans=0.025 2024-09-24 06:01:14,194 INFO [train.py:1198] (3/4) Epoch 24, batch 3850, loss[loss=0.2824, ctc_loss=0.1958, cr_loss=0.4329, over 11618.00 frames. ], tot_loss[loss=0.2081, ctc_loss=0.1373, cr_loss=0.3538, over 3307732.62 frames. 
], batch size: 123, lr: 4.94e-03, grad_scale: 16.0 2024-09-24 06:01:19,097 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=436142.0, ans=0.025 2024-09-24 06:01:54,212 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=436235.3333333333, ans=0.125 2024-09-24 06:01:56,232 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=12.17 vs. limit=22.5 2024-09-24 06:02:09,497 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=436282.0, ans=0.5 2024-09-24 06:02:17,131 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=436328.6666666667, ans=0.0 2024-09-24 06:03:16,547 INFO [train.py:1198] (3/4) Epoch 25, batch 0, loss[loss=0.1995, ctc_loss=0.1294, cr_loss=0.3502, over 17071.00 frames. ], tot_loss[loss=0.1995, ctc_loss=0.1294, cr_loss=0.3502, over 17071.00 frames. ], batch size: 46, lr: 4.83e-03, grad_scale: 32.0 2024-09-24 06:03:16,547 INFO [train.py:1221] (3/4) Computing validation loss 2024-09-24 06:03:31,924 INFO [train.py:1230] (3/4) Epoch 25, validation: loss=0.03759, ctc_loss=0.03759, cr_loss=8.067e-15, over 944034.00 frames. 2024-09-24 06:03:31,925 INFO [train.py:1231] (3/4) Maximum memory allocated so far is 21238MB 2024-09-24 06:03:51,062 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.051e+02 1.314e+02 1.430e+02 1.672e+02 2.033e+02, threshold=2.861e+02, percent-clipped=0.0 2024-09-24 06:04:19,463 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=436450.0, ans=0.2 2024-09-24 06:04:24,486 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=436496.6666666667, ans=0.125 2024-09-24 06:04:29,231 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=436496.6666666667, ans=0.125 2024-09-24 06:04:38,992 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=6.25 vs. limit=15.0 2024-09-24 06:04:48,360 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=436543.3333333333, ans=0.125 2024-09-24 06:04:54,469 INFO [train.py:1198] (3/4) Epoch 25, batch 50, loss[loss=0.2, ctc_loss=0.1316, cr_loss=0.3422, over 17296.00 frames. ], tot_loss[loss=0.2062, ctc_loss=0.1354, cr_loss=0.3541, over 761879.72 frames. ], batch size: 49, lr: 4.83e-03, grad_scale: 32.0 2024-09-24 06:05:10,655 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.const_attention_rate, batch_count=436636.6666666667, ans=0.025 2024-09-24 06:06:19,641 INFO [train.py:1198] (3/4) Epoch 25, batch 100, loss[loss=0.2055, ctc_loss=0.1363, cr_loss=0.3462, over 17369.00 frames. ], tot_loss[loss=0.2053, ctc_loss=0.1348, cr_loss=0.3526, over 1333500.61 frames. ], batch size: 48, lr: 4.83e-03, grad_scale: 16.0 2024-09-24 06:06:26,503 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=17.68 vs. 
limit=22.5 2024-09-24 06:06:27,870 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=436823.3333333333, ans=0.1 2024-09-24 06:06:40,252 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.071e+02 1.292e+02 1.373e+02 1.497e+02 2.148e+02, threshold=2.747e+02, percent-clipped=0.0 2024-09-24 06:07:28,355 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=437010.0, ans=0.0 2024-09-24 06:07:39,140 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-24 06:07:42,090 INFO [train.py:1198] (3/4) Epoch 25, batch 150, loss[loss=0.1993, ctc_loss=0.1304, cr_loss=0.3444, over 17292.00 frames. ], tot_loss[loss=0.205, ctc_loss=0.1346, cr_loss=0.3521, over 1787583.50 frames. ], batch size: 46, lr: 4.83e-03, grad_scale: 16.0 2024-09-24 06:08:33,321 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-24 06:08:42,625 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=437196.6666666667, ans=0.125 2024-09-24 06:08:48,777 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=437243.3333333333, ans=0.1 2024-09-24 06:08:48,825 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=437243.3333333333, ans=0.0 2024-09-24 06:08:50,381 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.min_positive, batch_count=437243.3333333333, ans=0.05 2024-09-24 06:09:04,126 INFO [train.py:1198] (3/4) Epoch 25, batch 200, loss[loss=0.1798, ctc_loss=0.1168, cr_loss=0.3149, over 17204.00 frames. ], tot_loss[loss=0.207, ctc_loss=0.1361, cr_loss=0.3542, over 2137502.65 frames. ], batch size: 41, lr: 4.83e-03, grad_scale: 8.0 2024-09-24 06:09:12,599 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=437290.0, ans=0.2 2024-09-24 06:09:26,583 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.083e+02 1.223e+02 1.322e+02 1.442e+02 1.903e+02, threshold=2.645e+02, percent-clipped=0.0 2024-09-24 06:09:26,854 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=437336.6666666667, ans=0.0 2024-09-24 06:10:05,328 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=437430.0, ans=0.2 2024-09-24 06:10:24,170 INFO [train.py:1198] (3/4) Epoch 25, batch 250, loss[loss=0.202, ctc_loss=0.1297, cr_loss=0.3614, over 17072.00 frames. ], tot_loss[loss=0.2095, ctc_loss=0.1382, cr_loss=0.3565, over 2388024.57 frames. ], batch size: 46, lr: 4.83e-03, grad_scale: 8.0 2024-09-24 06:11:04,122 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=8.58 vs. 
limit=15.0 2024-09-24 06:11:11,365 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=437616.6666666667, ans=0.2 2024-09-24 06:11:22,637 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=437663.3333333333, ans=0.0 2024-09-24 06:11:36,265 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=5.71 vs. limit=15.0 2024-09-24 06:11:38,680 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=437710.0, ans=0.125 2024-09-24 06:11:49,772 INFO [train.py:1198] (3/4) Epoch 25, batch 300, loss[loss=0.1809, ctc_loss=0.1182, cr_loss=0.3132, over 17040.00 frames. ], tot_loss[loss=0.2097, ctc_loss=0.1384, cr_loss=0.3564, over 2606339.74 frames. ], batch size: 39, lr: 4.83e-03, grad_scale: 8.0 2024-09-24 06:11:54,865 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=437756.6666666667, ans=0.0 2024-09-24 06:12:04,326 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.const_attention_rate, batch_count=437803.3333333333, ans=0.025 2024-09-24 06:12:09,358 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=437803.3333333333, ans=0.1 2024-09-24 06:12:12,131 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.120e+02 1.257e+02 1.339e+02 1.438e+02 1.926e+02, threshold=2.678e+02, percent-clipped=0.0 2024-09-24 06:12:17,528 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=9.13 vs. limit=12.0 2024-09-24 06:12:48,620 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=437896.6666666667, ans=0.125 2024-09-24 06:12:53,896 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=3.83 vs. limit=10.0 2024-09-24 06:13:01,924 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.29 vs. limit=6.0 2024-09-24 06:13:12,247 INFO [train.py:1198] (3/4) Epoch 25, batch 350, loss[loss=0.2372, ctc_loss=0.1625, cr_loss=0.3734, over 11578.00 frames. ], tot_loss[loss=0.2081, ctc_loss=0.1372, cr_loss=0.3548, over 2775152.98 frames. ], batch size: 123, lr: 4.82e-03, grad_scale: 8.0 2024-09-24 06:13:12,488 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=437990.0, ans=0.2 2024-09-24 06:13:21,928 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=437990.0, ans=0.0 2024-09-24 06:13:25,156 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=437990.0, ans=0.125 2024-09-24 06:13:26,950 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=438036.6666666667, ans=0.0 2024-09-24 06:13:27,184 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.33 vs. 
limit=6.0 2024-09-24 06:13:30,214 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=438036.6666666667, ans=0.09899494936611666 2024-09-24 06:13:41,834 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=4.71 vs. limit=10.0 2024-09-24 06:13:54,492 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.80 vs. limit=6.0 2024-09-24 06:14:01,819 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=438130.0, ans=0.125 2024-09-24 06:14:13,333 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.25 vs. limit=6.0 2024-09-24 06:14:35,227 INFO [train.py:1198] (3/4) Epoch 25, batch 400, loss[loss=0.1736, ctc_loss=0.1104, cr_loss=0.3158, over 17252.00 frames. ], tot_loss[loss=0.2077, ctc_loss=0.1368, cr_loss=0.3546, over 2899792.37 frames. ], batch size: 44, lr: 4.82e-03, grad_scale: 16.0 2024-09-24 06:14:37,084 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=438223.3333333333, ans=0.0 2024-09-24 06:14:40,652 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=8.69 vs. limit=15.0 2024-09-24 06:14:54,689 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=438270.0, ans=0.125 2024-09-24 06:14:57,569 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.104e+02 1.241e+02 1.339e+02 1.521e+02 2.224e+02, threshold=2.677e+02, percent-clipped=0.0 2024-09-24 06:14:58,000 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=438270.0, ans=0.125 2024-09-24 06:15:09,376 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten.whitening_limit, batch_count=438316.6666666667, ans=15.0 2024-09-24 06:15:39,266 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=438410.0, ans=0.0 2024-09-24 06:15:43,217 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=18.14 vs. limit=22.5 2024-09-24 06:15:57,276 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=7.38 vs. limit=15.0 2024-09-24 06:15:57,610 INFO [train.py:1198] (3/4) Epoch 25, batch 450, loss[loss=0.2202, ctc_loss=0.146, cr_loss=0.371, over 17204.00 frames. ], tot_loss[loss=0.2075, ctc_loss=0.1366, cr_loss=0.3544, over 2999856.37 frames. ], batch size: 55, lr: 4.82e-03, grad_scale: 16.0 2024-09-24 06:16:30,149 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=5.50 vs. 
limit=15.0 2024-09-24 06:16:44,016 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=438550.0, ans=0.0 2024-09-24 06:16:47,700 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=6.41 vs. limit=15.0 2024-09-24 06:16:53,532 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=438596.6666666667, ans=0.125 2024-09-24 06:17:20,719 INFO [train.py:1198] (3/4) Epoch 25, batch 500, loss[loss=0.1798, ctc_loss=0.1154, cr_loss=0.3224, over 17011.00 frames. ], tot_loss[loss=0.2065, ctc_loss=0.1359, cr_loss=0.3531, over 3080362.61 frames. ], batch size: 39, lr: 4.82e-03, grad_scale: 16.0 2024-09-24 06:17:46,420 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.127e+02 1.265e+02 1.364e+02 1.484e+02 2.816e+02, threshold=2.728e+02, percent-clipped=1.0 2024-09-24 06:18:14,008 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=438830.0, ans=0.1 2024-09-24 06:18:25,171 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=438830.0, ans=0.1 2024-09-24 06:18:38,286 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=438876.6666666667, ans=0.125 2024-09-24 06:18:44,383 INFO [train.py:1198] (3/4) Epoch 25, batch 550, loss[loss=0.2207, ctc_loss=0.1485, cr_loss=0.3611, over 17351.00 frames. ], tot_loss[loss=0.2065, ctc_loss=0.1359, cr_loss=0.3532, over 3130952.45 frames. ], batch size: 48, lr: 4.82e-03, grad_scale: 16.0 2024-09-24 06:18:44,641 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=438923.3333333333, ans=0.2 2024-09-24 06:18:53,992 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.const_attention_rate, batch_count=438923.3333333333, ans=0.025 2024-09-24 06:19:03,434 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=438970.0, ans=0.125 2024-09-24 06:19:05,009 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=438970.0, ans=0.0 2024-09-24 06:19:08,219 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=438970.0, ans=0.125 2024-09-24 06:19:41,850 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=439063.3333333333, ans=0.1 2024-09-24 06:19:44,921 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=439063.3333333333, ans=0.125 2024-09-24 06:19:49,783 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=439110.0, ans=0.0 2024-09-24 06:20:00,882 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=439110.0, ans=0.1 2024-09-24 06:20:06,960 INFO [train.py:1198] (3/4) Epoch 25, batch 600, loss[loss=0.2061, ctc_loss=0.1341, cr_loss=0.36, over 17119.00 frames. 
], tot_loss[loss=0.2058, ctc_loss=0.1354, cr_loss=0.3519, over 3179022.45 frames. ], batch size: 49, lr: 4.82e-03, grad_scale: 16.0 2024-09-24 06:20:29,579 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.069e+02 1.242e+02 1.338e+02 1.453e+02 1.774e+02, threshold=2.676e+02, percent-clipped=0.0 2024-09-24 06:20:37,917 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=439250.0, ans=0.1 2024-09-24 06:20:42,823 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=439250.0, ans=0.0 2024-09-24 06:20:48,109 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=4.70 vs. limit=10.0 2024-09-24 06:21:06,334 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=439296.6666666667, ans=0.0 2024-09-24 06:21:15,949 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=6.27 vs. limit=15.0 2024-09-24 06:21:25,301 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=439343.3333333333, ans=0.125 2024-09-24 06:21:29,768 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=439343.3333333333, ans=0.125 2024-09-24 06:21:32,782 INFO [train.py:1198] (3/4) Epoch 25, batch 650, loss[loss=0.1821, ctc_loss=0.1179, cr_loss=0.3211, over 17160.00 frames. ], tot_loss[loss=0.2063, ctc_loss=0.1357, cr_loss=0.3529, over 3219105.38 frames. ], batch size: 45, lr: 4.82e-03, grad_scale: 16.0 2024-09-24 06:21:47,289 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=439436.6666666667, ans=0.0 2024-09-24 06:21:48,663 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=439436.6666666667, ans=0.125 2024-09-24 06:21:53,849 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.14 vs. limit=6.0 2024-09-24 06:22:07,730 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-24 06:22:40,673 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=439576.6666666667, ans=0.2 2024-09-24 06:22:47,195 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=439576.6666666667, ans=0.125 2024-09-24 06:22:54,952 INFO [train.py:1198] (3/4) Epoch 25, batch 700, loss[loss=0.2055, ctc_loss=0.1326, cr_loss=0.3645, over 17230.00 frames. ], tot_loss[loss=0.2071, ctc_loss=0.1364, cr_loss=0.3534, over 3233436.25 frames. 
], batch size: 47, lr: 4.82e-03, grad_scale: 16.0 2024-09-24 06:23:06,414 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=439623.3333333333, ans=0.2 2024-09-24 06:23:17,411 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.088e+02 1.239e+02 1.345e+02 1.493e+02 2.005e+02, threshold=2.689e+02, percent-clipped=0.0 2024-09-24 06:23:19,418 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=439670.0, ans=0.0 2024-09-24 06:23:20,780 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=439670.0, ans=0.0 2024-09-24 06:23:41,553 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=439763.3333333333, ans=0.0 2024-09-24 06:23:41,580 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=439763.3333333333, ans=0.125 2024-09-24 06:23:50,554 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=439763.3333333333, ans=0.2 2024-09-24 06:24:09,944 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=439810.0, ans=0.2 2024-09-24 06:24:11,364 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=439810.0, ans=0.125 2024-09-24 06:24:17,400 INFO [train.py:1198] (3/4) Epoch 25, batch 750, loss[loss=0.2029, ctc_loss=0.1339, cr_loss=0.345, over 17086.00 frames. ], tot_loss[loss=0.2073, ctc_loss=0.1365, cr_loss=0.3539, over 3265872.80 frames. ], batch size: 49, lr: 4.81e-03, grad_scale: 16.0 2024-09-24 06:24:56,098 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=439950.0, ans=0.125 2024-09-24 06:25:02,544 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=439950.0, ans=0.125 2024-09-24 06:25:23,026 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=440043.3333333333, ans=0.0 2024-09-24 06:25:37,714 INFO [train.py:1198] (3/4) Epoch 25, batch 800, loss[loss=0.2111, ctc_loss=0.1375, cr_loss=0.3678, over 17359.00 frames. ], tot_loss[loss=0.206, ctc_loss=0.1355, cr_loss=0.3524, over 3287761.41 frames. 
], batch size: 48, lr: 4.81e-03, grad_scale: 32.0 2024-09-24 06:25:41,217 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=440090.0, ans=0.1 2024-09-24 06:25:42,938 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.const_attention_rate, batch_count=440090.0, ans=0.025 2024-09-24 06:25:59,639 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=440136.6666666667, ans=0.125 2024-09-24 06:26:02,615 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.086e+02 1.255e+02 1.320e+02 1.414e+02 2.395e+02, threshold=2.641e+02, percent-clipped=0.0 2024-09-24 06:26:13,620 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.min_positive, batch_count=440183.3333333333, ans=0.05 2024-09-24 06:26:44,183 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=440230.0, ans=0.0 2024-09-24 06:26:48,905 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=440276.6666666667, ans=0.1 2024-09-24 06:27:03,009 INFO [train.py:1198] (3/4) Epoch 25, batch 850, loss[loss=0.2421, ctc_loss=0.1597, cr_loss=0.4118, over 15986.00 frames. ], tot_loss[loss=0.2079, ctc_loss=0.137, cr_loss=0.3544, over 3303932.02 frames. ], batch size: 74, lr: 4.81e-03, grad_scale: 32.0 2024-09-24 06:27:14,399 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=440323.3333333333, ans=0.1 2024-09-24 06:27:16,567 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=4.85 vs. limit=12.0 2024-09-24 06:27:26,168 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=7.63 vs. limit=15.0 2024-09-24 06:27:41,207 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=440416.6666666667, ans=0.0 2024-09-24 06:27:57,430 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.48 vs. limit=15.0 2024-09-24 06:28:03,783 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.01 vs. limit=15.0 2024-09-24 06:28:06,709 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=440463.3333333333, ans=0.125 2024-09-24 06:28:14,788 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=440510.0, ans=0.0 2024-09-24 06:28:14,935 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=440510.0, ans=0.1 2024-09-24 06:28:15,583 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=12.92 vs. limit=15.0 2024-09-24 06:28:26,005 INFO [train.py:1198] (3/4) Epoch 25, batch 900, loss[loss=0.204, ctc_loss=0.1316, cr_loss=0.3619, over 17312.00 frames. ], tot_loss[loss=0.2084, ctc_loss=0.1374, cr_loss=0.3552, over 3316590.96 frames. 
], batch size: 49, lr: 4.81e-03, grad_scale: 32.0 2024-09-24 06:28:26,731 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=10.07 vs. limit=15.0 2024-09-24 06:28:50,995 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.148e+02 1.308e+02 1.404e+02 1.529e+02 2.023e+02, threshold=2.807e+02, percent-clipped=0.0 2024-09-24 06:28:56,146 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=440603.3333333333, ans=0.2 2024-09-24 06:29:18,039 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=440696.6666666667, ans=0.125 2024-09-24 06:29:47,843 INFO [train.py:1198] (3/4) Epoch 25, batch 950, loss[loss=0.2184, ctc_loss=0.1453, cr_loss=0.3655, over 17027.00 frames. ], tot_loss[loss=0.208, ctc_loss=0.1371, cr_loss=0.3544, over 3319913.11 frames. ], batch size: 51, lr: 4.81e-03, grad_scale: 16.0 2024-09-24 06:29:48,278 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-09-24 06:29:49,795 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=440790.0, ans=0.0 2024-09-24 06:30:28,133 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=440883.3333333333, ans=0.1 2024-09-24 06:30:31,169 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=440883.3333333333, ans=0.125 2024-09-24 06:30:45,642 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=10.27 vs. limit=22.5 2024-09-24 06:30:51,756 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=440976.6666666667, ans=0.125 2024-09-24 06:30:52,301 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=4.01 vs. limit=10.0 2024-09-24 06:31:12,905 INFO [train.py:1198] (3/4) Epoch 25, batch 1000, loss[loss=0.215, ctc_loss=0.1462, cr_loss=0.344, over 15986.00 frames. ], tot_loss[loss=0.2078, ctc_loss=0.137, cr_loss=0.3539, over 3325105.91 frames. 
], batch size: 74, lr: 4.81e-03, grad_scale: 16.0 2024-09-24 06:31:24,171 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=441023.3333333333, ans=0.1 2024-09-24 06:31:36,534 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.126e+02 1.287e+02 1.404e+02 1.496e+02 1.832e+02, threshold=2.807e+02, percent-clipped=0.0 2024-09-24 06:31:48,316 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=441116.6666666667, ans=0.0 2024-09-24 06:31:52,872 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=441116.6666666667, ans=0.05 2024-09-24 06:32:01,055 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=441163.3333333333, ans=0.1 2024-09-24 06:32:15,635 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=441210.0, ans=0.04949747468305833 2024-09-24 06:32:16,968 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=441210.0, ans=0.125 2024-09-24 06:32:25,035 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=441210.0, ans=0.125 2024-09-24 06:32:25,166 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=441210.0, ans=0.0 2024-09-24 06:32:32,884 INFO [train.py:1198] (3/4) Epoch 25, batch 1050, loss[loss=0.2792, ctc_loss=0.1937, cr_loss=0.4277, over 17217.00 frames. ], tot_loss[loss=0.2091, ctc_loss=0.138, cr_loss=0.3556, over 3334904.34 frames. ], batch size: 55, lr: 4.81e-03, grad_scale: 16.0 2024-09-24 06:32:42,056 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=441256.6666666667, ans=0.0 2024-09-24 06:33:22,370 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=441396.6666666667, ans=0.0 2024-09-24 06:33:35,055 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=441396.6666666667, ans=0.0 2024-09-24 06:33:35,116 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=441396.6666666667, ans=0.125 2024-09-24 06:33:35,214 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=441396.6666666667, ans=0.125 2024-09-24 06:33:58,447 INFO [train.py:1198] (3/4) Epoch 25, batch 1100, loss[loss=0.1836, ctc_loss=0.1201, cr_loss=0.3174, over 17276.00 frames. ], tot_loss[loss=0.2085, ctc_loss=0.1376, cr_loss=0.3544, over 3338497.66 frames. 
], batch size: 42, lr: 4.81e-03, grad_scale: 16.0 2024-09-24 06:34:13,046 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-24 06:34:22,528 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.115e+02 1.251e+02 1.334e+02 1.435e+02 1.725e+02, threshold=2.668e+02, percent-clipped=0.0 2024-09-24 06:34:22,941 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=441536.6666666667, ans=0.125 2024-09-24 06:34:34,035 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=441583.3333333333, ans=0.1 2024-09-24 06:34:43,331 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=441583.3333333333, ans=0.1 2024-09-24 06:34:49,784 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=441630.0, ans=0.0 2024-09-24 06:35:18,624 INFO [train.py:1198] (3/4) Epoch 25, batch 1150, loss[loss=0.2114, ctc_loss=0.1381, cr_loss=0.3663, over 17104.00 frames. ], tot_loss[loss=0.2083, ctc_loss=0.1374, cr_loss=0.3543, over 3348108.31 frames. ], batch size: 49, lr: 4.80e-03, grad_scale: 16.0 2024-09-24 06:35:30,122 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=441723.3333333333, ans=0.0 2024-09-24 06:35:32,304 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=6.14 vs. limit=15.0 2024-09-24 06:35:34,748 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=441770.0, ans=0.125 2024-09-24 06:35:42,754 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=441770.0, ans=0.2 2024-09-24 06:35:59,459 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=441816.6666666667, ans=0.125 2024-09-24 06:36:28,832 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=441910.0, ans=0.125 2024-09-24 06:36:29,356 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=8.28 vs. limit=15.0 2024-09-24 06:36:41,388 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.const_attention_rate, batch_count=441956.6666666667, ans=0.025 2024-09-24 06:36:42,701 INFO [train.py:1198] (3/4) Epoch 25, batch 1200, loss[loss=0.2496, ctc_loss=0.1707, cr_loss=0.3943, over 15036.00 frames. ], tot_loss[loss=0.2077, ctc_loss=0.1369, cr_loss=0.3539, over 3348423.06 frames. ], batch size: 89, lr: 4.80e-03, grad_scale: 32.0 2024-09-24 06:37:06,723 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.122e+02 1.253e+02 1.340e+02 1.429e+02 1.909e+02, threshold=2.679e+02, percent-clipped=0.0 2024-09-24 06:37:31,094 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=442096.6666666667, ans=0.1 2024-09-24 06:38:05,403 INFO [train.py:1198] (3/4) Epoch 25, batch 1250, loss[loss=0.1959, ctc_loss=0.1322, cr_loss=0.3188, over 17228.00 frames. 
], tot_loss[loss=0.2078, ctc_loss=0.137, cr_loss=0.3539, over 3342586.57 frames. ], batch size: 50, lr: 4.80e-03, grad_scale: 32.0 2024-09-24 06:38:49,232 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=442283.3333333333, ans=0.0 2024-09-24 06:38:52,408 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=442283.3333333333, ans=0.2 2024-09-24 06:38:58,790 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=442330.0, ans=0.0 2024-09-24 06:39:07,567 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=4.08 vs. limit=12.0 2024-09-24 06:39:24,988 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=442376.6666666667, ans=0.125 2024-09-24 06:39:27,926 INFO [train.py:1198] (3/4) Epoch 25, batch 1300, loss[loss=0.2563, ctc_loss=0.1777, cr_loss=0.3932, over 11479.00 frames. ], tot_loss[loss=0.207, ctc_loss=0.1365, cr_loss=0.3525, over 3332698.36 frames. ], batch size: 123, lr: 4.80e-03, grad_scale: 32.0 2024-09-24 06:39:37,831 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=442423.3333333333, ans=0.1 2024-09-24 06:39:38,195 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=11.93 vs. limit=22.5 2024-09-24 06:39:44,203 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=442470.0, ans=0.2 2024-09-24 06:39:45,691 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=442470.0, ans=0.1 2024-09-24 06:39:51,985 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=442470.0, ans=0.125 2024-09-24 06:39:53,306 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.059e+02 1.277e+02 1.396e+02 1.533e+02 2.196e+02, threshold=2.791e+02, percent-clipped=0.0 2024-09-24 06:39:55,372 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=442470.0, ans=0.0 2024-09-24 06:39:58,602 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=442516.6666666667, ans=0.125 2024-09-24 06:40:13,088 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.94 vs. limit=10.0 2024-09-24 06:40:22,385 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=442563.3333333333, ans=0.125 2024-09-24 06:40:38,613 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=8.10 vs. limit=15.0 2024-09-24 06:40:44,596 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=442610.0, ans=0.0 2024-09-24 06:40:47,420 INFO [train.py:1198] (3/4) Epoch 25, batch 1350, loss[loss=0.2467, ctc_loss=0.1672, cr_loss=0.3973, over 17232.00 frames. 
], tot_loss[loss=0.2066, ctc_loss=0.1362, cr_loss=0.352, over 3337597.21 frames. ], batch size: 55, lr: 4.80e-03, grad_scale: 16.0 2024-09-24 06:41:10,409 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=442703.3333333333, ans=0.125 2024-09-24 06:41:45,471 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=442796.6666666667, ans=0.05 2024-09-24 06:41:50,430 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=442796.6666666667, ans=0.125 2024-09-24 06:42:02,972 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=442843.3333333333, ans=0.0 2024-09-24 06:42:12,273 INFO [train.py:1198] (3/4) Epoch 25, batch 1400, loss[loss=0.2245, ctc_loss=0.1521, cr_loss=0.3621, over 17306.00 frames. ], tot_loss[loss=0.2059, ctc_loss=0.1355, cr_loss=0.3518, over 3354127.80 frames. ], batch size: 46, lr: 4.80e-03, grad_scale: 16.0 2024-09-24 06:42:30,065 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=442936.6666666667, ans=0.125 2024-09-24 06:42:40,063 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.074e+02 1.265e+02 1.378e+02 1.497e+02 2.360e+02, threshold=2.756e+02, percent-clipped=0.0 2024-09-24 06:42:40,322 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.max_abs, batch_count=442936.6666666667, ans=10.0 2024-09-24 06:42:43,519 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=442936.6666666667, ans=0.125 2024-09-24 06:42:53,210 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=442983.3333333333, ans=0.125 2024-09-24 06:42:59,274 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=442983.3333333333, ans=0.0 2024-09-24 06:43:04,594 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=2.96 vs. limit=6.0 2024-09-24 06:43:06,295 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.16 vs. limit=6.0 2024-09-24 06:43:29,466 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=443076.6666666667, ans=0.125 2024-09-24 06:43:35,390 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=443123.3333333333, ans=0.0 2024-09-24 06:43:36,561 INFO [train.py:1198] (3/4) Epoch 25, batch 1450, loss[loss=0.2127, ctc_loss=0.138, cr_loss=0.3736, over 17025.00 frames. ], tot_loss[loss=0.2075, ctc_loss=0.1366, cr_loss=0.3545, over 3360281.67 frames. 
], batch size: 53, lr: 4.80e-03, grad_scale: 16.0 2024-09-24 06:43:43,111 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=443123.3333333333, ans=0.1 2024-09-24 06:43:46,219 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=443123.3333333333, ans=0.2 2024-09-24 06:44:02,405 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.08 vs. limit=15.0 2024-09-24 06:44:51,377 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.const_attention_rate, batch_count=443310.0, ans=0.025 2024-09-24 06:44:53,099 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=443310.0, ans=0.09899494936611666 2024-09-24 06:44:55,968 INFO [train.py:1198] (3/4) Epoch 25, batch 1500, loss[loss=0.1954, ctc_loss=0.1269, cr_loss=0.3425, over 17291.00 frames. ], tot_loss[loss=0.2063, ctc_loss=0.1358, cr_loss=0.3526, over 3359885.92 frames. ], batch size: 51, lr: 4.80e-03, grad_scale: 16.0 2024-09-24 06:45:18,691 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=443403.3333333333, ans=0.1 2024-09-24 06:45:21,556 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.070e+02 1.238e+02 1.333e+02 1.437e+02 1.693e+02, threshold=2.665e+02, percent-clipped=0.0 2024-09-24 06:45:34,879 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=18.52 vs. limit=22.5 2024-09-24 06:45:44,111 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=443496.6666666667, ans=0.0 2024-09-24 06:45:47,963 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=6.43 vs. limit=15.0 2024-09-24 06:46:06,785 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=443543.3333333333, ans=0.0 2024-09-24 06:46:21,175 INFO [train.py:1198] (3/4) Epoch 25, batch 1550, loss[loss=0.2082, ctc_loss=0.1382, cr_loss=0.3502, over 17166.00 frames. ], tot_loss[loss=0.2062, ctc_loss=0.1357, cr_loss=0.3524, over 3356989.47 frames. 
], batch size: 45, lr: 4.79e-03, grad_scale: 16.0 2024-09-24 06:46:32,623 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=443590.0, ans=0.0 2024-09-24 06:46:42,512 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=443636.6666666667, ans=0.1 2024-09-24 06:47:11,119 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=443730.0, ans=0.125 2024-09-24 06:47:22,420 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.const_attention_rate, batch_count=443730.0, ans=0.025 2024-09-24 06:47:36,129 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=443776.6666666667, ans=0.125 2024-09-24 06:47:39,377 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=443776.6666666667, ans=0.125 2024-09-24 06:47:43,800 INFO [train.py:1198] (3/4) Epoch 25, batch 1600, loss[loss=0.1865, ctc_loss=0.1216, cr_loss=0.3247, over 17148.00 frames. ], tot_loss[loss=0.2064, ctc_loss=0.1359, cr_loss=0.3528, over 3353008.29 frames. ], batch size: 48, lr: 4.79e-03, grad_scale: 32.0 2024-09-24 06:48:01,697 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=443870.0, ans=0.125 2024-09-24 06:48:09,691 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.061e+02 1.242e+02 1.313e+02 1.472e+02 2.185e+02, threshold=2.626e+02, percent-clipped=0.0 2024-09-24 06:48:10,131 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-24 06:49:05,213 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=444056.6666666667, ans=0.125 2024-09-24 06:49:06,446 INFO [train.py:1198] (3/4) Epoch 25, batch 1650, loss[loss=0.1902, ctc_loss=0.1238, cr_loss=0.3322, over 17063.00 frames. ], tot_loss[loss=0.2073, ctc_loss=0.1364, cr_loss=0.3545, over 3360504.27 frames. ], batch size: 46, lr: 4.79e-03, grad_scale: 32.0 2024-09-24 06:49:52,968 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=444196.6666666667, ans=0.95 2024-09-24 06:50:07,271 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=444196.6666666667, ans=0.0 2024-09-24 06:50:26,224 INFO [train.py:1198] (3/4) Epoch 25, batch 1700, loss[loss=0.2029, ctc_loss=0.1361, cr_loss=0.3341, over 16783.00 frames. ], tot_loss[loss=0.2075, ctc_loss=0.1366, cr_loss=0.3542, over 3364733.66 frames. 
], batch size: 61, lr: 4.79e-03, grad_scale: 32.0 2024-09-24 06:50:31,355 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=444290.0, ans=0.0 2024-09-24 06:50:36,274 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=444290.0, ans=0.125 2024-09-24 06:50:42,417 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=444336.6666666667, ans=0.125 2024-09-24 06:50:54,229 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.096e+02 1.301e+02 1.388e+02 1.513e+02 1.890e+02, threshold=2.776e+02, percent-clipped=0.0 2024-09-24 06:51:10,357 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=7.05 vs. limit=15.0 2024-09-24 06:51:14,736 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=444383.3333333333, ans=0.2 2024-09-24 06:51:32,258 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=444430.0, ans=0.0 2024-09-24 06:51:40,310 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.const_attention_rate, batch_count=444476.6666666667, ans=0.025 2024-09-24 06:51:51,035 INFO [train.py:1198] (3/4) Epoch 25, batch 1750, loss[loss=0.2445, ctc_loss=0.1724, cr_loss=0.3607, over 11708.00 frames. ], tot_loss[loss=0.2073, ctc_loss=0.1366, cr_loss=0.3535, over 3355142.69 frames. ], batch size: 123, lr: 4.79e-03, grad_scale: 32.0 2024-09-24 06:52:05,749 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=444570.0, ans=0.0 2024-09-24 06:52:26,262 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=444616.6666666667, ans=0.0 2024-09-24 06:52:35,051 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=444616.6666666667, ans=0.0 2024-09-24 06:52:52,443 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.const_attention_rate, batch_count=444663.3333333333, ans=0.025 2024-09-24 06:53:01,918 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=444710.0, ans=0.0 2024-09-24 06:53:07,171 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=10.16 vs. limit=15.0 2024-09-24 06:53:12,787 INFO [train.py:1198] (3/4) Epoch 25, batch 1800, loss[loss=0.2209, ctc_loss=0.1461, cr_loss=0.3742, over 17291.00 frames. ], tot_loss[loss=0.2072, ctc_loss=0.1366, cr_loss=0.353, over 3346905.03 frames. 
], batch size: 49, lr: 4.79e-03, grad_scale: 32.0 2024-09-24 06:53:16,213 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=444756.6666666667, ans=0.0 2024-09-24 06:53:16,255 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=444756.6666666667, ans=0.1 2024-09-24 06:53:40,997 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.124e+02 1.233e+02 1.324e+02 1.428e+02 1.805e+02, threshold=2.648e+02, percent-clipped=0.0 2024-09-24 06:53:43,038 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=444803.3333333333, ans=0.125 2024-09-24 06:54:00,628 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=444850.0, ans=0.5 2024-09-24 06:54:11,670 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=444896.6666666667, ans=0.0 2024-09-24 06:54:25,088 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=9.72 vs. limit=15.0 2024-09-24 06:54:26,138 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=444943.3333333333, ans=0.0 2024-09-24 06:54:35,387 INFO [train.py:1198] (3/4) Epoch 25, batch 1850, loss[loss=0.2155, ctc_loss=0.1436, cr_loss=0.3595, over 17012.00 frames. ], tot_loss[loss=0.2055, ctc_loss=0.1353, cr_loss=0.3514, over 3354622.17 frames. ], batch size: 53, lr: 4.79e-03, grad_scale: 32.0 2024-09-24 06:54:43,786 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=444990.0, ans=0.1 2024-09-24 06:55:11,065 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=445083.3333333333, ans=0.05 2024-09-24 06:55:11,589 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=4.96 vs. limit=15.0 2024-09-24 06:55:52,444 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=445176.6666666667, ans=0.125 2024-09-24 06:56:01,116 INFO [train.py:1198] (3/4) Epoch 25, batch 1900, loss[loss=0.2072, ctc_loss=0.1356, cr_loss=0.3579, over 17156.00 frames. ], tot_loss[loss=0.2062, ctc_loss=0.1357, cr_loss=0.3522, over 3365645.67 frames. 
], batch size: 45, lr: 4.79e-03, grad_scale: 32.0 2024-09-24 06:56:02,894 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=445223.3333333333, ans=0.0 2024-09-24 06:56:26,824 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.079e+02 1.232e+02 1.312e+02 1.425e+02 2.956e+02, threshold=2.624e+02, percent-clipped=1.0 2024-09-24 06:56:41,342 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=445316.6666666667, ans=0.09899494936611666 2024-09-24 06:56:44,565 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=445316.6666666667, ans=0.2 2024-09-24 06:56:50,958 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=445363.3333333333, ans=0.125 2024-09-24 06:56:52,499 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=445363.3333333333, ans=0.125 2024-09-24 06:56:57,304 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-24 06:56:57,345 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=445363.3333333333, ans=0.2 2024-09-24 06:57:20,855 INFO [train.py:1198] (3/4) Epoch 25, batch 1950, loss[loss=0.1778, ctc_loss=0.1171, cr_loss=0.3032, over 17191.00 frames. ], tot_loss[loss=0.2052, ctc_loss=0.1352, cr_loss=0.3502, over 3370580.36 frames. ], batch size: 41, lr: 4.78e-03, grad_scale: 32.0 2024-09-24 06:57:21,123 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=445456.6666666667, ans=0.0 2024-09-24 06:57:26,048 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=445456.6666666667, ans=0.125 2024-09-24 06:57:49,450 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=445503.3333333333, ans=0.1 2024-09-24 06:57:52,511 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=445503.3333333333, ans=0.125 2024-09-24 06:58:23,005 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=445596.6666666667, ans=0.1 2024-09-24 06:58:29,157 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=8.75 vs. limit=15.0 2024-09-24 06:58:35,221 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=445643.3333333333, ans=0.0 2024-09-24 06:58:46,271 INFO [train.py:1198] (3/4) Epoch 25, batch 2000, loss[loss=0.2344, ctc_loss=0.1627, cr_loss=0.3585, over 11359.00 frames. ], tot_loss[loss=0.2058, ctc_loss=0.1356, cr_loss=0.3513, over 3361640.56 frames. ], batch size: 123, lr: 4.78e-03, grad_scale: 32.0 2024-09-24 06:59:11,911 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.085e+02 1.272e+02 1.363e+02 1.482e+02 1.870e+02, threshold=2.725e+02, percent-clipped=0.0 2024-09-24 06:59:23,791 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=10.51 vs. 
limit=15.0 2024-09-24 06:59:52,207 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=445876.6666666667, ans=0.125 2024-09-24 07:00:06,180 INFO [train.py:1198] (3/4) Epoch 25, batch 2050, loss[loss=0.1847, ctc_loss=0.1173, cr_loss=0.3368, over 17269.00 frames. ], tot_loss[loss=0.2065, ctc_loss=0.1361, cr_loss=0.352, over 3356172.34 frames. ], batch size: 42, lr: 4.78e-03, grad_scale: 32.0 2024-09-24 07:00:16,027 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.min_positive, batch_count=445923.3333333333, ans=0.05 2024-09-24 07:01:00,368 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=9.68 vs. limit=22.5 2024-09-24 07:01:31,630 INFO [train.py:1198] (3/4) Epoch 25, batch 2100, loss[loss=0.2068, ctc_loss=0.1369, cr_loss=0.3491, over 17304.00 frames. ], tot_loss[loss=0.2067, ctc_loss=0.1362, cr_loss=0.3527, over 3358988.38 frames. ], batch size: 46, lr: 4.78e-03, grad_scale: 32.0 2024-09-24 07:01:56,975 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.073e+02 1.289e+02 1.372e+02 1.523e+02 2.426e+02, threshold=2.744e+02, percent-clipped=0.0 2024-09-24 07:01:58,832 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=446203.3333333333, ans=0.0 2024-09-24 07:02:08,272 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=446250.0, ans=0.1 2024-09-24 07:02:08,279 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=446250.0, ans=0.0 2024-09-24 07:02:16,543 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=446250.0, ans=0.0 2024-09-24 07:02:21,217 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=446296.6666666667, ans=0.0 2024-09-24 07:02:30,466 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=446296.6666666667, ans=0.2 2024-09-24 07:02:54,495 INFO [train.py:1198] (3/4) Epoch 25, batch 2150, loss[loss=0.2319, ctc_loss=0.1555, cr_loss=0.3822, over 17359.00 frames. ], tot_loss[loss=0.2069, ctc_loss=0.1362, cr_loss=0.3531, over 3362667.17 frames. ], batch size: 48, lr: 4.78e-03, grad_scale: 32.0 2024-09-24 07:02:54,956 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=446390.0, ans=0.125 2024-09-24 07:03:05,034 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=16.47 vs. limit=22.5 2024-09-24 07:03:30,061 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=4.55 vs. limit=12.0 2024-09-24 07:03:30,214 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=6.64 vs. 
limit=15.0 2024-09-24 07:03:32,775 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=446483.3333333333, ans=0.2 2024-09-24 07:03:40,624 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=446483.3333333333, ans=0.0 2024-09-24 07:03:51,530 INFO [scaling.py:1024] (3/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.90 vs. limit=5.0 2024-09-24 07:03:53,872 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=446530.0, ans=0.1 2024-09-24 07:03:55,392 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=446530.0, ans=0.0 2024-09-24 07:04:14,553 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=446576.6666666667, ans=0.125 2024-09-24 07:04:17,543 INFO [train.py:1198] (3/4) Epoch 25, batch 2200, loss[loss=0.2162, ctc_loss=0.1414, cr_loss=0.374, over 17234.00 frames. ], tot_loss[loss=0.2064, ctc_loss=0.1359, cr_loss=0.3526, over 3365989.87 frames. ], batch size: 50, lr: 4.78e-03, grad_scale: 16.0 2024-09-24 07:04:25,604 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=446623.3333333333, ans=0.1 2024-09-24 07:04:44,642 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.080e+02 1.273e+02 1.353e+02 1.559e+02 2.204e+02, threshold=2.706e+02, percent-clipped=0.0 2024-09-24 07:05:02,885 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=446716.6666666667, ans=0.2 2024-09-24 07:05:25,384 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-24 07:05:33,430 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=446810.0, ans=0.125 2024-09-24 07:05:37,840 INFO [train.py:1198] (3/4) Epoch 25, batch 2250, loss[loss=0.1901, ctc_loss=0.1276, cr_loss=0.3125, over 17169.00 frames. ], tot_loss[loss=0.2068, ctc_loss=0.1362, cr_loss=0.3525, over 3358000.25 frames. ], batch size: 45, lr: 4.78e-03, grad_scale: 16.0 2024-09-24 07:05:44,963 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.94 vs. limit=6.0 2024-09-24 07:06:53,727 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=447043.3333333333, ans=0.0 2024-09-24 07:07:03,316 INFO [train.py:1198] (3/4) Epoch 25, batch 2300, loss[loss=0.1792, ctc_loss=0.1191, cr_loss=0.3003, over 17294.00 frames. ], tot_loss[loss=0.2066, ctc_loss=0.1362, cr_loss=0.3522, over 3353804.21 frames. ], batch size: 46, lr: 4.78e-03, grad_scale: 16.0 2024-09-24 07:07:06,995 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=447090.0, ans=0.0 2024-09-24 07:07:11,084 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=12.80 vs. 
limit=15.0 2024-09-24 07:07:30,512 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.028e+02 1.259e+02 1.322e+02 1.446e+02 1.963e+02, threshold=2.643e+02, percent-clipped=0.0 2024-09-24 07:07:36,658 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=447183.3333333333, ans=0.2 2024-09-24 07:07:49,574 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=13.11 vs. limit=22.5 2024-09-24 07:07:50,894 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=447183.3333333333, ans=0.2 2024-09-24 07:07:59,429 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=10.96 vs. limit=15.0 2024-09-24 07:08:02,208 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=447230.0, ans=0.2 2024-09-24 07:08:10,345 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.const_attention_rate, batch_count=447276.6666666667, ans=0.025 2024-09-24 07:08:28,420 INFO [train.py:1198] (3/4) Epoch 25, batch 2350, loss[loss=0.2057, ctc_loss=0.1355, cr_loss=0.3507, over 17344.00 frames. ], tot_loss[loss=0.2082, ctc_loss=0.1374, cr_loss=0.3542, over 3344835.47 frames. ], batch size: 48, lr: 4.77e-03, grad_scale: 16.0 2024-09-24 07:08:33,574 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=447323.3333333333, ans=0.2 2024-09-24 07:08:40,092 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys.whitening_limit, batch_count=447323.3333333333, ans=6.0 2024-09-24 07:08:42,975 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=447370.0, ans=0.1 2024-09-24 07:09:19,348 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.const_attention_rate, batch_count=447463.3333333333, ans=0.025 2024-09-24 07:09:21,167 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=447463.3333333333, ans=0.0 2024-09-24 07:09:31,490 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=8.26 vs. limit=22.5 2024-09-24 07:09:39,116 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=4.03 vs. limit=15.0 2024-09-24 07:09:41,725 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=447510.0, ans=0.07 2024-09-24 07:09:47,648 INFO [train.py:1198] (3/4) Epoch 25, batch 2400, loss[loss=0.1856, ctc_loss=0.1195, cr_loss=0.3306, over 17290.00 frames. ], tot_loss[loss=0.2081, ctc_loss=0.1372, cr_loss=0.3543, over 3344812.59 frames. 
], batch size: 42, lr: 4.77e-03, grad_scale: 32.0 2024-09-24 07:10:10,111 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.min_positive, batch_count=447603.3333333333, ans=0.025 2024-09-24 07:10:10,151 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=447603.3333333333, ans=0.07 2024-09-24 07:10:14,558 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.091e+02 1.299e+02 1.395e+02 1.509e+02 2.801e+02, threshold=2.791e+02, percent-clipped=1.0 2024-09-24 07:10:16,482 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=447603.3333333333, ans=0.125 2024-09-24 07:10:29,286 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=447650.0, ans=0.0 2024-09-24 07:10:53,694 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=447696.6666666667, ans=0.125 2024-09-24 07:10:55,302 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=447743.3333333333, ans=0.2 2024-09-24 07:11:04,826 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=447743.3333333333, ans=0.125 2024-09-24 07:11:06,484 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=447743.3333333333, ans=0.125 2024-09-24 07:11:07,970 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=447743.3333333333, ans=0.0 2024-09-24 07:11:10,137 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=10.68 vs. limit=22.5 2024-09-24 07:11:12,578 INFO [train.py:1198] (3/4) Epoch 25, batch 2450, loss[loss=0.2028, ctc_loss=0.1327, cr_loss=0.3502, over 16881.00 frames. ], tot_loss[loss=0.2085, ctc_loss=0.1375, cr_loss=0.355, over 3349132.30 frames. ], batch size: 58, lr: 4.77e-03, grad_scale: 32.0 2024-09-24 07:11:14,663 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=447790.0, ans=0.0 2024-09-24 07:11:59,543 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=447930.0, ans=0.025 2024-09-24 07:12:13,776 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=447930.0, ans=0.1 2024-09-24 07:12:24,302 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=447976.6666666667, ans=0.125 2024-09-24 07:12:25,069 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.whiten.whitening_limit, batch_count=447976.6666666667, ans=15.0 2024-09-24 07:12:37,503 INFO [train.py:1198] (3/4) Epoch 25, batch 2500, loss[loss=0.2104, ctc_loss=0.138, cr_loss=0.3621, over 16988.00 frames. ], tot_loss[loss=0.2085, ctc_loss=0.1376, cr_loss=0.3546, over 3340416.83 frames. ], batch size: 53, lr: 4.77e-03, grad_scale: 32.0 2024-09-24 07:12:43,052 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=10.55 vs. 
limit=15.0 2024-09-24 07:12:57,050 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=448070.0, ans=0.125 2024-09-24 07:12:57,181 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.const_attention_rate, batch_count=448070.0, ans=0.025 2024-09-24 07:13:00,174 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=448070.0, ans=0.04949747468305833 2024-09-24 07:13:04,744 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.089e+02 1.243e+02 1.319e+02 1.410e+02 2.410e+02, threshold=2.638e+02, percent-clipped=0.0 2024-09-24 07:13:10,624 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=4.79 vs. limit=15.0 2024-09-24 07:13:43,899 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=448210.0, ans=0.125 2024-09-24 07:13:59,334 INFO [train.py:1198] (3/4) Epoch 25, batch 2550, loss[loss=0.1824, ctc_loss=0.1161, cr_loss=0.3318, over 17046.00 frames. ], tot_loss[loss=0.2069, ctc_loss=0.1363, cr_loss=0.353, over 3346142.65 frames. ], batch size: 39, lr: 4.77e-03, grad_scale: 32.0 2024-09-24 07:14:31,670 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=448350.0, ans=0.125 2024-09-24 07:15:11,657 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=448443.3333333333, ans=0.0 2024-09-24 07:15:11,693 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=448443.3333333333, ans=0.125 2024-09-24 07:15:19,296 INFO [train.py:1198] (3/4) Epoch 25, batch 2600, loss[loss=0.2353, ctc_loss=0.157, cr_loss=0.3918, over 16121.00 frames. ], tot_loss[loss=0.207, ctc_loss=0.1365, cr_loss=0.3524, over 3345305.04 frames. ], batch size: 74, lr: 4.77e-03, grad_scale: 32.0 2024-09-24 07:15:30,757 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=448490.0, ans=0.2 2024-09-24 07:15:51,575 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.064e+02 1.258e+02 1.378e+02 1.492e+02 2.205e+02, threshold=2.755e+02, percent-clipped=0.0 2024-09-24 07:16:44,544 INFO [train.py:1198] (3/4) Epoch 25, batch 2650, loss[loss=0.1992, ctc_loss=0.132, cr_loss=0.3362, over 17097.00 frames. ], tot_loss[loss=0.2077, ctc_loss=0.1371, cr_loss=0.3529, over 3344649.96 frames. ], batch size: 49, lr: 4.77e-03, grad_scale: 32.0 2024-09-24 07:16:45,478 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.36 vs. limit=15.0 2024-09-24 07:17:04,885 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.80 vs. 
limit=15.0 2024-09-24 07:17:17,007 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=448816.6666666667, ans=0.95 2024-09-24 07:17:17,151 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=448816.6666666667, ans=0.0 2024-09-24 07:17:24,528 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=448816.6666666667, ans=0.125 2024-09-24 07:17:49,855 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=448910.0, ans=0.09899494936611666 2024-09-24 07:18:10,105 INFO [train.py:1198] (3/4) Epoch 25, batch 2700, loss[loss=0.2263, ctc_loss=0.1492, cr_loss=0.3853, over 16999.00 frames. ], tot_loss[loss=0.2071, ctc_loss=0.1366, cr_loss=0.3527, over 3353751.02 frames. ], batch size: 52, lr: 4.77e-03, grad_scale: 32.0 2024-09-24 07:18:26,469 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.const_attention_rate, batch_count=449003.3333333333, ans=0.025 2024-09-24 07:18:29,814 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=449003.3333333333, ans=0.125 2024-09-24 07:18:32,853 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=449003.3333333333, ans=0.125 2024-09-24 07:18:37,164 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.035e+02 1.249e+02 1.338e+02 1.409e+02 1.725e+02, threshold=2.676e+02, percent-clipped=0.0 2024-09-24 07:18:45,725 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=449050.0, ans=0.1 2024-09-24 07:18:58,292 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=449096.6666666667, ans=0.125 2024-09-24 07:19:29,779 INFO [train.py:1198] (3/4) Epoch 25, batch 2750, loss[loss=0.1879, ctc_loss=0.1275, cr_loss=0.3018, over 17070.00 frames. ], tot_loss[loss=0.2069, ctc_loss=0.1365, cr_loss=0.352, over 3337930.28 frames. ], batch size: 46, lr: 4.76e-03, grad_scale: 16.0 2024-09-24 07:19:36,398 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=449190.0, ans=0.0 2024-09-24 07:20:45,788 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=449376.6666666667, ans=0.0 2024-09-24 07:20:48,288 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=11.97 vs. limit=22.5 2024-09-24 07:20:55,050 INFO [train.py:1198] (3/4) Epoch 25, batch 2800, loss[loss=0.229, ctc_loss=0.1546, cr_loss=0.372, over 16568.00 frames. ], tot_loss[loss=0.2071, ctc_loss=0.1366, cr_loss=0.3524, over 3339613.75 frames. 
], batch size: 66, lr: 4.76e-03, grad_scale: 32.0 2024-09-24 07:20:55,351 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=449423.3333333333, ans=0.1 2024-09-24 07:21:09,862 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-24 07:21:23,916 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.107e+02 1.252e+02 1.365e+02 1.477e+02 3.816e+02, threshold=2.730e+02, percent-clipped=1.0 2024-09-24 07:21:37,250 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=449516.6666666667, ans=0.0 2024-09-24 07:21:56,506 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=449563.3333333333, ans=0.0 2024-09-24 07:22:07,685 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=449610.0, ans=0.1 2024-09-24 07:22:14,520 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten.whitening_limit, batch_count=449656.6666666667, ans=15.0 2024-09-24 07:22:17,828 INFO [train.py:1198] (3/4) Epoch 25, batch 2850, loss[loss=0.1825, ctc_loss=0.1192, cr_loss=0.3166, over 16290.00 frames. ], tot_loss[loss=0.2068, ctc_loss=0.1364, cr_loss=0.3521, over 3339176.18 frames. ], batch size: 36, lr: 4.76e-03, grad_scale: 32.0 2024-09-24 07:22:26,071 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=449656.6666666667, ans=0.025 2024-09-24 07:22:37,420 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=449703.3333333333, ans=0.1 2024-09-24 07:23:10,429 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=449796.6666666667, ans=0.0 2024-09-24 07:23:24,835 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=449843.3333333333, ans=0.125 2024-09-24 07:23:24,853 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=449843.3333333333, ans=0.125 2024-09-24 07:23:40,050 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=10.57 vs. limit=15.0 2024-09-24 07:23:40,782 INFO [train.py:1198] (3/4) Epoch 25, batch 2900, loss[loss=0.2145, ctc_loss=0.1423, cr_loss=0.361, over 17279.00 frames. ], tot_loss[loss=0.208, ctc_loss=0.1372, cr_loss=0.3544, over 3342445.01 frames. ], batch size: 46, lr: 4.76e-03, grad_scale: 32.0 2024-09-24 07:23:45,853 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=449890.0, ans=0.125 2024-09-24 07:23:47,398 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=449890.0, ans=0.0 2024-09-24 07:24:01,085 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.76 vs. 
limit=6.0 2024-09-24 07:24:09,619 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.121e+02 1.270e+02 1.360e+02 1.492e+02 4.410e+02, threshold=2.720e+02, percent-clipped=1.0 2024-09-24 07:24:10,036 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.const_attention_rate, batch_count=449936.6666666667, ans=0.025 2024-09-24 07:24:19,577 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=449983.3333333333, ans=10.0 2024-09-24 07:24:46,355 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=450076.6666666667, ans=0.1 2024-09-24 07:24:49,762 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=450076.6666666667, ans=0.0 2024-09-24 07:25:00,615 INFO [train.py:1198] (3/4) Epoch 25, batch 2950, loss[loss=0.2329, ctc_loss=0.1532, cr_loss=0.3985, over 17063.00 frames. ], tot_loss[loss=0.2077, ctc_loss=0.137, cr_loss=0.3536, over 3338397.05 frames. ], batch size: 52, lr: 4.76e-03, grad_scale: 32.0 2024-09-24 07:25:15,638 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=11.70 vs. limit=22.5 2024-09-24 07:25:40,442 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=16.46 vs. limit=22.5 2024-09-24 07:25:42,953 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=450216.6666666667, ans=0.0 2024-09-24 07:26:08,500 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=450310.0, ans=0.125 2024-09-24 07:26:16,314 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=450310.0, ans=0.0 2024-09-24 07:26:25,454 INFO [train.py:1198] (3/4) Epoch 25, batch 3000, loss[loss=0.1697, ctc_loss=0.1136, cr_loss=0.2803, over 16333.00 frames. ], tot_loss[loss=0.2081, ctc_loss=0.1373, cr_loss=0.3539, over 3336977.66 frames. ], batch size: 36, lr: 4.76e-03, grad_scale: 32.0 2024-09-24 07:26:25,454 INFO [train.py:1221] (3/4) Computing validation loss 2024-09-24 07:26:41,268 INFO [train.py:1230] (3/4) Epoch 25, validation: loss=0.03749, ctc_loss=0.03749, cr_loss=8.201e-15, over 944034.00 frames. 2024-09-24 07:26:41,268 INFO [train.py:1231] (3/4) Maximum memory allocated so far is 21238MB 2024-09-24 07:27:09,645 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.102e+02 1.216e+02 1.332e+02 1.454e+02 2.331e+02, threshold=2.665e+02, percent-clipped=0.0 2024-09-24 07:27:26,912 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=450496.6666666667, ans=0.1 2024-09-24 07:27:31,804 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-09-24 07:27:45,884 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=450543.3333333333, ans=0.0 2024-09-24 07:27:58,906 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=4.46 vs. 
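limit=15.0

Note: in each "WARNING [optim.py:487] Clipping_scale=2.0, grad-norm quartiles ..." record, the five numbers read naturally as the min/25%/median/75%/max of recently observed gradient norms, and the printed threshold consistently equals 2.0 times the median (e.g. threshold=2.720e+02 = 2.0 x 1.360e+02 in the record above), i.e. threshold = clipping_scale * median; percent-clipped is the share of recent batches whose norm exceeded it. A sketch of that rule under those assumptions; function and variable names are illustrative, not icefall's actual API:

    import torch

    def clip_by_median_norm(params, recent_norms, clipping_scale=2.0):
        """Scale gradients down when their total norm exceeds
        clipping_scale x median of recently observed gradient norms."""
        norms = torch.tensor(recent_norms)
        # min / 25% / median / 75% / max, as printed in the log
        quartiles = torch.quantile(norms, torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0]))
        threshold = clipping_scale * quartiles[2]  # 2.0 x median
        total_sq = torch.zeros(())
        for p in params:
            if p.grad is not None:
                total_sq = total_sq + (p.grad.detach() ** 2).sum()
        total_norm = total_sq.sqrt()
        clipped = bool(total_norm > threshold)
        if clipped:
            for p in params:
                if p.grad is not None:
                    p.grad.mul_(threshold / total_norm)
        return quartiles, threshold, clipped
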
2024-09-24 07:28:02,384 INFO [train.py:1198] (3/4) Epoch 25, batch 3050, loss[loss=0.23, ctc_loss=0.1523, cr_loss=0.3887, over 17015.00 frames. ], tot_loss[loss=0.207, ctc_loss=0.1364, cr_loss=0.3531, over 3343920.91 frames. ], batch size: 56, lr: 4.76e-03, grad_scale: 32.0 2024-09-24 07:28:22,142 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.72 vs. limit=6.0 2024-09-24 07:28:47,818 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=450730.0, ans=0.125 2024-09-24 07:29:21,017 INFO [train.py:1198] (3/4) Epoch 25, batch 3100, loss[loss=0.243, ctc_loss=0.1686, cr_loss=0.3721, over 15156.00 frames. ], tot_loss[loss=0.2071, ctc_loss=0.1364, cr_loss=0.3535, over 3344292.42 frames. ], batch size: 89, lr: 4.76e-03, grad_scale: 32.0 2024-09-24 07:29:38,134 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=10.79 vs. limit=15.0 2024-09-24 07:29:42,196 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=450870.0, ans=0.1 2024-09-24 07:29:51,454 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.084e+02 1.263e+02 1.349e+02 1.464e+02 5.566e+02, threshold=2.698e+02, percent-clipped=1.0 2024-09-24 07:29:57,941 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=450916.6666666667, ans=0.1 2024-09-24 07:30:21,219 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=450963.3333333333, ans=0.125 2024-09-24 07:30:41,211 INFO [train.py:1198] (3/4) Epoch 25, batch 3150, loss[loss=0.1883, ctc_loss=0.12, cr_loss=0.3415, over 17294.00 frames. ], tot_loss[loss=0.2067, ctc_loss=0.136, cr_loss=0.3532, over 3351703.67 frames. ], batch size: 46, lr: 4.75e-03, grad_scale: 32.0 2024-09-24 07:30:50,881 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=451056.6666666667, ans=0.1 2024-09-24 07:31:05,338 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.33 vs. limit=15.0 2024-09-24 07:31:28,168 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=451196.6666666667, ans=0.05 2024-09-24 07:31:36,429 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.42 vs. limit=6.0 2024-09-24 07:31:49,746 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=451243.3333333333, ans=0.125 2024-09-24 07:31:58,816 INFO [train.py:1198] (3/4) Epoch 25, batch 3200, loss[loss=0.2265, ctc_loss=0.1517, cr_loss=0.374, over 15263.00 frames. ], tot_loss[loss=0.2054, ctc_loss=0.1351, cr_loss=0.3513, over 3356914.93 frames.
], batch size: 89, lr: 4.75e-03, grad_scale: 32.0 2024-09-24 07:32:00,821 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=451290.0, ans=0.07 2024-09-24 07:32:03,772 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=451290.0, ans=0.0 2024-09-24 07:32:08,557 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=451290.0, ans=0.125 2024-09-24 07:32:19,542 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=451336.6666666667, ans=0.125 2024-09-24 07:32:26,943 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.103e+02 1.256e+02 1.371e+02 1.503e+02 1.985e+02, threshold=2.741e+02, percent-clipped=0.0 2024-09-24 07:33:12,632 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=10.74 vs. limit=15.0 2024-09-24 07:33:16,640 INFO [train.py:1198] (3/4) Epoch 25, batch 3250, loss[loss=0.2431, ctc_loss=0.1645, cr_loss=0.3932, over 17050.00 frames. ], tot_loss[loss=0.2057, ctc_loss=0.1353, cr_loss=0.3516, over 3357430.83 frames. ], batch size: 52, lr: 4.75e-03, grad_scale: 32.0 2024-09-24 07:33:23,104 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=451523.3333333333, ans=0.09899494936611666 2024-09-24 07:33:32,411 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=451570.0, ans=0.1 2024-09-24 07:33:41,841 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=451570.0, ans=0.0 2024-09-24 07:33:51,065 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=451616.6666666667, ans=0.0 2024-09-24 07:34:17,456 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=451710.0, ans=0.2 2024-09-24 07:34:34,605 INFO [train.py:1198] (3/4) Epoch 25, batch 3300, loss[loss=0.2173, ctc_loss=0.1435, cr_loss=0.3693, over 16921.00 frames. ], tot_loss[loss=0.2067, ctc_loss=0.1362, cr_loss=0.3527, over 3350754.91 frames. 
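], batch size: 58, lr: 4.75e-03, grad_scale: 32.0

Note: across the tot_loss[...] records, the reported total is consistent with loss = ctc_loss + 0.2 * cr_loss (for the entry just above: 0.1362 + 0.2 * 0.3527 = 0.2067 to display precision), i.e. the consistency-regularization (CR) term appears to enter this run's CR-CTC objective with a fixed scale of 0.2. A small check under that assumption; the helper name is illustrative:

    def combined_loss(ctc_loss: float, cr_loss: float, cr_loss_scale: float = 0.2) -> float:
        """Total loss as it appears to be logged: ctc_loss + cr_loss_scale * cr_loss."""
        return ctc_loss + cr_loss_scale * cr_loss

    # matches tot_loss at epoch 25, batch 3300 above to display precision
    assert abs(combined_loss(0.1362, 0.3527) - 0.2067) < 5e-4
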
2024-09-24 07:34:35,031 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=451756.6666666667, ans=0.125 2024-09-24 07:35:04,446 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.114e+02 1.271e+02 1.364e+02 1.475e+02 2.205e+02, threshold=2.729e+02, percent-clipped=0.0 2024-09-24 07:35:31,577 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=451896.6666666667, ans=0.125 2024-09-24 07:35:36,325 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=451896.6666666667, ans=0.2 2024-09-24 07:35:42,615 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=451943.3333333333, ans=0.1 2024-09-24 07:35:45,606 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=451943.3333333333, ans=0.1 2024-09-24 07:35:56,274 INFO [train.py:1198] (3/4) Epoch 25, batch 3350, loss[loss=0.212, ctc_loss=0.1403, cr_loss=0.3587, over 17364.00 frames. ], tot_loss[loss=0.207, ctc_loss=0.1364, cr_loss=0.3528, over 3336416.41 frames. ], batch size: 48, lr: 4.75e-03, grad_scale: 32.0 2024-09-24 07:36:16,994 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.const_attention_rate, batch_count=452036.6666666667, ans=0.025 2024-09-24 07:36:32,914 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=12.68 vs. limit=22.5 2024-09-24 07:36:35,683 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=452083.3333333333, ans=0.2 2024-09-24 07:36:56,557 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=452130.0, ans=0.1 2024-09-24 07:37:11,613 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=6.82 vs. limit=15.0 2024-09-24 07:37:15,210 INFO [train.py:1198] (3/4) Epoch 25, batch 3400, loss[loss=0.1956, ctc_loss=0.1256, cr_loss=0.3502, over 17074.00 frames. ], tot_loss[loss=0.2075, ctc_loss=0.1368, cr_loss=0.3536, over 3343593.91 frames. ], batch size: 46, lr: 4.75e-03, grad_scale: 32.0 2024-09-24 07:37:28,661 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=13.89 vs.
limit=22.5 2024-09-24 07:37:43,412 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.052e+02 1.278e+02 1.359e+02 1.525e+02 3.338e+02, threshold=2.719e+02, percent-clipped=1.0 2024-09-24 07:37:51,186 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.const_attention_rate, batch_count=452316.6666666667, ans=0.025 2024-09-24 07:37:59,039 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=452316.6666666667, ans=0.1 2024-09-24 07:37:59,099 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=452316.6666666667, ans=0.125 2024-09-24 07:38:08,430 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=452363.3333333333, ans=0.125 2024-09-24 07:38:26,081 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.const_attention_rate, batch_count=452410.0, ans=0.025 2024-09-24 07:38:35,006 INFO [train.py:1198] (3/4) Epoch 25, batch 3450, loss[loss=0.2052, ctc_loss=0.1368, cr_loss=0.342, over 17295.00 frames. ], tot_loss[loss=0.2071, ctc_loss=0.1364, cr_loss=0.3535, over 3346227.99 frames. ], batch size: 49, lr: 4.75e-03, grad_scale: 32.0 2024-09-24 07:38:50,912 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=452503.3333333333, ans=0.2 2024-09-24 07:39:12,949 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer_ff3.min_abs, batch_count=452550.0, ans=0.2 2024-09-24 07:39:55,460 INFO [train.py:1198] (3/4) Epoch 25, batch 3500, loss[loss=0.2536, ctc_loss=0.1801, cr_loss=0.3674, over 11452.00 frames. ], tot_loss[loss=0.207, ctc_loss=0.1362, cr_loss=0.3539, over 3348007.79 frames. ], batch size: 125, lr: 4.75e-03, grad_scale: 32.0 2024-09-24 07:40:10,087 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=452736.6666666667, ans=0.125 2024-09-24 07:40:19,269 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=452736.6666666667, ans=0.1 2024-09-24 07:40:23,621 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.059e+02 1.253e+02 1.349e+02 1.455e+02 2.168e+02, threshold=2.697e+02, percent-clipped=1.0 2024-09-24 07:40:55,750 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.78 vs. limit=15.0 2024-09-24 07:41:13,559 INFO [train.py:1198] (3/4) Epoch 25, batch 3550, loss[loss=0.1884, ctc_loss=0.1241, cr_loss=0.3216, over 17298.00 frames. ], tot_loss[loss=0.2064, ctc_loss=0.1358, cr_loss=0.3532, over 3351056.03 frames. ], batch size: 51, lr: 4.74e-03, grad_scale: 16.0 2024-09-24 07:41:26,499 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=452923.3333333333, ans=0.0 2024-09-24 07:41:55,821 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=13.94 vs. 
limit=22.5 2024-09-24 07:41:58,676 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=453063.3333333333, ans=0.125 2024-09-24 07:42:18,905 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=453110.0, ans=0.1 2024-09-24 07:42:26,711 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=453110.0, ans=0.125 2024-09-24 07:42:31,019 INFO [train.py:1198] (3/4) Epoch 25, batch 3600, loss[loss=0.1844, ctc_loss=0.1178, cr_loss=0.3332, over 17227.00 frames. ], tot_loss[loss=0.2064, ctc_loss=0.1359, cr_loss=0.3528, over 3345937.78 frames. ], batch size: 50, lr: 4.74e-03, grad_scale: 32.0 2024-09-24 07:42:53,161 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=453203.3333333333, ans=0.2 2024-09-24 07:42:56,344 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=453203.3333333333, ans=0.05 2024-09-24 07:43:00,642 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.097e+02 1.246e+02 1.327e+02 1.467e+02 2.954e+02, threshold=2.655e+02, percent-clipped=1.0 2024-09-24 07:43:23,361 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=12.62 vs. limit=22.5 2024-09-24 07:43:25,828 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=453296.6666666667, ans=0.2 2024-09-24 07:43:31,983 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=453343.3333333333, ans=0.5 2024-09-24 07:43:48,680 INFO [train.py:1198] (3/4) Epoch 25, batch 3650, loss[loss=0.1796, ctc_loss=0.1155, cr_loss=0.3205, over 17265.00 frames. ], tot_loss[loss=0.2062, ctc_loss=0.1356, cr_loss=0.3531, over 3357326.09 frames. 
], batch size: 42, lr: 4.74e-03, grad_scale: 32.0 2024-09-24 07:43:50,476 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=453390.0, ans=0.0 2024-09-24 07:43:52,219 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=453390.0, ans=0.125 2024-09-24 07:44:07,714 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=453436.6666666667, ans=0.125 2024-09-24 07:44:23,738 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=453483.3333333333, ans=0.025 2024-09-24 07:44:34,955 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=453483.3333333333, ans=0.125 2024-09-24 07:44:35,100 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=453483.3333333333, ans=0.0 2024-09-24 07:44:41,370 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=453530.0, ans=0.04949747468305833 2024-09-24 07:44:59,645 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=453576.6666666667, ans=0.2 2024-09-24 07:45:01,181 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=453576.6666666667, ans=0.125 2024-09-24 07:45:11,883 INFO [train.py:1198] (3/4) Epoch 25, batch 3700, loss[loss=0.1947, ctc_loss=0.1281, cr_loss=0.3331, over 17002.00 frames. ], tot_loss[loss=0.2058, ctc_loss=0.1352, cr_loss=0.3527, over 3360804.48 frames. ], batch size: 44, lr: 4.74e-03, grad_scale: 32.0 2024-09-24 07:45:24,429 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=453623.3333333333, ans=0.125 2024-09-24 07:45:41,671 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.073e+02 1.237e+02 1.315e+02 1.374e+02 1.764e+02, threshold=2.629e+02, percent-clipped=0.0 2024-09-24 07:45:53,076 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=453716.6666666667, ans=0.1 2024-09-24 07:45:53,480 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=14.73 vs. limit=15.0 2024-09-24 07:46:30,216 INFO [train.py:1198] (3/4) Epoch 25, batch 3750, loss[loss=0.1836, ctc_loss=0.1178, cr_loss=0.329, over 16967.00 frames. ], tot_loss[loss=0.206, ctc_loss=0.1354, cr_loss=0.3532, over 3363545.23 frames. 
], batch size: 42, lr: 4.74e-03, grad_scale: 32.0 2024-09-24 07:47:01,589 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=453950.0, ans=0.0 2024-09-24 07:47:12,615 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=453950.0, ans=0.125 2024-09-24 07:47:29,723 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=453996.6666666667, ans=0.125 2024-09-24 07:47:30,310 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.66 vs. limit=6.0 2024-09-24 07:47:39,063 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=12.84 vs. limit=15.0 2024-09-24 07:47:49,009 INFO [train.py:1198] (3/4) Epoch 25, batch 3800, loss[loss=0.2019, ctc_loss=0.1297, cr_loss=0.3612, over 17034.00 frames. ], tot_loss[loss=0.2074, ctc_loss=0.1366, cr_loss=0.3544, over 3336670.73 frames. ], batch size: 44, lr: 4.74e-03, grad_scale: 32.0 2024-09-24 07:48:00,120 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=454090.0, ans=0.125 2024-09-24 07:48:03,442 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=454136.6666666667, ans=0.2 2024-09-24 07:48:05,112 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=454136.6666666667, ans=0.125 2024-09-24 07:48:18,771 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.102e+02 1.251e+02 1.351e+02 1.487e+02 2.041e+02, threshold=2.702e+02, percent-clipped=0.0 2024-09-24 07:48:24,257 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.88 vs. limit=15.0 2024-09-24 07:48:31,816 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=454183.3333333333, ans=0.0 2024-09-24 07:48:58,681 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=454276.6666666667, ans=0.125 2024-09-24 07:49:00,299 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=454276.6666666667, ans=0.1 2024-09-24 07:49:07,751 INFO [train.py:1198] (3/4) Epoch 25, batch 3850, loss[loss=0.2664, ctc_loss=0.1872, cr_loss=0.3962, over 15122.00 frames. ], tot_loss[loss=0.2095, ctc_loss=0.1383, cr_loss=0.3559, over 3296503.59 frames. ], batch size: 89, lr: 4.74e-03, grad_scale: 16.0 2024-09-24 07:49:51,406 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.95 vs. 
limit=15.0 2024-09-24 07:49:53,920 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=454463.3333333333, ans=0.0 2024-09-24 07:50:10,505 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=454510.0, ans=0.125 2024-09-24 07:50:15,640 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=5.68 vs. limit=15.0 2024-09-24 07:51:09,081 INFO [train.py:1198] (3/4) Epoch 26, batch 0, loss[loss=0.1691, ctc_loss=0.1112, cr_loss=0.2894, over 17182.00 frames. ], tot_loss[loss=0.1691, ctc_loss=0.1112, cr_loss=0.2894, over 17182.00 frames. ], batch size: 41, lr: 4.64e-03, grad_scale: 32.0 2024-09-24 07:51:09,082 INFO [train.py:1221] (3/4) Computing validation loss 2024-09-24 07:51:25,119 INFO [train.py:1230] (3/4) Epoch 26, validation: loss=0.03743, ctc_loss=0.03743, cr_loss=8.662e-15, over 944034.00 frames. 2024-09-24 07:51:25,120 INFO [train.py:1231] (3/4) Maximum memory allocated so far is 21238MB 2024-09-24 07:51:29,144 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=4.31 vs. limit=12.0 2024-09-24 07:51:29,361 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=8.60 vs. limit=22.5 2024-09-24 07:51:37,055 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.20 vs. limit=6.0 2024-09-24 07:51:39,679 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=454584.6666666667, ans=0.05 2024-09-24 07:51:42,757 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=454584.6666666667, ans=0.0 2024-09-24 07:52:01,477 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=454631.3333333333, ans=0.125 2024-09-24 07:52:06,031 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.051e+02 1.313e+02 1.484e+02 1.654e+02 2.315e+02, threshold=2.969e+02, percent-clipped=0.0 2024-09-24 07:52:37,299 INFO [scaling.py:1024] (3/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.73 vs. limit=5.0 2024-09-24 07:52:47,477 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.min_positive, batch_count=454724.6666666667, ans=0.025 2024-09-24 07:52:49,170 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=454771.3333333333, ans=0.2 2024-09-24 07:52:50,423 INFO [train.py:1198] (3/4) Epoch 26, batch 50, loss[loss=0.1965, ctc_loss=0.1308, cr_loss=0.3286, over 16042.00 frames. ], tot_loss[loss=0.2018, ctc_loss=0.1323, cr_loss=0.3472, over 764564.53 frames. ], batch size: 74, lr: 4.64e-03, grad_scale: 32.0 2024-09-24 07:53:03,716 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=3.82 vs. 
limit=10.0 2024-09-24 07:53:35,280 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=454864.6666666667, ans=0.2 2024-09-24 07:53:43,016 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.const_attention_rate, batch_count=454911.3333333333, ans=0.025 2024-09-24 07:53:53,408 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=9.73 vs. limit=22.5 2024-09-24 07:53:56,798 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=14.17 vs. limit=22.5 2024-09-24 07:54:05,372 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.max_positive, batch_count=454958.0, ans=0.95 2024-09-24 07:54:08,531 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=454958.0, ans=0.125 2024-09-24 07:54:11,279 INFO [train.py:1198] (3/4) Epoch 26, batch 100, loss[loss=0.1954, ctc_loss=0.1276, cr_loss=0.339, over 17076.00 frames. ], tot_loss[loss=0.2022, ctc_loss=0.1329, cr_loss=0.3464, over 1336298.53 frames. ], batch size: 43, lr: 4.64e-03, grad_scale: 32.0 2024-09-24 07:54:15,347 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=6.00 vs. limit=15.0 2024-09-24 07:54:16,867 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.68 vs. limit=15.0 2024-09-24 07:54:51,777 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.072e+02 1.228e+02 1.285e+02 1.398e+02 1.660e+02, threshold=2.570e+02, percent-clipped=0.0 2024-09-24 07:55:00,128 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=455144.6666666667, ans=0.1 2024-09-24 07:55:04,858 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.const_attention_rate, batch_count=455144.6666666667, ans=0.025 2024-09-24 07:55:20,761 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=455191.3333333333, ans=0.1 2024-09-24 07:55:24,524 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=14.54 vs. limit=22.5 2024-09-24 07:55:33,075 INFO [train.py:1198] (3/4) Epoch 26, batch 150, loss[loss=0.1928, ctc_loss=0.1214, cr_loss=0.357, over 17271.00 frames. ], tot_loss[loss=0.2041, ctc_loss=0.134, cr_loss=0.3504, over 1781174.53 frames. ], batch size: 44, lr: 4.64e-03, grad_scale: 32.0 2024-09-24 07:55:33,181 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=455238.0, ans=0.125 2024-09-24 07:55:54,605 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.02 vs. 
limit=10.0 2024-09-24 07:56:00,566 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=455284.6666666667, ans=0.125 2024-09-24 07:56:12,111 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.15 vs. limit=6.0 2024-09-24 07:56:18,317 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=455331.3333333333, ans=0.0 2024-09-24 07:56:33,137 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=5.16 vs. limit=15.0 2024-09-24 07:56:35,777 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=455424.6666666667, ans=0.125 2024-09-24 07:56:37,341 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=455424.6666666667, ans=0.0 2024-09-24 07:56:45,044 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.max_abs, batch_count=455424.6666666667, ans=10.0 2024-09-24 07:56:55,950 INFO [train.py:1198] (3/4) Epoch 26, batch 200, loss[loss=0.1714, ctc_loss=0.11, cr_loss=0.3072, over 17103.00 frames. ], tot_loss[loss=0.2043, ctc_loss=0.1341, cr_loss=0.3507, over 2122811.11 frames. ], batch size: 43, lr: 4.64e-03, grad_scale: 32.0 2024-09-24 07:57:05,903 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=455471.3333333333, ans=0.04949747468305833 2024-09-24 07:57:37,056 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.127e+02 1.254e+02 1.327e+02 1.395e+02 2.472e+02, threshold=2.655e+02, percent-clipped=0.0 2024-09-24 07:57:40,543 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=455564.6666666667, ans=0.1 2024-09-24 07:57:50,323 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=455611.3333333333, ans=0.125 2024-09-24 07:57:50,385 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=455611.3333333333, ans=0.09899494936611666 2024-09-24 07:58:15,644 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=455658.0, ans=0.125 2024-09-24 07:58:20,989 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.81 vs. limit=15.0 2024-09-24 07:58:21,721 INFO [train.py:1198] (3/4) Epoch 26, batch 250, loss[loss=0.222, ctc_loss=0.1476, cr_loss=0.3717, over 17017.00 frames. ], tot_loss[loss=0.205, ctc_loss=0.1347, cr_loss=0.3516, over 2401523.15 frames. 
], batch size: 44, lr: 4.64e-03, grad_scale: 32.0 2024-09-24 07:58:25,266 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=455704.6666666667, ans=0.125 2024-09-24 07:59:10,245 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=455844.6666666667, ans=0.125 2024-09-24 07:59:13,267 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=455844.6666666667, ans=0.0 2024-09-24 07:59:22,473 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=455844.6666666667, ans=0.0 2024-09-24 07:59:25,576 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=455844.6666666667, ans=0.125 2024-09-24 07:59:44,777 INFO [train.py:1198] (3/4) Epoch 26, batch 300, loss[loss=0.1908, ctc_loss=0.1224, cr_loss=0.3418, over 17227.00 frames. ], tot_loss[loss=0.205, ctc_loss=0.1347, cr_loss=0.3515, over 2616251.67 frames. ], batch size: 47, lr: 4.64e-03, grad_scale: 32.0 2024-09-24 07:59:50,008 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=455938.0, ans=0.2 2024-09-24 07:59:51,875 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=7.20 vs. limit=15.0 2024-09-24 07:59:56,337 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=455938.0, ans=0.125 2024-09-24 07:59:59,552 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=455984.6666666667, ans=0.0 2024-09-24 08:00:22,960 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.131e+02 1.258e+02 1.353e+02 1.431e+02 1.919e+02, threshold=2.706e+02, percent-clipped=0.0 2024-09-24 08:00:26,507 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=456031.3333333333, ans=0.125 2024-09-24 08:00:57,051 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=456124.6666666667, ans=0.125 2024-09-24 08:01:04,592 INFO [train.py:1198] (3/4) Epoch 26, batch 350, loss[loss=0.2258, ctc_loss=0.1484, cr_loss=0.3871, over 17027.00 frames. ], tot_loss[loss=0.2056, ctc_loss=0.1351, cr_loss=0.3523, over 2783631.60 frames. ], batch size: 56, lr: 4.63e-03, grad_scale: 32.0 2024-09-24 08:01:09,883 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=456171.3333333333, ans=0.2 2024-09-24 08:01:12,899 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=456171.3333333333, ans=0.1 2024-09-24 08:01:21,189 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=13.29 vs. 
limit=22.5 2024-09-24 08:01:27,619 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=456218.0, ans=0.125 2024-09-24 08:01:38,655 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=456264.6666666667, ans=0.125 2024-09-24 08:01:57,416 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=456311.3333333333, ans=0.0 2024-09-24 08:02:30,050 INFO [train.py:1198] (3/4) Epoch 26, batch 400, loss[loss=0.1793, ctc_loss=0.1161, cr_loss=0.3157, over 16945.00 frames. ], tot_loss[loss=0.2056, ctc_loss=0.1352, cr_loss=0.3522, over 2913150.17 frames. ], batch size: 42, lr: 4.63e-03, grad_scale: 32.0 2024-09-24 08:02:57,287 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=456451.3333333333, ans=0.1 2024-09-24 08:02:59,669 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.90 vs. limit=22.5 2024-09-24 08:03:08,766 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=6.42 vs. limit=15.0 2024-09-24 08:03:12,625 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.123e+02 1.258e+02 1.321e+02 1.410e+02 2.001e+02, threshold=2.643e+02, percent-clipped=0.0 2024-09-24 08:03:16,153 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.const_attention_rate, batch_count=456498.0, ans=0.025 2024-09-24 08:03:19,718 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-24 08:03:52,935 INFO [train.py:1198] (3/4) Epoch 26, batch 450, loss[loss=0.222, ctc_loss=0.1455, cr_loss=0.3823, over 17212.00 frames. ], tot_loss[loss=0.2058, ctc_loss=0.1353, cr_loss=0.3523, over 3018118.29 frames. ], batch size: 47, lr: 4.63e-03, grad_scale: 32.0 2024-09-24 08:04:19,003 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=8.82 vs. limit=15.0 2024-09-24 08:05:08,629 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.57 vs. limit=6.0 2024-09-24 08:05:15,526 INFO [train.py:1198] (3/4) Epoch 26, batch 500, loss[loss=0.2555, ctc_loss=0.1808, cr_loss=0.3736, over 11644.00 frames. ], tot_loss[loss=0.206, ctc_loss=0.1355, cr_loss=0.3524, over 3099014.32 frames. ], batch size: 125, lr: 4.63e-03, grad_scale: 32.0 2024-09-24 08:05:19,727 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.20 vs. 
limit=15.0 2024-09-24 08:05:39,842 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=456918.0, ans=0.0 2024-09-24 08:05:46,229 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=456964.6666666667, ans=0.0 2024-09-24 08:05:55,349 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.091e+02 1.259e+02 1.357e+02 1.518e+02 2.705e+02, threshold=2.714e+02, percent-clipped=1.0 2024-09-24 08:06:05,706 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.95 vs. limit=15.0 2024-09-24 08:06:16,277 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=457011.3333333333, ans=0.2 2024-09-24 08:06:30,394 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=457058.0, ans=0.0 2024-09-24 08:06:37,536 INFO [train.py:1198] (3/4) Epoch 26, batch 550, loss[loss=0.2525, ctc_loss=0.1698, cr_loss=0.4136, over 14875.00 frames. ], tot_loss[loss=0.2064, ctc_loss=0.1358, cr_loss=0.353, over 3161970.91 frames. ], batch size: 89, lr: 4.63e-03, grad_scale: 32.0 2024-09-24 08:06:57,649 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=11.59 vs. limit=22.5 2024-09-24 08:07:23,635 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=457198.0, ans=0.125 2024-09-24 08:07:25,242 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=457198.0, ans=0.0 2024-09-24 08:07:38,469 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=4.12 vs. limit=10.0 2024-09-24 08:08:00,436 INFO [train.py:1198] (3/4) Epoch 26, batch 600, loss[loss=0.1699, ctc_loss=0.1094, cr_loss=0.3025, over 16998.00 frames. ], tot_loss[loss=0.2051, ctc_loss=0.1349, cr_loss=0.3511, over 3205296.79 frames. ], batch size: 39, lr: 4.63e-03, grad_scale: 32.0 2024-09-24 08:08:11,458 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-24 08:08:20,991 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=457384.6666666667, ans=0.2 2024-09-24 08:08:31,051 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.02 vs. 
limit=6.0 2024-09-24 08:08:33,773 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=457431.3333333333, ans=0.125 2024-09-24 08:08:36,932 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=457431.3333333333, ans=0.1 2024-09-24 08:08:42,969 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.060e+02 1.231e+02 1.314e+02 1.456e+02 1.940e+02, threshold=2.629e+02, percent-clipped=0.0 2024-09-24 08:08:49,685 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=457478.0, ans=0.125 2024-09-24 08:09:01,746 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.60 vs. limit=15.0 2024-09-24 08:09:04,369 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=457478.0, ans=0.125 2024-09-24 08:09:21,350 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=457524.6666666667, ans=0.0 2024-09-24 08:09:26,037 INFO [train.py:1198] (3/4) Epoch 26, batch 650, loss[loss=0.2229, ctc_loss=0.1465, cr_loss=0.382, over 16988.00 frames. ], tot_loss[loss=0.2054, ctc_loss=0.1351, cr_loss=0.3517, over 3247455.95 frames. ], batch size: 56, lr: 4.63e-03, grad_scale: 32.0 2024-09-24 08:09:47,842 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=16.88 vs. limit=22.5 2024-09-24 08:09:54,392 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=4.57 vs. limit=15.0 2024-09-24 08:10:25,916 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=457711.3333333333, ans=0.0 2024-09-24 08:10:32,474 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=457758.0, ans=0.125 2024-09-24 08:10:37,331 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=457758.0, ans=0.125 2024-09-24 08:10:46,208 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=5.13 vs. limit=12.0 2024-09-24 08:10:46,679 INFO [train.py:1198] (3/4) Epoch 26, batch 700, loss[loss=0.1945, ctc_loss=0.1273, cr_loss=0.3359, over 17020.00 frames. ], tot_loss[loss=0.2063, ctc_loss=0.1357, cr_loss=0.3532, over 3279959.92 frames. 
], batch size: 44, lr: 4.63e-03, grad_scale: 32.0 2024-09-24 08:11:06,269 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=457851.3333333333, ans=0.07 2024-09-24 08:11:23,990 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=457898.0, ans=0.125 2024-09-24 08:11:28,404 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.068e+02 1.269e+02 1.380e+02 1.511e+02 2.346e+02, threshold=2.761e+02, percent-clipped=0.0 2024-09-24 08:11:30,431 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-09-24 08:11:50,194 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=457944.6666666667, ans=0.025 2024-09-24 08:12:11,813 INFO [train.py:1198] (3/4) Epoch 26, batch 750, loss[loss=0.1723, ctc_loss=0.1138, cr_loss=0.2927, over 17314.00 frames. ], tot_loss[loss=0.2058, ctc_loss=0.1353, cr_loss=0.3526, over 3296339.47 frames. ], batch size: 49, lr: 4.63e-03, grad_scale: 16.0 2024-09-24 08:12:36,737 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.51 vs. limit=22.5 2024-09-24 08:12:42,574 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=458131.3333333333, ans=0.0 2024-09-24 08:12:42,636 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=458131.3333333333, ans=0.0 2024-09-24 08:12:57,204 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=458131.3333333333, ans=0.0 2024-09-24 08:13:09,507 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=458178.0, ans=0.125 2024-09-24 08:13:10,962 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=458178.0, ans=0.0 2024-09-24 08:13:15,125 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=10.05 vs. limit=10.0 2024-09-24 08:13:23,557 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=458224.6666666667, ans=0.125 2024-09-24 08:13:34,685 INFO [train.py:1198] (3/4) Epoch 26, batch 800, loss[loss=0.2123, ctc_loss=0.1457, cr_loss=0.333, over 16079.00 frames. ], tot_loss[loss=0.2051, ctc_loss=0.1348, cr_loss=0.3516, over 3299779.27 frames. 
], batch size: 74, lr: 4.62e-03, grad_scale: 32.0 2024-09-24 08:14:18,901 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.021e+02 1.286e+02 1.386e+02 1.460e+02 2.197e+02, threshold=2.772e+02, percent-clipped=0.0 2024-09-24 08:14:20,761 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=458364.6666666667, ans=0.0 2024-09-24 08:14:27,239 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=458411.3333333333, ans=0.125 2024-09-24 08:14:49,495 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=458458.0, ans=0.07 2024-09-24 08:14:55,884 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=458504.6666666667, ans=0.125 2024-09-24 08:14:57,081 INFO [train.py:1198] (3/4) Epoch 26, batch 850, loss[loss=0.2147, ctc_loss=0.1432, cr_loss=0.3573, over 17282.00 frames. ], tot_loss[loss=0.2041, ctc_loss=0.134, cr_loss=0.3503, over 3320711.28 frames. ], batch size: 49, lr: 4.62e-03, grad_scale: 32.0 2024-09-24 08:15:00,571 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=458504.6666666667, ans=0.2 2024-09-24 08:15:08,710 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=458504.6666666667, ans=0.0 2024-09-24 08:15:34,361 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=458598.0, ans=0.125 2024-09-24 08:15:37,476 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.max_abs, batch_count=458598.0, ans=10.0 2024-09-24 08:15:37,493 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=458598.0, ans=0.0 2024-09-24 08:15:59,896 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=458691.3333333333, ans=0.125 2024-09-24 08:16:17,230 INFO [train.py:1198] (3/4) Epoch 26, batch 900, loss[loss=0.1901, ctc_loss=0.1229, cr_loss=0.3361, over 17088.00 frames. ], tot_loss[loss=0.2039, ctc_loss=0.1339, cr_loss=0.3498, over 3333276.16 frames. 
], batch size: 40, lr: 4.62e-03, grad_scale: 32.0 2024-09-24 08:16:27,126 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=458738.0, ans=0.125 2024-09-24 08:17:01,227 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.101e+02 1.265e+02 1.347e+02 1.465e+02 1.812e+02, threshold=2.693e+02, percent-clipped=0.0 2024-09-24 08:17:29,400 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=458924.6666666667, ans=0.125 2024-09-24 08:17:32,410 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=458924.6666666667, ans=0.125 2024-09-24 08:17:38,829 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=458924.6666666667, ans=0.2 2024-09-24 08:17:38,872 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=458924.6666666667, ans=0.125 2024-09-24 08:17:41,544 INFO [train.py:1198] (3/4) Epoch 26, batch 950, loss[loss=0.1888, ctc_loss=0.1226, cr_loss=0.3313, over 17095.00 frames. ], tot_loss[loss=0.2034, ctc_loss=0.1336, cr_loss=0.3489, over 3335339.71 frames. ], batch size: 40, lr: 4.62e-03, grad_scale: 32.0 2024-09-24 08:17:48,301 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=458971.3333333333, ans=0.0 2024-09-24 08:17:52,441 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=12.17 vs. limit=22.5 2024-09-24 08:18:09,468 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=4.99 vs. limit=15.0 2024-09-24 08:18:25,037 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.26 vs. limit=15.0 2024-09-24 08:18:29,730 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=459064.6666666667, ans=0.125 2024-09-24 08:18:38,066 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.22 vs. limit=15.0 2024-09-24 08:18:45,632 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=459111.3333333333, ans=0.125 2024-09-24 08:18:58,479 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=459158.0, ans=0.125 2024-09-24 08:19:04,842 INFO [train.py:1198] (3/4) Epoch 26, batch 1000, loss[loss=0.2294, ctc_loss=0.1492, cr_loss=0.4013, over 17020.00 frames. ], tot_loss[loss=0.2028, ctc_loss=0.1331, cr_loss=0.3483, over 3349104.37 frames. 
], batch size: 52, lr: 4.62e-03, grad_scale: 32.0 2024-09-24 08:19:34,775 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=459251.3333333333, ans=0.0 2024-09-24 08:19:48,776 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.098e+02 1.281e+02 1.338e+02 1.458e+02 2.157e+02, threshold=2.677e+02, percent-clipped=0.0 2024-09-24 08:20:00,697 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.86 vs. limit=6.0 2024-09-24 08:20:27,261 INFO [train.py:1198] (3/4) Epoch 26, batch 1050, loss[loss=0.2192, ctc_loss=0.1463, cr_loss=0.3647, over 17292.00 frames. ], tot_loss[loss=0.2033, ctc_loss=0.1336, cr_loss=0.3488, over 3350943.89 frames. ], batch size: 49, lr: 4.62e-03, grad_scale: 32.0 2024-09-24 08:20:29,161 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=459438.0, ans=0.1 2024-09-24 08:20:31,132 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.75 vs. limit=15.0 2024-09-24 08:20:45,145 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=459484.6666666667, ans=0.0 2024-09-24 08:20:51,644 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=459484.6666666667, ans=0.1 2024-09-24 08:21:04,357 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=459531.3333333333, ans=0.125 2024-09-24 08:21:04,732 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.59 vs. limit=15.0 2024-09-24 08:21:22,586 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=7.56 vs. limit=15.0 2024-09-24 08:21:50,048 INFO [train.py:1198] (3/4) Epoch 26, batch 1100, loss[loss=0.2174, ctc_loss=0.1438, cr_loss=0.3682, over 16922.00 frames. ], tot_loss[loss=0.2023, ctc_loss=0.1327, cr_loss=0.3476, over 3352380.34 frames. ], batch size: 58, lr: 4.62e-03, grad_scale: 32.0 2024-09-24 08:21:53,619 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=459671.3333333333, ans=0.125 2024-09-24 08:22:11,229 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=9.14 vs. limit=15.0 2024-09-24 08:22:15,973 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=10.96 vs. limit=15.0 2024-09-24 08:22:33,187 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=459764.6666666667, ans=0.1 2024-09-24 08:22:34,488 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.117e+02 1.255e+02 1.355e+02 1.464e+02 2.621e+02, threshold=2.711e+02, percent-clipped=0.0 2024-09-24 08:22:37,264 INFO [scaling.py:1024] (3/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.67 vs. 
limit=5.0 2024-09-24 08:22:44,076 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=459811.3333333333, ans=0.125 2024-09-24 08:22:46,179 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=11.68 vs. limit=22.5 2024-09-24 08:23:15,201 INFO [train.py:1198] (3/4) Epoch 26, batch 1150, loss[loss=0.2365, ctc_loss=0.162, cr_loss=0.3728, over 16543.00 frames. ], tot_loss[loss=0.2041, ctc_loss=0.1342, cr_loss=0.3495, over 3347025.42 frames. ], batch size: 66, lr: 4.62e-03, grad_scale: 32.0 2024-09-24 08:23:17,150 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=459904.6666666667, ans=0.04949747468305833 2024-09-24 08:23:21,815 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=459904.6666666667, ans=0.0 2024-09-24 08:23:47,333 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=459998.0, ans=0.1 2024-09-24 08:24:20,377 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=460091.3333333333, ans=0.2 2024-09-24 08:24:25,195 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=460091.3333333333, ans=0.2 2024-09-24 08:24:26,718 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=460091.3333333333, ans=0.2 2024-09-24 08:24:33,091 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=460091.3333333333, ans=0.125 2024-09-24 08:24:37,842 INFO [train.py:1198] (3/4) Epoch 26, batch 1200, loss[loss=0.1976, ctc_loss=0.1289, cr_loss=0.3436, over 17203.00 frames. ], tot_loss[loss=0.2035, ctc_loss=0.1336, cr_loss=0.3492, over 3356484.77 frames. ], batch size: 41, lr: 4.61e-03, grad_scale: 32.0 2024-09-24 08:24:38,076 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=460138.0, ans=0.1 2024-09-24 08:25:19,716 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.103e+02 1.237e+02 1.345e+02 1.469e+02 1.862e+02, threshold=2.691e+02, percent-clipped=0.0 2024-09-24 08:25:42,553 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=460324.6666666667, ans=0.09899494936611666 2024-09-24 08:25:56,841 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=460371.3333333333, ans=0.07 2024-09-24 08:25:58,193 INFO [train.py:1198] (3/4) Epoch 26, batch 1250, loss[loss=0.1837, ctc_loss=0.1215, cr_loss=0.3107, over 17069.00 frames. ], tot_loss[loss=0.2042, ctc_loss=0.1341, cr_loss=0.3504, over 3358774.90 frames. ], batch size: 46, lr: 4.61e-03, grad_scale: 32.0 2024-09-24 08:26:11,937 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=6.50 vs. 
limit=15.0 2024-09-24 08:26:16,186 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=460418.0, ans=0.125 2024-09-24 08:26:33,949 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=12.21 vs. limit=15.0 2024-09-24 08:27:23,294 INFO [train.py:1198] (3/4) Epoch 26, batch 1300, loss[loss=0.1893, ctc_loss=0.1208, cr_loss=0.3421, over 17293.00 frames. ], tot_loss[loss=0.2042, ctc_loss=0.1341, cr_loss=0.3506, over 3360609.23 frames. ], batch size: 51, lr: 4.61e-03, grad_scale: 32.0 2024-09-24 08:27:52,409 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=4.44 vs. limit=15.0 2024-09-24 08:28:07,145 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.077e+02 1.260e+02 1.346e+02 1.486e+02 2.192e+02, threshold=2.692e+02, percent-clipped=0.0 2024-09-24 08:28:20,360 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=460744.6666666667, ans=0.1 2024-09-24 08:28:37,144 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=12.11 vs. limit=15.0 2024-09-24 08:28:45,546 INFO [train.py:1198] (3/4) Epoch 26, batch 1350, loss[loss=0.1914, ctc_loss=0.1228, cr_loss=0.343, over 17303.00 frames. ], tot_loss[loss=0.2054, ctc_loss=0.135, cr_loss=0.3521, over 3359406.54 frames. ], batch size: 51, lr: 4.61e-03, grad_scale: 32.0 2024-09-24 08:28:58,917 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=3.24 vs. limit=15.0 2024-09-24 08:29:00,237 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=460884.6666666667, ans=0.05 2024-09-24 08:29:03,472 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-24 08:29:18,168 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=12.08 vs. limit=15.0 2024-09-24 08:30:08,672 INFO [train.py:1198] (3/4) Epoch 26, batch 1400, loss[loss=0.1734, ctc_loss=0.1119, cr_loss=0.3073, over 17169.00 frames. ], tot_loss[loss=0.206, ctc_loss=0.1354, cr_loss=0.3526, over 3361384.95 frames. ], batch size: 41, lr: 4.61e-03, grad_scale: 32.0 2024-09-24 08:30:10,692 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=461071.3333333333, ans=0.0 2024-09-24 08:30:38,943 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=3.93 vs. 
limit=12.0 2024-09-24 08:30:50,707 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.110e+02 1.251e+02 1.327e+02 1.445e+02 2.357e+02, threshold=2.654e+02, percent-clipped=0.0 2024-09-24 08:30:57,618 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=461211.3333333333, ans=0.2 2024-09-24 08:30:59,158 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=461211.3333333333, ans=0.09899494936611666 2024-09-24 08:31:12,065 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=461258.0, ans=0.1 2024-09-24 08:31:31,442 INFO [train.py:1198] (3/4) Epoch 26, batch 1450, loss[loss=0.2046, ctc_loss=0.1339, cr_loss=0.3533, over 17215.00 frames. ], tot_loss[loss=0.2056, ctc_loss=0.1352, cr_loss=0.3519, over 3368080.43 frames. ], batch size: 50, lr: 4.61e-03, grad_scale: 32.0 2024-09-24 08:32:10,726 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer_ff3.min_abs, batch_count=461398.0, ans=0.2 2024-09-24 08:32:26,669 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=461444.6666666667, ans=0.0 2024-09-24 08:32:36,125 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=461491.3333333333, ans=0.0 2024-09-24 08:32:36,166 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=461491.3333333333, ans=0.09899494936611666 2024-09-24 08:32:42,364 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=461491.3333333333, ans=0.125 2024-09-24 08:32:55,983 INFO [train.py:1198] (3/4) Epoch 26, batch 1500, loss[loss=0.2337, ctc_loss=0.152, cr_loss=0.4087, over 17250.00 frames. ], tot_loss[loss=0.2051, ctc_loss=0.1347, cr_loss=0.3519, over 3372825.16 frames. ], batch size: 44, lr: 4.61e-03, grad_scale: 32.0 2024-09-24 08:33:03,478 INFO [scaling.py:1024] (3/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.60 vs. limit=5.0 2024-09-24 08:33:11,273 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=7.72 vs. limit=15.0 2024-09-24 08:33:37,703 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.140e+02 1.270e+02 1.352e+02 1.469e+02 1.916e+02, threshold=2.703e+02, percent-clipped=0.0 2024-09-24 08:33:52,550 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=2.502e-03 2024-09-24 08:33:57,409 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=461678.0, ans=0.1 2024-09-24 08:34:09,754 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.const_attention_rate, batch_count=461724.6666666667, ans=0.025 2024-09-24 08:34:15,097 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=4.84 vs. limit=15.0 2024-09-24 08:34:19,043 INFO [train.py:1198] (3/4) Epoch 26, batch 1550, loss[loss=0.2248, ctc_loss=0.1498, cr_loss=0.3753, over 17039.00 frames. 
], tot_loss[loss=0.2052, ctc_loss=0.1348, cr_loss=0.3524, over 3373870.65 frames. ], batch size: 56, lr: 4.61e-03, grad_scale: 32.0 2024-09-24 08:34:21,063 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=461771.3333333333, ans=0.0 2024-09-24 08:34:25,745 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=461771.3333333333, ans=0.125 2024-09-24 08:34:26,392 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=4.52 vs. limit=10.0 2024-09-24 08:34:27,282 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=461771.3333333333, ans=0.125 2024-09-24 08:34:59,051 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=461864.6666666667, ans=0.125 2024-09-24 08:35:22,865 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=461958.0, ans=0.2 2024-09-24 08:35:29,441 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=461958.0, ans=0.125 2024-09-24 08:35:38,846 INFO [train.py:1198] (3/4) Epoch 26, batch 1600, loss[loss=0.2019, ctc_loss=0.1274, cr_loss=0.3728, over 17302.00 frames. ], tot_loss[loss=0.2066, ctc_loss=0.1357, cr_loss=0.3543, over 3374469.98 frames. ], batch size: 46, lr: 4.61e-03, grad_scale: 32.0 2024-09-24 08:35:47,214 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=462004.6666666667, ans=0.0 2024-09-24 08:35:56,807 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=462051.3333333333, ans=0.1 2024-09-24 08:36:21,226 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=462098.0, ans=0.125 2024-09-24 08:36:22,548 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.046e+02 1.247e+02 1.306e+02 1.406e+02 2.052e+02, threshold=2.612e+02, percent-clipped=0.0 2024-09-24 08:36:25,997 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=462098.0, ans=0.125 2024-09-24 08:36:38,857 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=462144.6666666667, ans=0.1 2024-09-24 08:36:43,819 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=462191.3333333333, ans=0.125 2024-09-24 08:37:03,824 INFO [train.py:1198] (3/4) Epoch 26, batch 1650, loss[loss=0.1828, ctc_loss=0.1187, cr_loss=0.3207, over 17272.00 frames. ], tot_loss[loss=0.2054, ctc_loss=0.1349, cr_loss=0.3527, over 3371042.96 frames. ], batch size: 44, lr: 4.60e-03, grad_scale: 32.0 2024-09-24 08:37:09,552 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.79 vs. 
limit=22.5 2024-09-24 08:37:12,139 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=462238.0, ans=0.125 2024-09-24 08:37:16,955 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=462238.0, ans=0.07 2024-09-24 08:37:21,678 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=462284.6666666667, ans=0.0 2024-09-24 08:37:29,766 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=462284.6666666667, ans=0.0 2024-09-24 08:37:50,627 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=462378.0, ans=0.1 2024-09-24 08:37:52,346 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=462378.0, ans=0.2 2024-09-24 08:37:59,533 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=462378.0, ans=0.125 2024-09-24 08:38:23,644 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=462424.6666666667, ans=0.125 2024-09-24 08:38:26,374 INFO [train.py:1198] (3/4) Epoch 26, batch 1700, loss[loss=0.1952, ctc_loss=0.1272, cr_loss=0.34, over 17356.00 frames. ], tot_loss[loss=0.2048, ctc_loss=0.1344, cr_loss=0.3519, over 3380825.57 frames. ], batch size: 48, lr: 4.60e-03, grad_scale: 32.0 2024-09-24 08:38:30,053 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=462471.3333333333, ans=0.1 2024-09-24 08:39:10,601 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.092e+02 1.222e+02 1.318e+02 1.444e+02 2.333e+02, threshold=2.637e+02, percent-clipped=0.0 2024-09-24 08:39:14,043 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.const_attention_rate, batch_count=462564.6666666667, ans=0.025 2024-09-24 08:39:33,292 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=462658.0, ans=0.2 2024-09-24 08:39:40,116 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=10.22 vs. limit=22.5 2024-09-24 08:39:48,987 INFO [train.py:1198] (3/4) Epoch 26, batch 1750, loss[loss=0.2061, ctc_loss=0.1356, cr_loss=0.3527, over 17169.00 frames. ], tot_loss[loss=0.2051, ctc_loss=0.1347, cr_loss=0.3521, over 3375914.71 frames. ], batch size: 45, lr: 4.60e-03, grad_scale: 32.0 2024-09-24 08:39:54,014 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=462704.6666666667, ans=0.125 2024-09-24 08:39:57,759 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.84 vs. limit=10.0 2024-09-24 08:39:59,531 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=6.37 vs. limit=15.0 2024-09-24 08:41:04,575 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.57 vs. 
limit=6.0 2024-09-24 08:41:08,749 INFO [train.py:1198] (3/4) Epoch 26, batch 1800, loss[loss=0.2106, ctc_loss=0.1356, cr_loss=0.3749, over 16898.00 frames. ], tot_loss[loss=0.2061, ctc_loss=0.1355, cr_loss=0.3534, over 3366525.59 frames. ], batch size: 58, lr: 4.60e-03, grad_scale: 32.0 2024-09-24 08:41:18,215 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=11.65 vs. limit=12.0 2024-09-24 08:41:22,153 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.86 vs. limit=10.0 2024-09-24 08:41:27,908 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-09-24 08:41:46,789 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-24 08:41:55,219 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.066e+02 1.293e+02 1.406e+02 1.567e+02 1.918e+02, threshold=2.813e+02, percent-clipped=0.0 2024-09-24 08:42:03,817 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=11.29 vs. limit=22.5 2024-09-24 08:42:13,068 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=463078.0, ans=0.2 2024-09-24 08:42:33,473 INFO [train.py:1198] (3/4) Epoch 26, batch 1850, loss[loss=0.2152, ctc_loss=0.1438, cr_loss=0.3571, over 17034.00 frames. ], tot_loss[loss=0.2056, ctc_loss=0.1351, cr_loss=0.3524, over 3361723.44 frames. ], batch size: 56, lr: 4.60e-03, grad_scale: 16.0 2024-09-24 08:42:39,409 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=10.15 vs. limit=15.0 2024-09-24 08:42:46,790 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=15.22 vs. limit=15.0 2024-09-24 08:42:51,308 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.const_attention_rate, batch_count=463218.0, ans=0.025 2024-09-24 08:42:52,722 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=463218.0, ans=0.125 2024-09-24 08:43:23,126 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=3.74 vs. limit=12.0 2024-09-24 08:43:25,588 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=463311.3333333333, ans=0.2 2024-09-24 08:43:27,161 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=463311.3333333333, ans=0.07 2024-09-24 08:43:49,444 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=463358.0, ans=0.035 2024-09-24 08:43:55,586 INFO [train.py:1198] (3/4) Epoch 26, batch 1900, loss[loss=0.2171, ctc_loss=0.1404, cr_loss=0.3832, over 17292.00 frames. ], tot_loss[loss=0.2064, ctc_loss=0.1357, cr_loss=0.3536, over 3355223.75 frames. ], batch size: 49, lr: 4.60e-03, grad_scale: 16.0 2024-09-24 08:44:00,438 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.31 vs. 
limit=15.0 2024-09-24 08:44:31,763 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=463498.0, ans=0.0 2024-09-24 08:44:34,955 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=463498.0, ans=0.0 2024-09-24 08:44:41,027 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.057e+02 1.265e+02 1.341e+02 1.479e+02 3.030e+02, threshold=2.683e+02, percent-clipped=1.0 2024-09-24 08:45:18,012 INFO [train.py:1198] (3/4) Epoch 26, batch 1950, loss[loss=0.1836, ctc_loss=0.1184, cr_loss=0.3262, over 16960.00 frames. ], tot_loss[loss=0.2049, ctc_loss=0.1346, cr_loss=0.3516, over 3361101.98 frames. ], batch size: 42, lr: 4.60e-03, grad_scale: 16.0 2024-09-24 08:45:26,099 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=463638.0, ans=0.0 2024-09-24 08:45:50,284 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-24 08:46:09,465 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=463778.0, ans=0.125 2024-09-24 08:46:12,737 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.min_positive, batch_count=463778.0, ans=0.05 2024-09-24 08:46:20,764 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.97 vs. limit=6.0 2024-09-24 08:46:40,694 INFO [train.py:1198] (3/4) Epoch 26, batch 2000, loss[loss=0.2385, ctc_loss=0.1607, cr_loss=0.3888, over 17171.00 frames. ], tot_loss[loss=0.2033, ctc_loss=0.1334, cr_loss=0.3495, over 3367129.13 frames. ], batch size: 48, lr: 4.60e-03, grad_scale: 32.0 2024-09-24 08:47:12,441 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=463918.0, ans=0.125 2024-09-24 08:47:23,902 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.const_attention_rate, batch_count=463964.6666666667, ans=0.025 2024-09-24 08:47:26,725 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.119e+02 1.293e+02 1.366e+02 1.489e+02 2.036e+02, threshold=2.732e+02, percent-clipped=0.0 2024-09-24 08:47:28,582 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=463964.6666666667, ans=0.0 2024-09-24 08:47:39,902 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=464011.3333333333, ans=0.125 2024-09-24 08:47:57,365 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=464058.0, ans=0.125 2024-09-24 08:48:06,090 INFO [train.py:1198] (3/4) Epoch 26, batch 2050, loss[loss=0.2122, ctc_loss=0.1379, cr_loss=0.3714, over 17102.00 frames. ], tot_loss[loss=0.2041, ctc_loss=0.134, cr_loss=0.3508, over 3371357.33 frames. 
], batch size: 49, lr: 4.59e-03, grad_scale: 32.0 2024-09-24 08:48:08,077 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=464104.6666666667, ans=0.125 2024-09-24 08:48:25,714 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=464151.3333333333, ans=0.0 2024-09-24 08:48:38,719 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=464198.0, ans=0.1 2024-09-24 08:48:39,469 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.43 vs. limit=12.0 2024-09-24 08:48:53,068 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=464244.6666666667, ans=0.0 2024-09-24 08:49:11,717 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=15.82 vs. limit=15.0 2024-09-24 08:49:21,116 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=464291.3333333333, ans=0.5 2024-09-24 08:49:28,857 INFO [train.py:1198] (3/4) Epoch 26, batch 2100, loss[loss=0.2301, ctc_loss=0.1532, cr_loss=0.3846, over 17107.00 frames. ], tot_loss[loss=0.2037, ctc_loss=0.1337, cr_loss=0.3499, over 3363634.50 frames. ], batch size: 49, lr: 4.59e-03, grad_scale: 32.0 2024-09-24 08:49:38,703 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=464338.0, ans=0.125 2024-09-24 08:49:48,517 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer_na.min_abs, batch_count=464384.6666666667, ans=0.02 2024-09-24 08:50:12,345 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.127e+02 1.252e+02 1.369e+02 1.506e+02 2.347e+02, threshold=2.738e+02, percent-clipped=0.0 2024-09-24 08:50:14,370 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=464431.3333333333, ans=0.2 2024-09-24 08:50:20,576 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=464478.0, ans=0.2 2024-09-24 08:50:26,947 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=464478.0, ans=0.07 2024-09-24 08:50:30,461 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.59 vs. limit=15.0 2024-09-24 08:50:45,179 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=8.32 vs. limit=15.0 2024-09-24 08:50:48,961 INFO [train.py:1198] (3/4) Epoch 26, batch 2150, loss[loss=0.192, ctc_loss=0.1282, cr_loss=0.3192, over 17170.00 frames. ], tot_loss[loss=0.2038, ctc_loss=0.1338, cr_loss=0.3501, over 3370465.72 frames. 
], batch size: 45, lr: 4.59e-03, grad_scale: 32.0 2024-09-24 08:51:08,210 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=464618.0, ans=0.2 2024-09-24 08:51:09,782 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=464618.0, ans=0.125 2024-09-24 08:51:17,151 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=464618.0, ans=0.0 2024-09-24 08:51:20,176 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=464618.0, ans=0.2 2024-09-24 08:51:39,360 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=464711.3333333333, ans=0.0 2024-09-24 08:52:12,591 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=464804.6666666667, ans=0.0 2024-09-24 08:52:13,763 INFO [train.py:1198] (3/4) Epoch 26, batch 2200, loss[loss=0.2339, ctc_loss=0.1546, cr_loss=0.3965, over 17016.00 frames. ], tot_loss[loss=0.2041, ctc_loss=0.1339, cr_loss=0.3508, over 3372371.14 frames. ], batch size: 53, lr: 4.59e-03, grad_scale: 32.0 2024-09-24 08:52:13,987 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=464804.6666666667, ans=0.1 2024-09-24 08:52:26,753 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-24 08:52:29,942 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=464851.3333333333, ans=0.125 2024-09-24 08:52:32,296 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=4.34 vs. limit=15.0 2024-09-24 08:52:56,973 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.150e+02 1.271e+02 1.361e+02 1.483e+02 2.576e+02, threshold=2.723e+02, percent-clipped=0.0 2024-09-24 08:53:14,218 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=464944.6666666667, ans=0.125 2024-09-24 08:53:25,734 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=464991.3333333333, ans=0.125 2024-09-24 08:53:27,392 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=464991.3333333333, ans=0.2 2024-09-24 08:53:35,683 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten.whitening_limit, batch_count=465038.0, ans=22.5 2024-09-24 08:53:36,466 INFO [train.py:1198] (3/4) Epoch 26, batch 2250, loss[loss=0.2199, ctc_loss=0.1498, cr_loss=0.3505, over 17041.00 frames. ], tot_loss[loss=0.2045, ctc_loss=0.1342, cr_loss=0.3511, over 3377372.07 frames. ], batch size: 56, lr: 4.59e-03, grad_scale: 32.0 2024-09-24 08:53:39,987 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=465038.0, ans=0.125 2024-09-24 08:54:08,718 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=5.12 vs. 
limit=10.0 2024-09-24 08:54:28,909 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=465178.0, ans=0.125 2024-09-24 08:54:49,798 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=465224.6666666667, ans=0.125 2024-09-24 08:54:52,813 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=465224.6666666667, ans=0.0 2024-09-24 08:54:56,817 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=4.17 vs. limit=15.0 2024-09-24 08:54:57,820 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=465271.3333333333, ans=0.125 2024-09-24 08:54:59,002 INFO [train.py:1198] (3/4) Epoch 26, batch 2300, loss[loss=0.2474, ctc_loss=0.1672, cr_loss=0.4011, over 16766.00 frames. ], tot_loss[loss=0.2041, ctc_loss=0.134, cr_loss=0.3505, over 3375205.16 frames. ], batch size: 61, lr: 4.59e-03, grad_scale: 32.0 2024-09-24 08:55:20,689 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=465318.0, ans=0.125 2024-09-24 08:55:32,012 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=465364.6666666667, ans=0.125 2024-09-24 08:55:40,368 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=10.39 vs. limit=22.5 2024-09-24 08:55:42,651 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.066e+02 1.269e+02 1.366e+02 1.477e+02 2.036e+02, threshold=2.732e+02, percent-clipped=0.0 2024-09-24 08:56:09,597 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=5.41 vs. limit=15.0 2024-09-24 08:56:22,147 INFO [train.py:1198] (3/4) Epoch 26, batch 2350, loss[loss=0.1788, ctc_loss=0.1144, cr_loss=0.3219, over 16954.00 frames. ], tot_loss[loss=0.2051, ctc_loss=0.1347, cr_loss=0.352, over 3363221.01 frames. ], batch size: 42, lr: 4.59e-03, grad_scale: 32.0 2024-09-24 08:56:33,823 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=465504.6666666667, ans=0.2 2024-09-24 08:56:36,181 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.64 vs. 
limit=15.0 2024-09-24 08:56:45,102 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=2.708e-03 2024-09-24 08:57:00,476 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=465598.0, ans=0.125 2024-09-24 08:57:00,615 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=465598.0, ans=0.0 2024-09-24 08:57:11,635 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=465644.6666666667, ans=0.0 2024-09-24 08:57:19,732 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-09-24 08:57:40,612 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=465691.3333333333, ans=0.1 2024-09-24 08:57:41,101 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=4.84 vs. limit=10.0 2024-09-24 08:57:45,359 INFO [train.py:1198] (3/4) Epoch 26, batch 2400, loss[loss=0.2098, ctc_loss=0.1391, cr_loss=0.3534, over 17225.00 frames. ], tot_loss[loss=0.2054, ctc_loss=0.135, cr_loss=0.3521, over 3365312.19 frames. ], batch size: 50, lr: 4.59e-03, grad_scale: 32.0 2024-09-24 08:58:07,431 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=465784.6666666667, ans=0.0 2024-09-24 08:58:29,719 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=465831.3333333333, ans=0.125 2024-09-24 08:58:32,543 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.070e+02 1.217e+02 1.274e+02 1.391e+02 1.998e+02, threshold=2.548e+02, percent-clipped=0.0 2024-09-24 08:58:34,488 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=465878.0, ans=0.125 2024-09-24 08:58:44,515 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=4.84 vs. limit=15.0 2024-09-24 08:58:45,463 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=465878.0, ans=0.0 2024-09-24 08:59:04,103 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=465924.6666666667, ans=0.0 2024-09-24 08:59:10,331 INFO [train.py:1198] (3/4) Epoch 26, batch 2450, loss[loss=0.1758, ctc_loss=0.1127, cr_loss=0.3155, over 17037.00 frames. ], tot_loss[loss=0.2057, ctc_loss=0.1352, cr_loss=0.3523, over 3363152.63 frames. 
], batch size: 39, lr: 4.59e-03, grad_scale: 16.0 2024-09-24 08:59:18,585 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=465971.3333333333, ans=0.0 2024-09-24 08:59:44,033 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=466064.6666666667, ans=0.125 2024-09-24 08:59:56,759 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=466111.3333333333, ans=0.2 2024-09-24 09:00:30,250 INFO [train.py:1198] (3/4) Epoch 26, batch 2500, loss[loss=0.212, ctc_loss=0.1449, cr_loss=0.3353, over 17223.00 frames. ], tot_loss[loss=0.2058, ctc_loss=0.1354, cr_loss=0.3522, over 3355965.20 frames. ], batch size: 50, lr: 4.58e-03, grad_scale: 16.0 2024-09-24 09:00:32,035 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=466204.6666666667, ans=0.04949747468305833 2024-09-24 09:00:53,135 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=466251.3333333333, ans=0.0 2024-09-24 09:01:02,675 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.const_attention_rate, batch_count=466298.0, ans=0.025 2024-09-24 09:01:17,879 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.080e+02 1.260e+02 1.339e+02 1.444e+02 2.110e+02, threshold=2.678e+02, percent-clipped=0.0 2024-09-24 09:01:28,588 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.44 vs. limit=15.0 2024-09-24 09:01:51,475 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=466391.3333333333, ans=0.125 2024-09-24 09:01:56,198 INFO [train.py:1198] (3/4) Epoch 26, batch 2550, loss[loss=0.1598, ctc_loss=0.1008, cr_loss=0.2948, over 16987.00 frames. ], tot_loss[loss=0.2057, ctc_loss=0.1352, cr_loss=0.3523, over 3359381.17 frames. ], batch size: 42, lr: 4.58e-03, grad_scale: 16.0 2024-09-24 09:02:28,286 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=466531.3333333333, ans=0.125 2024-09-24 09:03:07,695 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=466624.6666666667, ans=0.07 2024-09-24 09:03:12,328 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=466624.6666666667, ans=0.1 2024-09-24 09:03:14,061 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.const_attention_rate, batch_count=466624.6666666667, ans=0.025 2024-09-24 09:03:19,852 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=4.95 vs. limit=10.0 2024-09-24 09:03:20,624 INFO [train.py:1198] (3/4) Epoch 26, batch 2600, loss[loss=0.1912, ctc_loss=0.1258, cr_loss=0.327, over 17168.00 frames. ], tot_loss[loss=0.2059, ctc_loss=0.1354, cr_loss=0.3525, over 3362150.26 frames. 
], batch size: 45, lr: 4.58e-03, grad_scale: 16.0 2024-09-24 09:04:00,048 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=466764.6666666667, ans=0.1 2024-09-24 09:04:07,743 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.094e+02 1.265e+02 1.355e+02 1.499e+02 2.594e+02, threshold=2.709e+02, percent-clipped=0.0 2024-09-24 09:04:19,176 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=466811.3333333333, ans=0.2 2024-09-24 09:04:42,888 INFO [train.py:1198] (3/4) Epoch 26, batch 2650, loss[loss=0.2128, ctc_loss=0.1426, cr_loss=0.3509, over 16453.00 frames. ], tot_loss[loss=0.2058, ctc_loss=0.1353, cr_loss=0.3523, over 3357307.58 frames. ], batch size: 66, lr: 4.58e-03, grad_scale: 16.0 2024-09-24 09:05:03,956 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=8.83 vs. limit=15.0 2024-09-24 09:05:47,842 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=467091.3333333333, ans=0.0 2024-09-24 09:06:04,512 INFO [train.py:1198] (3/4) Epoch 26, batch 2700, loss[loss=0.2426, ctc_loss=0.1602, cr_loss=0.4118, over 17213.00 frames. ], tot_loss[loss=0.2063, ctc_loss=0.1356, cr_loss=0.3531, over 3354617.06 frames. ], batch size: 55, lr: 4.58e-03, grad_scale: 16.0 2024-09-24 09:06:07,909 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=467138.0, ans=0.0 2024-09-24 09:06:31,893 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=467184.6666666667, ans=0.125 2024-09-24 09:06:36,662 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=467231.3333333333, ans=0.09899494936611666 2024-09-24 09:06:42,633 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=467231.3333333333, ans=0.1 2024-09-24 09:06:51,909 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.073e+02 1.262e+02 1.341e+02 1.444e+02 3.624e+02, threshold=2.682e+02, percent-clipped=1.0 2024-09-24 09:06:52,909 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=4.21 vs. limit=15.0 2024-09-24 09:07:27,272 INFO [train.py:1198] (3/4) Epoch 26, batch 2750, loss[loss=0.2487, ctc_loss=0.169, cr_loss=0.3985, over 16998.00 frames. ], tot_loss[loss=0.2066, ctc_loss=0.1358, cr_loss=0.3536, over 3358212.62 frames. ], batch size: 53, lr: 4.58e-03, grad_scale: 16.0 2024-09-24 09:07:29,122 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=467371.3333333333, ans=0.125 2024-09-24 09:07:33,919 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=467371.3333333333, ans=0.125 2024-09-24 09:07:37,218 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=467371.3333333333, ans=0.1 2024-09-24 09:07:37,925 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.21 vs. 
limit=22.5 2024-09-24 09:08:16,413 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=467511.3333333333, ans=0.1 2024-09-24 09:08:20,363 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.59 vs. limit=15.0 2024-09-24 09:08:29,464 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=467511.3333333333, ans=0.1 2024-09-24 09:08:35,781 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=467558.0, ans=0.0 2024-09-24 09:08:37,427 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=467558.0, ans=0.0 2024-09-24 09:08:52,544 INFO [train.py:1198] (3/4) Epoch 26, batch 2800, loss[loss=0.2024, ctc_loss=0.1315, cr_loss=0.354, over 17029.00 frames. ], tot_loss[loss=0.2076, ctc_loss=0.1366, cr_loss=0.3548, over 3353706.70 frames. ], batch size: 44, lr: 4.58e-03, grad_scale: 32.0 2024-09-24 09:08:59,818 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.58 vs. limit=6.0 2024-09-24 09:09:24,143 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=4.83 vs. limit=10.0 2024-09-24 09:09:26,789 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=467698.0, ans=0.1 2024-09-24 09:09:30,112 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=467698.0, ans=0.0 2024-09-24 09:09:30,165 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=467698.0, ans=0.125 2024-09-24 09:09:37,849 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.041e+02 1.261e+02 1.357e+02 1.478e+02 1.924e+02, threshold=2.714e+02, percent-clipped=0.0 2024-09-24 09:10:11,837 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.70 vs. limit=15.0 2024-09-24 09:10:12,862 INFO [train.py:1198] (3/4) Epoch 26, batch 2850, loss[loss=0.2152, ctc_loss=0.1397, cr_loss=0.3775, over 16837.00 frames. ], tot_loss[loss=0.2059, ctc_loss=0.1354, cr_loss=0.3524, over 3360792.21 frames. 
], batch size: 58, lr: 4.58e-03, grad_scale: 32.0 2024-09-24 09:10:32,470 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-24 09:10:40,043 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=467884.6666666667, ans=0.125 2024-09-24 09:10:43,393 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=467931.3333333333, ans=0.125 2024-09-24 09:11:06,758 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=467978.0, ans=0.0 2024-09-24 09:11:06,799 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=467978.0, ans=0.1 2024-09-24 09:11:08,295 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=467978.0, ans=0.0 2024-09-24 09:11:21,032 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=468024.6666666667, ans=0.125 2024-09-24 09:11:35,006 INFO [train.py:1198] (3/4) Epoch 26, batch 2900, loss[loss=0.2118, ctc_loss=0.1403, cr_loss=0.3573, over 17017.00 frames. ], tot_loss[loss=0.2051, ctc_loss=0.1349, cr_loss=0.351, over 3356196.12 frames. ], batch size: 51, lr: 4.58e-03, grad_scale: 32.0 2024-09-24 09:11:56,992 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=468118.0, ans=0.1 2024-09-24 09:12:06,828 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=468118.0, ans=0.1 2024-09-24 09:12:19,394 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=468164.6666666667, ans=0.0 2024-09-24 09:12:22,358 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.076e+02 1.321e+02 1.382e+02 1.512e+02 2.485e+02, threshold=2.764e+02, percent-clipped=0.0 2024-09-24 09:12:24,428 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=468211.3333333333, ans=0.125 2024-09-24 09:12:35,469 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=468211.3333333333, ans=0.1 2024-09-24 09:12:40,400 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=468258.0, ans=0.2 2024-09-24 09:12:53,041 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.min_abs, batch_count=468258.0, ans=0.5 2024-09-24 09:13:00,443 INFO [train.py:1198] (3/4) Epoch 26, batch 2950, loss[loss=0.2024, ctc_loss=0.1291, cr_loss=0.3668, over 17015.00 frames. ], tot_loss[loss=0.205, ctc_loss=0.1348, cr_loss=0.351, over 3355511.46 frames. ], batch size: 44, lr: 4.57e-03, grad_scale: 32.0 2024-09-24 09:13:00,857 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=468304.6666666667, ans=0.05 2024-09-24 09:13:02,991 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=4.66 vs. 
limit=10.0 2024-09-24 09:13:15,415 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=468351.3333333333, ans=0.125 2024-09-24 09:13:23,984 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=4.01 vs. limit=12.0 2024-09-24 09:13:31,774 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=5.56 vs. limit=15.0 2024-09-24 09:13:50,866 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=4.92 vs. limit=10.0 2024-09-24 09:13:57,792 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=468444.6666666667, ans=0.1 2024-09-24 09:14:23,228 INFO [train.py:1198] (3/4) Epoch 26, batch 3000, loss[loss=0.1877, ctc_loss=0.1217, cr_loss=0.3297, over 17284.00 frames. ], tot_loss[loss=0.2046, ctc_loss=0.1345, cr_loss=0.3502, over 3352385.26 frames. ], batch size: 42, lr: 4.57e-03, grad_scale: 32.0 2024-09-24 09:14:23,229 INFO [train.py:1221] (3/4) Computing validation loss 2024-09-24 09:14:38,566 INFO [train.py:1230] (3/4) Epoch 26, validation: loss=0.03742, ctc_loss=0.03742, cr_loss=8.706e-15, over 944034.00 frames. 2024-09-24 09:14:38,567 INFO [train.py:1231] (3/4) Maximum memory allocated so far is 21238MB 2024-09-24 09:14:42,074 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=468538.0, ans=0.125 2024-09-24 09:14:46,797 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=468538.0, ans=0.1 2024-09-24 09:15:13,170 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.const_attention_rate, batch_count=468631.3333333333, ans=0.025 2024-09-24 09:15:22,386 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.114e+02 1.270e+02 1.354e+02 1.456e+02 4.080e+02, threshold=2.708e+02, percent-clipped=1.0 2024-09-24 09:15:23,432 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=8.44 vs. limit=15.0 2024-09-24 09:15:24,325 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=468678.0, ans=0.2 2024-09-24 09:15:32,100 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=468678.0, ans=0.0 2024-09-24 09:15:43,126 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=468724.6666666667, ans=0.0 2024-09-24 09:15:56,791 INFO [train.py:1198] (3/4) Epoch 26, batch 3050, loss[loss=0.2454, ctc_loss=0.1697, cr_loss=0.3788, over 11900.00 frames. ], tot_loss[loss=0.2068, ctc_loss=0.1362, cr_loss=0.3531, over 3346279.45 frames. 
], batch size: 124, lr: 4.57e-03, grad_scale: 32.0 2024-09-24 09:16:39,619 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.const_attention_rate, batch_count=468864.6666666667, ans=0.025 2024-09-24 09:17:01,171 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=468958.0, ans=0.2 2024-09-24 09:17:07,374 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=468958.0, ans=0.0 2024-09-24 09:17:14,928 INFO [train.py:1198] (3/4) Epoch 26, batch 3100, loss[loss=0.1917, ctc_loss=0.1248, cr_loss=0.3341, over 17303.00 frames. ], tot_loss[loss=0.2079, ctc_loss=0.137, cr_loss=0.3548, over 3350656.27 frames. ], batch size: 49, lr: 4.57e-03, grad_scale: 32.0 2024-09-24 09:17:16,724 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=469004.6666666667, ans=0.0 2024-09-24 09:17:16,756 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=469004.6666666667, ans=0.2 2024-09-24 09:17:16,861 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=469004.6666666667, ans=0.0 2024-09-24 09:17:23,001 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=469004.6666666667, ans=0.2 2024-09-24 09:18:01,342 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.099e+02 1.241e+02 1.330e+02 1.444e+02 2.208e+02, threshold=2.660e+02, percent-clipped=0.0 2024-09-24 09:18:01,605 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=469098.0, ans=0.125 2024-09-24 09:18:01,714 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=469098.0, ans=0.125 2024-09-24 09:18:06,261 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=469144.6666666667, ans=0.125 2024-09-24 09:18:14,758 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=6.76 vs. limit=15.0 2024-09-24 09:18:35,760 INFO [train.py:1198] (3/4) Epoch 26, batch 3150, loss[loss=0.2196, ctc_loss=0.1488, cr_loss=0.3541, over 15930.00 frames. ], tot_loss[loss=0.2071, ctc_loss=0.1364, cr_loss=0.3535, over 3353046.98 frames. ], batch size: 74, lr: 4.57e-03, grad_scale: 32.0 2024-09-24 09:19:16,239 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=469331.3333333333, ans=0.125 2024-09-24 09:19:36,245 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=469378.0, ans=0.1 2024-09-24 09:19:36,829 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=5.47 vs. 
2024-09-24 09:19:41,134 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=469424.6666666667, ans=0.0
2024-09-24 09:19:48,905 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=469424.6666666667, ans=0.125
2024-09-24 09:19:56,706 INFO [train.py:1198] (3/4) Epoch 26, batch 3200, loss[loss=0.2314, ctc_loss=0.1617, cr_loss=0.3484, over 12018.00 frames. ], tot_loss[loss=0.2063, ctc_loss=0.1359, cr_loss=0.3517, over 3325327.61 frames. ], batch size: 123, lr: 4.57e-03, grad_scale: 32.0
2024-09-24 09:20:11,603 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=469518.0, ans=0.07
2024-09-24 09:20:36,587 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.max_positive, batch_count=469564.6666666667, ans=0.95
2024-09-24 09:20:44,069 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.082e+02 1.250e+02 1.395e+02 1.507e+02 2.152e+02, threshold=2.790e+02, percent-clipped=0.0
2024-09-24 09:20:55,531 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.min_abs, batch_count=469611.3333333333, ans=0.5
2024-09-24 09:21:00,833 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.48 vs. limit=6.0
2024-09-24 09:21:04,642 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.const_attention_rate, batch_count=469658.0, ans=0.025
2024-09-24 09:21:05,284 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=5.17 vs. limit=15.0
2024-09-24 09:21:10,735 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=469658.0, ans=0.0
2024-09-24 09:21:15,231 INFO [train.py:1198] (3/4) Epoch 26, batch 3250, loss[loss=0.2579, ctc_loss=0.1793, cr_loss=0.3928, over 12125.00 frames. ], tot_loss[loss=0.2063, ctc_loss=0.136, cr_loss=0.3517, over 3315134.54 frames. ], batch size: 124, lr: 4.57e-03, grad_scale: 16.0
2024-09-24 09:21:18,599 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.const_attention_rate, batch_count=469704.6666666667, ans=0.025
2024-09-24 09:21:31,908 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.36 vs. limit=12.0
2024-09-24 09:21:42,101 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=469751.3333333333, ans=0.125
2024-09-24 09:21:47,467 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=4.61 vs. limit=15.0
2024-09-24 09:21:48,325 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=469798.0, ans=0.125
2024-09-24 09:22:08,621 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=469844.6666666667, ans=0.0
2024-09-24 09:22:17,751 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=8.02 vs. limit=15.0
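
The WARNING lines from optim.py:487 summarize recent gradient norms: the five numbers read as the 0/25/50/75/100th percentiles, and in every entry the reported threshold equals Clipping_scale = 2.0 times the median (here 2.790e+02 = 2.0 * 1.395e+02), with percent-clipped recording how often the threshold actually bit. A sketch of that bookkeeping (a hypothetical helper written for illustration; the real optimizer's internals are more involved):

    import torch

    class GradNormMonitor:
        """Clip to clipping_scale * median of recent gradient norms and
        keep the statistics reported in the optim.py WARNING lines."""

        def __init__(self, clipping_scale=2.0, window=128):
            self.clipping_scale = clipping_scale
            self.window = window
            self.norms = []    # recent total gradient norms
            self.clipped = 0
            self.steps = 0

        def clip_(self, model: torch.nn.Module) -> None:
            grads = [p.grad for p in model.parameters() if p.grad is not None]
            norm = torch.norm(torch.stack([g.norm() for g in grads]))
            self.norms = (self.norms + [float(norm)])[-self.window:]
            s = sorted(self.norms)
            quartiles = [s[int(q * (len(s) - 1))] for q in (0, 0.25, 0.5, 0.75, 1)]
            threshold = self.clipping_scale * quartiles[2]  # 2.0 * median
            self.steps += 1
            if norm > threshold:
                self.clipped += 1
                for g in grads:
                    g.mul_(threshold / norm)  # scale gradients down to threshold

percent-clipped would then be 100 * clipped / steps over the logging interval; the 0.0 and 1.0 values in this stretch show the threshold engaging only on rare outlier batches.
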
2024-09-24 09:22:28,784 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=15.04 vs. limit=22.5
2024-09-24 09:22:35,466 INFO [train.py:1198] (3/4) Epoch 26, batch 3300, loss[loss=0.1968, ctc_loss=0.1292, cr_loss=0.3381, over 17189.00 frames. ], tot_loss[loss=0.2061, ctc_loss=0.1357, cr_loss=0.3521, over 3318930.34 frames. ], batch size: 45, lr: 4.57e-03, grad_scale: 16.0
2024-09-24 09:22:46,806 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=469938.0, ans=0.0
2024-09-24 09:23:24,578 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.088e+02 1.271e+02 1.364e+02 1.523e+02 2.164e+02, threshold=2.728e+02, percent-clipped=0.0
2024-09-24 09:23:32,721 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=470078.0, ans=0.125
2024-09-24 09:23:38,852 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=470124.6666666667, ans=0.05
2024-09-24 09:23:41,199 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=8.94 vs. limit=22.5
2024-09-24 09:23:45,184 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=470124.6666666667, ans=0.125
2024-09-24 09:23:47,164 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.39 vs. limit=10.0
2024-09-24 09:23:55,639 INFO [train.py:1198] (3/4) Epoch 26, batch 3350, loss[loss=0.1825, ctc_loss=0.1196, cr_loss=0.3146, over 17018.00 frames. ], tot_loss[loss=0.2066, ctc_loss=0.136, cr_loss=0.3528, over 3317969.57 frames. ], batch size: 44, lr: 4.57e-03, grad_scale: 16.0
2024-09-24 09:24:08,566 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=470171.3333333333, ans=0.125
2024-09-24 09:24:16,850 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=10.07 vs. limit=15.0
2024-09-24 09:24:31,171 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.53 vs. limit=15.0
2024-09-24 09:24:36,929 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=470264.6666666667, ans=0.125
2024-09-24 09:24:57,154 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=470358.0, ans=0.0
2024-09-24 09:25:00,284 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=470358.0, ans=0.1
2024-09-24 09:25:14,431 INFO [train.py:1198] (3/4) Epoch 26, batch 3400, loss[loss=0.1691, ctc_loss=0.1086, cr_loss=0.3024, over 16956.00 frames. ], tot_loss[loss=0.2055, ctc_loss=0.1352, cr_loss=0.3516, over 3334777.64 frames. ], batch size: 42, lr: 4.56e-03, grad_scale: 16.0
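
Each scaling.py:214 line records the current value (ans=...) of a ScheduledFloat, a scalar hyperparameter such as a dropout probability, skip rate, or bypass scale floor that is scheduled against batch_count; logging the values at intervals makes the schedules auditable after the fact. A simplified stand-in for such a schedule (the breakpoints below are invented for illustration, not the repository's actual settings):

    class PiecewiseLinear:
        """Scalar scheduled on batch_count: linear interpolation between
        (batch_count, value) breakpoints, clamped at both ends."""

        def __init__(self, *points):
            self.points = sorted(points)

        def __call__(self, batch_count: float) -> float:
            pts = self.points
            if batch_count <= pts[0][0]:
                return pts[0][1]
            for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
                if batch_count <= x1:
                    return y0 + (batch_count - x0) / (x1 - x0) * (y1 - y0)
            return pts[-1][1]

    # Hypothetical dropout schedule: by batch_count ~470000 it has long been
    # at its final value, which is why entries like *.dropout_p above log a
    # constant ans=0.1.
    dropout_p = PiecewiseLinear((0.0, 0.3), (20000.0, 0.1))
    assert dropout_p(470358.0) == 0.1
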
2024-09-24 09:25:21,050 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=470404.6666666667, ans=0.1
2024-09-24 09:25:42,115 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.20 vs. limit=6.0
2024-09-24 09:26:00,407 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2024-09-24 09:26:01,626 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.123e+02 1.254e+02 1.315e+02 1.422e+02 2.277e+02, threshold=2.630e+02, percent-clipped=0.0
2024-09-24 09:26:06,356 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=470544.6666666667, ans=0.0
2024-09-24 09:26:06,913 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=5.61 vs. limit=12.0
2024-09-24 09:26:15,908 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=470591.3333333333, ans=0.2
2024-09-24 09:26:22,173 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=470591.3333333333, ans=0.125
2024-09-24 09:26:32,735 INFO [train.py:1198] (3/4) Epoch 26, batch 3450, loss[loss=0.1862, ctc_loss=0.121, cr_loss=0.3262, over 17006.00 frames. ], tot_loss[loss=0.2046, ctc_loss=0.1345, cr_loss=0.3504, over 3340303.14 frames. ], batch size: 44, lr: 4.56e-03, grad_scale: 16.0
2024-09-24 09:26:39,822 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=10.23 vs. limit=22.5
2024-09-24 09:26:44,011 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=470638.0, ans=0.125
2024-09-24 09:27:03,026 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=470731.3333333333, ans=0.09899494936611666
2024-09-24 09:27:08,239 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1.whitening_limit, batch_count=470731.3333333333, ans=10.0
2024-09-24 09:27:12,473 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=470731.3333333333, ans=0.2
2024-09-24 09:27:28,405 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=7.47 vs. limit=15.0
2024-09-24 09:27:30,976 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=470778.0, ans=0.0
2024-09-24 09:27:39,052 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=14.66 vs. limit=15.0
2024-09-24 09:27:53,198 INFO [train.py:1198] (3/4) Epoch 26, batch 3500, loss[loss=0.1971, ctc_loss=0.1256, cr_loss=0.3578, over 17302.00 frames. ], tot_loss[loss=0.2043, ctc_loss=0.1342, cr_loss=0.3504, over 3346854.92 frames. ], batch size: 51, lr: 4.56e-03, grad_scale: 16.0
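
The scaling.py:1024 Whitening lines are covariance diagnostics: metric measures how far a module's output covariance is from a multiple of the identity (1.0 would be perfectly white), limit is the point where the whitening constraint is meant to engage, and the num_groups=4 entries such as whiten_keys presumably apply the measure per group of channels. An entry like metric=14.66 vs. limit=15.0 above therefore flags a module running close to its constraint. One way to compute such a metric (an illustrative definition; the library's exact formula may differ):

    import torch

    def whitening_metric(x: torch.Tensor) -> torch.Tensor:
        """x: (num_frames, num_channels) activations of one module.
        Returns 1.0 when the channel covariance is proportional to the
        identity and grows as the eigenvalue spectrum becomes lopsided."""
        x = x - x.mean(dim=0)
        cov = (x.t() @ x) / x.shape[0]   # (C, C) channel covariance
        n = cov.shape[0]
        # n * tr(C^2) / tr(C)^2 >= 1, with equality iff C = c * I.
        return n * (cov @ cov).diagonal().sum() / cov.diagonal().sum() ** 2
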
2024-09-24 09:28:01,220 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-09-24 09:28:09,119 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=470918.0, ans=0.1
2024-09-24 09:28:17,295 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=3.98 vs. limit=10.0
2024-09-24 09:28:32,278 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=470964.6666666667, ans=0.125
2024-09-24 09:28:38,621 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=471011.3333333333, ans=0.125
2024-09-24 09:28:39,879 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.160e+02 1.277e+02 1.357e+02 1.514e+02 2.797e+02, threshold=2.715e+02, percent-clipped=1.0
2024-09-24 09:29:11,281 INFO [train.py:1198] (3/4) Epoch 26, batch 3550, loss[loss=0.2042, ctc_loss=0.132, cr_loss=0.3613, over 17322.00 frames. ], tot_loss[loss=0.2042, ctc_loss=0.1341, cr_loss=0.3507, over 3358159.16 frames. ], batch size: 51, lr: 4.56e-03, grad_scale: 16.0
2024-09-24 09:29:19,834 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=471104.6666666667, ans=0.125
2024-09-24 09:29:29,239 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=471151.3333333333, ans=0.125
2024-09-24 09:29:33,932 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=471151.3333333333, ans=0.125
2024-09-24 09:29:54,001 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=471198.0, ans=0.035
2024-09-24 09:30:04,047 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.31 vs. limit=15.0
2024-09-24 09:30:20,151 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=3.80 vs. limit=15.0
2024-09-24 09:30:21,171 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=471291.3333333333, ans=0.2
2024-09-24 09:30:23,263 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=13.04 vs. limit=22.5
2024-09-24 09:30:32,070 INFO [train.py:1198] (3/4) Epoch 26, batch 3600, loss[loss=0.1664, ctc_loss=0.107, cr_loss=0.2973, over 17188.00 frames. ], tot_loss[loss=0.2039, ctc_loss=0.1338, cr_loss=0.3502, over 3356595.70 frames. ], batch size: 41, lr: 4.56e-03, grad_scale: 32.0
2024-09-24 09:31:01,990 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=471431.3333333333, ans=0.1
2024-09-24 09:31:03,308 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=471431.3333333333, ans=0.2
2024-09-24 09:31:10,176 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=3.85 vs. limit=15.0
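
grad_scale in the batch summaries is the fp16 loss-scaling factor, and its movement in this stretch (16.0 at batch 3550, 32.0 at batch 3600 above, back to 16.0 at batch 3650 below) is the standard dynamic-loss-scaling pattern: halve on overflowing gradients, double again after a long run of clean steps. A minimal sketch using PyTorch's stock GradScaler (standard torch.cuda.amp API; model, optimizer, loader and compute_loss are placeholders):

    import torch

    scaler = torch.cuda.amp.GradScaler()   # dynamic loss scaling for fp16

    for batch in loader:                   # placeholder training loop
        optimizer.zero_grad()
        with torch.cuda.amp.autocast(dtype=torch.float16):
            loss = compute_loss(model, batch)
        scaler.scale(loss).backward()      # backprop on the scaled loss
        scaler.step(optimizer)             # unscales; skips step on inf/nan
        scaler.update()                    # backoff on overflow, grow after
                                           # growth_interval clean steps
        grad_scale = scaler.get_scale()    # the value logged as grad_scale
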
2024-09-24 09:31:12,515 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=471431.3333333333, ans=0.0
2024-09-24 09:31:14,151 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=471431.3333333333, ans=0.1
2024-09-24 09:31:14,205 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=471431.3333333333, ans=0.125
2024-09-24 09:31:20,017 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.133e+02 1.297e+02 1.443e+02 1.634e+02 2.383e+02, threshold=2.886e+02, percent-clipped=0.0
2024-09-24 09:31:37,570 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=471524.6666666667, ans=0.1
2024-09-24 09:31:49,548 INFO [train.py:1198] (3/4) Epoch 26, batch 3650, loss[loss=0.2395, ctc_loss=0.1616, cr_loss=0.3891, over 15137.00 frames. ], tot_loss[loss=0.204, ctc_loss=0.1339, cr_loss=0.3505, over 3357393.32 frames. ], batch size: 89, lr: 4.56e-03, grad_scale: 16.0
2024-09-24 09:31:56,019 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.const_attention_rate, batch_count=471571.3333333333, ans=0.025
2024-09-24 09:32:04,173 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=471571.3333333333, ans=0.2
2024-09-24 09:32:17,132 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=9.36 vs. limit=15.0
2024-09-24 09:32:18,492 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=471618.0, ans=0.0
2024-09-24 09:32:21,453 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=471664.6666666667, ans=0.0
2024-09-24 09:32:29,368 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2024-09-24 09:33:12,495 INFO [train.py:1198] (3/4) Epoch 26, batch 3700, loss[loss=0.201, ctc_loss=0.1303, cr_loss=0.3533, over 17198.00 frames. ], tot_loss[loss=0.2041, ctc_loss=0.1339, cr_loss=0.3509, over 3366002.87 frames. ], batch size: 41, lr: 4.56e-03, grad_scale: 16.0
2024-09-24 09:33:17,528 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=471804.6666666667, ans=0.125
2024-09-24 09:33:31,569 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=471851.3333333333, ans=0.0
2024-09-24 09:34:01,194 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.114e+02 1.253e+02 1.341e+02 1.434e+02 1.966e+02, threshold=2.682e+02, percent-clipped=0.0
2024-09-24 09:34:11,173 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=471944.6666666667, ans=0.1
2024-09-24 09:34:14,539 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=10.27 vs.
limit=15.0 2024-09-24 09:34:29,678 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-09-24 09:34:30,806 INFO [train.py:1198] (3/4) Epoch 26, batch 3750, loss[loss=0.1912, ctc_loss=0.1252, cr_loss=0.3299, over 17009.00 frames. ], tot_loss[loss=0.2049, ctc_loss=0.1346, cr_loss=0.3512, over 3342084.38 frames. ], batch size: 51, lr: 4.56e-03, grad_scale: 16.0 2024-09-24 09:34:42,054 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.16 vs. limit=6.0 2024-09-24 09:34:49,599 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=6.70 vs. limit=15.0 2024-09-24 09:35:06,095 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=472131.3333333333, ans=0.125 2024-09-24 09:35:07,794 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-09-24 09:35:36,376 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=472224.6666666667, ans=0.1 2024-09-24 09:35:44,172 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=472224.6666666667, ans=0.09899494936611666 2024-09-24 09:35:47,136 INFO [train.py:1198] (3/4) Epoch 26, batch 3800, loss[loss=0.2168, ctc_loss=0.1402, cr_loss=0.3834, over 17222.00 frames. ], tot_loss[loss=0.2064, ctc_loss=0.1358, cr_loss=0.353, over 3335697.60 frames. ], batch size: 47, lr: 4.56e-03, grad_scale: 16.0 2024-09-24 09:35:55,708 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.80 vs. limit=15.0 2024-09-24 09:36:11,706 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=472318.0, ans=0.125 2024-09-24 09:36:18,519 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.80 vs. limit=6.0 2024-09-24 09:36:22,165 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.const_attention_rate, batch_count=472364.6666666667, ans=0.025 2024-09-24 09:36:34,124 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.081e+02 1.296e+02 1.371e+02 1.525e+02 1.900e+02, threshold=2.741e+02, percent-clipped=0.0 2024-09-24 09:36:43,768 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=472411.3333333333, ans=0.125 2024-09-24 09:36:48,769 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.50 vs. limit=15.0 2024-09-24 09:36:49,818 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=472458.0, ans=0.1 2024-09-24 09:36:51,320 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=472458.0, ans=0.125 2024-09-24 09:37:03,759 INFO [train.py:1198] (3/4) Epoch 26, batch 3850, loss[loss=0.25, ctc_loss=0.1702, cr_loss=0.3991, over 14772.00 frames. 
], tot_loss[loss=0.2092, ctc_loss=0.1381, cr_loss=0.3555, over 3282484.61 frames. ], batch size: 89, lr: 4.55e-03, grad_scale: 16.0 2024-09-24 09:37:52,535 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=472644.6666666667, ans=0.2 2024-09-24 09:39:03,334 INFO [train.py:1198] (3/4) Epoch 27, batch 0, loss[loss=0.2086, ctc_loss=0.138, cr_loss=0.3531, over 17013.00 frames. ], tot_loss[loss=0.2086, ctc_loss=0.138, cr_loss=0.3531, over 17013.00 frames. ], batch size: 44, lr: 4.47e-03, grad_scale: 32.0 2024-09-24 09:39:03,335 INFO [train.py:1221] (3/4) Computing validation loss 2024-09-24 09:39:21,541 INFO [train.py:1230] (3/4) Epoch 27, validation: loss=0.03741, ctc_loss=0.03741, cr_loss=8.388e-15, over 944034.00 frames. 2024-09-24 09:39:21,542 INFO [train.py:1231] (3/4) Maximum memory allocated so far is 21238MB 2024-09-24 09:40:05,765 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=472812.6666666667, ans=0.0 2024-09-24 09:40:10,329 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=472859.3333333333, ans=0.125 2024-09-24 09:40:12,267 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=472859.3333333333, ans=0.0 2024-09-24 09:40:18,785 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=472859.3333333333, ans=0.125 2024-09-24 09:40:21,652 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.122e+02 1.362e+02 1.530e+02 1.663e+02 2.257e+02, threshold=3.060e+02, percent-clipped=0.0 2024-09-24 09:40:39,526 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=472906.0, ans=0.035 2024-09-24 09:40:41,150 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=472906.0, ans=0.2 2024-09-24 09:40:45,611 INFO [train.py:1198] (3/4) Epoch 27, batch 50, loss[loss=0.2027, ctc_loss=0.1301, cr_loss=0.3629, over 17349.00 frames. ], tot_loss[loss=0.2085, ctc_loss=0.1373, cr_loss=0.3559, over 750462.36 frames. ], batch size: 48, lr: 4.47e-03, grad_scale: 32.0 2024-09-24 09:41:22,118 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=10.96 vs. limit=22.5 2024-09-24 09:41:58,837 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=473139.3333333333, ans=0.125 2024-09-24 09:42:00,309 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=473139.3333333333, ans=0.125 2024-09-24 09:42:04,849 INFO [train.py:1198] (3/4) Epoch 27, batch 100, loss[loss=0.1656, ctc_loss=0.1063, cr_loss=0.2965, over 16960.00 frames. ], tot_loss[loss=0.2041, ctc_loss=0.1341, cr_loss=0.3499, over 1327356.92 frames. 
], batch size: 42, lr: 4.46e-03, grad_scale: 32.0 2024-09-24 09:42:13,422 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=473186.0, ans=0.125 2024-09-24 09:42:26,302 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=473232.6666666667, ans=0.125 2024-09-24 09:42:27,171 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=6.70 vs. limit=15.0 2024-09-24 09:42:45,094 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-24 09:42:48,504 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=3.38 vs. limit=15.0 2024-09-24 09:42:54,693 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=473326.0, ans=0.125 2024-09-24 09:43:03,946 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.054e+02 1.215e+02 1.307e+02 1.417e+02 1.891e+02, threshold=2.615e+02, percent-clipped=0.0 2024-09-24 09:43:06,079 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=10.65 vs. limit=12.0 2024-09-24 09:43:18,847 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=473372.6666666667, ans=0.125 2024-09-24 09:43:19,140 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=6.31 vs. limit=15.0 2024-09-24 09:43:25,199 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=473372.6666666667, ans=0.2 2024-09-24 09:43:28,115 INFO [train.py:1198] (3/4) Epoch 27, batch 150, loss[loss=0.1926, ctc_loss=0.1278, cr_loss=0.3241, over 16267.00 frames. ], tot_loss[loss=0.204, ctc_loss=0.1339, cr_loss=0.3503, over 1779940.37 frames. ], batch size: 36, lr: 4.46e-03, grad_scale: 32.0 2024-09-24 09:43:41,081 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=473419.3333333333, ans=0.07 2024-09-24 09:43:52,540 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=473466.0, ans=0.07 2024-09-24 09:44:19,602 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=473559.3333333333, ans=0.1 2024-09-24 09:44:53,619 INFO [train.py:1198] (3/4) Epoch 27, batch 200, loss[loss=0.2016, ctc_loss=0.1332, cr_loss=0.3419, over 17026.00 frames. ], tot_loss[loss=0.2051, ctc_loss=0.1347, cr_loss=0.3518, over 2130330.16 frames. ], batch size: 44, lr: 4.46e-03, grad_scale: 32.0 2024-09-24 09:44:57,128 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=473652.6666666667, ans=0.07 2024-09-24 09:45:24,353 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=7.55 vs. 
limit=15.0 2024-09-24 09:45:25,260 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=473699.3333333333, ans=0.125 2024-09-24 09:45:52,247 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.169e+02 1.251e+02 1.322e+02 1.422e+02 2.046e+02, threshold=2.645e+02, percent-clipped=0.0 2024-09-24 09:46:02,178 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=473839.3333333333, ans=0.025 2024-09-24 09:46:16,199 INFO [train.py:1198] (3/4) Epoch 27, batch 250, loss[loss=0.2468, ctc_loss=0.1656, cr_loss=0.406, over 15125.00 frames. ], tot_loss[loss=0.206, ctc_loss=0.1354, cr_loss=0.3532, over 2392533.35 frames. ], batch size: 89, lr: 4.46e-03, grad_scale: 32.0 2024-09-24 09:46:37,044 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=473932.6666666667, ans=0.125 2024-09-24 09:46:39,190 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.19 vs. limit=6.0 2024-09-24 09:46:49,997 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=473979.3333333333, ans=0.125 2024-09-24 09:47:21,857 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=16.66 vs. limit=22.5 2024-09-24 09:47:38,637 INFO [train.py:1198] (3/4) Epoch 27, batch 300, loss[loss=0.1678, ctc_loss=0.1078, cr_loss=0.2998, over 17213.00 frames. ], tot_loss[loss=0.2044, ctc_loss=0.134, cr_loss=0.3517, over 2620999.56 frames. ], batch size: 41, lr: 4.46e-03, grad_scale: 32.0 2024-09-24 09:47:44,100 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=474119.3333333333, ans=0.125 2024-09-24 09:47:48,925 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=474119.3333333333, ans=0.2 2024-09-24 09:47:49,030 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=474119.3333333333, ans=0.0 2024-09-24 09:48:05,379 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=474166.0, ans=0.2 2024-09-24 09:48:32,521 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=474259.3333333333, ans=0.1 2024-09-24 09:48:35,476 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.098e+02 1.322e+02 1.414e+02 1.594e+02 2.687e+02, threshold=2.828e+02, percent-clipped=1.0 2024-09-24 09:48:36,216 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=11.60 vs. limit=22.5 2024-09-24 09:48:59,276 INFO [train.py:1198] (3/4) Epoch 27, batch 350, loss[loss=0.1913, ctc_loss=0.1264, cr_loss=0.3245, over 16336.00 frames. ], tot_loss[loss=0.2037, ctc_loss=0.1336, cr_loss=0.3504, over 2781566.53 frames. 
], batch size: 36, lr: 4.46e-03, grad_scale: 32.0 2024-09-24 09:49:05,987 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=474352.6666666667, ans=0.125 2024-09-24 09:49:29,671 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=16.28 vs. limit=22.5 2024-09-24 09:49:46,233 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=474446.0, ans=0.0 2024-09-24 09:49:59,527 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=4.68 vs. limit=10.0 2024-09-24 09:50:00,853 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=11.23 vs. limit=22.5 2024-09-24 09:50:02,017 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=474492.6666666667, ans=0.0 2024-09-24 09:50:07,008 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=474539.3333333333, ans=0.125 2024-09-24 09:50:19,896 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=4.37 vs. limit=10.0 2024-09-24 09:50:27,285 INFO [train.py:1198] (3/4) Epoch 27, batch 400, loss[loss=0.2241, ctc_loss=0.1468, cr_loss=0.3869, over 17298.00 frames. ], tot_loss[loss=0.2038, ctc_loss=0.1336, cr_loss=0.3507, over 2903180.90 frames. ], batch size: 49, lr: 4.46e-03, grad_scale: 32.0 2024-09-24 09:50:57,952 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-09-24 09:51:18,587 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=474726.0, ans=0.125 2024-09-24 09:51:18,627 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=474726.0, ans=0.1 2024-09-24 09:51:20,097 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=474726.0, ans=0.125 2024-09-24 09:51:23,172 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.110e+02 1.248e+02 1.347e+02 1.469e+02 2.188e+02, threshold=2.694e+02, percent-clipped=0.0 2024-09-24 09:51:30,091 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=474772.6666666667, ans=0.0 2024-09-24 09:51:42,926 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=474772.6666666667, ans=0.1 2024-09-24 09:51:44,506 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=474772.6666666667, ans=0.125 2024-09-24 09:51:47,498 INFO [train.py:1198] (3/4) Epoch 27, batch 450, loss[loss=0.2167, ctc_loss=0.1386, cr_loss=0.3908, over 16989.00 frames. ], tot_loss[loss=0.2033, ctc_loss=0.1331, cr_loss=0.351, over 3012546.27 frames. 
], batch size: 53, lr: 4.46e-03, grad_scale: 32.0 2024-09-24 09:52:05,057 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=474866.0, ans=0.04949747468305833 2024-09-24 09:52:09,884 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=474866.0, ans=0.0 2024-09-24 09:52:14,726 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=474866.0, ans=0.1 2024-09-24 09:52:32,309 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=474912.6666666667, ans=0.0 2024-09-24 09:52:52,516 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.const_attention_rate, batch_count=475006.0, ans=0.025 2024-09-24 09:53:03,651 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=475006.0, ans=0.0 2024-09-24 09:53:09,779 INFO [train.py:1198] (3/4) Epoch 27, batch 500, loss[loss=0.2465, ctc_loss=0.1669, cr_loss=0.3981, over 14939.00 frames. ], tot_loss[loss=0.2027, ctc_loss=0.1328, cr_loss=0.3497, over 3100246.57 frames. ], batch size: 89, lr: 4.46e-03, grad_scale: 32.0 2024-09-24 09:53:15,185 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=475052.6666666667, ans=0.125 2024-09-24 09:53:38,160 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=4.61 vs. limit=12.0 2024-09-24 09:53:58,054 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=6.34 vs. limit=15.0 2024-09-24 09:54:06,520 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.150e+02 1.284e+02 1.368e+02 1.499e+02 2.424e+02, threshold=2.736e+02, percent-clipped=0.0 2024-09-24 09:54:10,258 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=11.81 vs. limit=15.0 2024-09-24 09:54:13,321 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=475239.3333333333, ans=0.1 2024-09-24 09:54:26,276 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=7.74 vs. limit=15.0 2024-09-24 09:54:33,118 INFO [train.py:1198] (3/4) Epoch 27, batch 550, loss[loss=0.2023, ctc_loss=0.1328, cr_loss=0.3475, over 17170.00 frames. ], tot_loss[loss=0.2028, ctc_loss=0.1329, cr_loss=0.3492, over 3160224.52 frames. ], batch size: 55, lr: 4.45e-03, grad_scale: 32.0 2024-09-24 09:54:35,273 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=9.05 vs. 
limit=15.0 2024-09-24 09:54:50,128 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=475332.6666666667, ans=0.125 2024-09-24 09:54:56,446 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=475332.6666666667, ans=0.125 2024-09-24 09:55:01,259 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=475332.6666666667, ans=0.07 2024-09-24 09:55:10,839 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=475379.3333333333, ans=0.125 2024-09-24 09:55:51,848 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=475472.6666666667, ans=0.95 2024-09-24 09:55:57,942 INFO [train.py:1198] (3/4) Epoch 27, batch 600, loss[loss=0.2288, ctc_loss=0.1542, cr_loss=0.3732, over 16901.00 frames. ], tot_loss[loss=0.2035, ctc_loss=0.1334, cr_loss=0.3506, over 3212338.24 frames. ], batch size: 58, lr: 4.45e-03, grad_scale: 32.0 2024-09-24 09:56:10,200 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=13.22 vs. limit=22.5 2024-09-24 09:56:20,813 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=3.96 vs. limit=12.0 2024-09-24 09:56:22,083 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=475566.0, ans=0.125 2024-09-24 09:56:23,598 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=475566.0, ans=0.125 2024-09-24 09:56:25,681 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=11.58 vs. limit=22.5 2024-09-24 09:56:33,450 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=11.96 vs. limit=12.0 2024-09-24 09:56:40,447 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.43 vs. limit=12.0 2024-09-24 09:56:53,667 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.100e+02 1.260e+02 1.326e+02 1.393e+02 1.864e+02, threshold=2.652e+02, percent-clipped=0.0 2024-09-24 09:57:06,867 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=475706.0, ans=0.125 2024-09-24 09:57:13,353 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.min_positive, batch_count=475706.0, ans=0.05 2024-09-24 09:57:17,881 INFO [train.py:1198] (3/4) Epoch 27, batch 650, loss[loss=0.2002, ctc_loss=0.1325, cr_loss=0.3381, over 17343.00 frames. ], tot_loss[loss=0.2028, ctc_loss=0.1328, cr_loss=0.3497, over 3257326.06 frames. ], batch size: 48, lr: 4.45e-03, grad_scale: 32.0 2024-09-24 09:57:29,753 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=7.43 vs. 
limit=15.0 2024-09-24 09:57:46,004 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=475799.3333333333, ans=0.1 2024-09-24 09:57:55,334 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=475846.0, ans=0.5 2024-09-24 09:58:02,187 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten.whitening_limit, batch_count=475846.0, ans=15.0 2024-09-24 09:58:19,317 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=475892.6666666667, ans=0.1 2024-09-24 09:58:25,681 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=475939.3333333333, ans=0.0 2024-09-24 09:58:39,737 INFO [train.py:1198] (3/4) Epoch 27, batch 700, loss[loss=0.2339, ctc_loss=0.1578, cr_loss=0.3808, over 17229.00 frames. ], tot_loss[loss=0.2033, ctc_loss=0.1333, cr_loss=0.3501, over 3282660.40 frames. ], batch size: 55, lr: 4.45e-03, grad_scale: 32.0 2024-09-24 09:58:53,198 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=475986.0, ans=0.0 2024-09-24 09:59:05,910 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=476032.6666666667, ans=0.125 2024-09-24 09:59:29,143 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=476126.0, ans=0.0 2024-09-24 09:59:40,878 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.083e+02 1.256e+02 1.371e+02 1.501e+02 2.687e+02, threshold=2.742e+02, percent-clipped=1.0 2024-09-24 10:00:04,599 INFO [train.py:1198] (3/4) Epoch 27, batch 750, loss[loss=0.2019, ctc_loss=0.1317, cr_loss=0.3509, over 17047.00 frames. ], tot_loss[loss=0.2052, ctc_loss=0.1346, cr_loss=0.3529, over 3295431.25 frames. ], batch size: 46, lr: 4.45e-03, grad_scale: 32.0 2024-09-24 10:00:06,523 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=476219.3333333333, ans=0.125 2024-09-24 10:00:30,169 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=476266.0, ans=0.125 2024-09-24 10:00:38,049 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=476312.6666666667, ans=0.09899494936611666 2024-09-24 10:00:41,140 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=476312.6666666667, ans=0.0 2024-09-24 10:00:43,001 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=476312.6666666667, ans=0.0 2024-09-24 10:00:49,415 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=476312.6666666667, ans=0.1 2024-09-24 10:01:27,182 INFO [train.py:1198] (3/4) Epoch 27, batch 800, loss[loss=0.219, ctc_loss=0.1487, cr_loss=0.3513, over 16697.00 frames. ], tot_loss[loss=0.2054, ctc_loss=0.1348, cr_loss=0.3531, over 3310198.29 frames. 
], batch size: 61, lr: 4.45e-03, grad_scale: 32.0 2024-09-24 10:01:44,475 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=9.25 vs. limit=15.0 2024-09-24 10:01:52,013 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=476499.3333333333, ans=0.0 2024-09-24 10:02:11,621 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=13.15 vs. limit=15.0 2024-09-24 10:02:21,163 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=476592.6666666667, ans=0.1 2024-09-24 10:02:23,975 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.060e+02 1.235e+02 1.338e+02 1.405e+02 1.662e+02, threshold=2.676e+02, percent-clipped=0.0 2024-09-24 10:02:29,378 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=476592.6666666667, ans=0.125 2024-09-24 10:02:41,637 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=476639.3333333333, ans=0.05 2024-09-24 10:02:46,539 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=476639.3333333333, ans=0.125 2024-09-24 10:02:50,860 INFO [train.py:1198] (3/4) Epoch 27, batch 850, loss[loss=0.232, ctc_loss=0.1526, cr_loss=0.3968, over 17026.00 frames. ], tot_loss[loss=0.2068, ctc_loss=0.1358, cr_loss=0.3547, over 3307706.35 frames. ], batch size: 56, lr: 4.45e-03, grad_scale: 32.0 2024-09-24 10:02:55,903 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=476686.0, ans=0.125 2024-09-24 10:03:02,357 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=476686.0, ans=0.125 2024-09-24 10:03:20,107 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=5.02 vs. limit=15.0 2024-09-24 10:03:27,720 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-24 10:04:10,719 INFO [train.py:1198] (3/4) Epoch 27, batch 900, loss[loss=0.2225, ctc_loss=0.1534, cr_loss=0.3452, over 11714.00 frames. ], tot_loss[loss=0.2058, ctc_loss=0.1352, cr_loss=0.3529, over 3309866.98 frames. 
], batch size: 125, lr: 4.45e-03, grad_scale: 32.0 2024-09-24 10:04:15,118 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=476919.3333333333, ans=0.05 2024-09-24 10:04:15,187 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=476919.3333333333, ans=0.125 2024-09-24 10:04:33,052 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=476966.0, ans=0.1 2024-09-24 10:04:40,357 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=476966.0, ans=0.1 2024-09-24 10:04:47,303 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=4.65 vs. limit=15.0 2024-09-24 10:05:14,403 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.081e+02 1.238e+02 1.345e+02 1.503e+02 3.181e+02, threshold=2.691e+02, percent-clipped=1.0 2024-09-24 10:05:38,933 INFO [train.py:1198] (3/4) Epoch 27, batch 950, loss[loss=0.2096, ctc_loss=0.1395, cr_loss=0.3506, over 17205.00 frames. ], tot_loss[loss=0.205, ctc_loss=0.1345, cr_loss=0.3522, over 3321159.50 frames. ], batch size: 47, lr: 4.45e-03, grad_scale: 32.0 2024-09-24 10:06:03,274 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=477199.3333333333, ans=0.125 2024-09-24 10:06:16,105 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=477246.0, ans=0.125 2024-09-24 10:06:16,788 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=15.05 vs. limit=22.5 2024-09-24 10:06:28,767 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.const_attention_rate, batch_count=477292.6666666667, ans=0.025 2024-09-24 10:06:58,175 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=10.78 vs. limit=15.0 2024-09-24 10:06:58,822 INFO [train.py:1198] (3/4) Epoch 27, batch 1000, loss[loss=0.2051, ctc_loss=0.1371, cr_loss=0.34, over 16941.00 frames. ], tot_loss[loss=0.205, ctc_loss=0.1346, cr_loss=0.352, over 3333976.31 frames. 
], batch size: 42, lr: 4.44e-03, grad_scale: 8.0 2024-09-24 10:07:42,440 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1.whitening_limit, batch_count=477479.3333333333, ans=10.0 2024-09-24 10:07:45,018 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=477479.3333333333, ans=0.125 2024-09-24 10:07:58,118 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=477526.0, ans=0.0 2024-09-24 10:08:00,890 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.133e+02 1.228e+02 1.308e+02 1.402e+02 1.814e+02, threshold=2.617e+02, percent-clipped=0.0 2024-09-24 10:08:09,594 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=477572.6666666667, ans=0.09899494936611666 2024-09-24 10:08:11,145 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=477572.6666666667, ans=0.0 2024-09-24 10:08:22,268 INFO [train.py:1198] (3/4) Epoch 27, batch 1050, loss[loss=0.1542, ctc_loss=0.09834, cr_loss=0.2792, over 17106.00 frames. ], tot_loss[loss=0.2026, ctc_loss=0.1329, cr_loss=0.3485, over 3337451.16 frames. ], batch size: 40, lr: 4.44e-03, grad_scale: 8.0 2024-09-24 10:08:38,617 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=477666.0, ans=0.125 2024-09-24 10:08:43,507 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=477666.0, ans=0.0 2024-09-24 10:08:59,887 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=477712.6666666667, ans=0.0 2024-09-24 10:09:01,785 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=11.09 vs. limit=15.0 2024-09-24 10:09:29,575 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer_na.min_abs, batch_count=477806.0, ans=0.02 2024-09-24 10:09:34,107 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=477806.0, ans=0.2 2024-09-24 10:09:47,486 INFO [train.py:1198] (3/4) Epoch 27, batch 1100, loss[loss=0.2033, ctc_loss=0.134, cr_loss=0.3463, over 17030.00 frames. ], tot_loss[loss=0.2035, ctc_loss=0.1336, cr_loss=0.3497, over 3326391.27 frames. ], batch size: 51, lr: 4.44e-03, grad_scale: 8.0 2024-09-24 10:10:02,599 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=477899.3333333333, ans=0.125 2024-09-24 10:10:10,623 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.71 vs. limit=6.0 2024-09-24 10:10:12,968 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=477899.3333333333, ans=0.0 2024-09-24 10:10:13,259 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.73 vs. 
limit=15.0 2024-09-24 10:10:34,135 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.52 vs. limit=15.0 2024-09-24 10:10:49,361 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.034e+02 1.279e+02 1.377e+02 1.520e+02 1.966e+02, threshold=2.755e+02, percent-clipped=0.0 2024-09-24 10:11:10,472 INFO [train.py:1198] (3/4) Epoch 27, batch 1150, loss[loss=0.1958, ctc_loss=0.127, cr_loss=0.3444, over 17108.00 frames. ], tot_loss[loss=0.2029, ctc_loss=0.1331, cr_loss=0.3489, over 3335791.14 frames. ], batch size: 49, lr: 4.44e-03, grad_scale: 8.0 2024-09-24 10:11:27,672 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=3.74 vs. limit=12.0 2024-09-24 10:11:33,928 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=6.72 vs. limit=15.0 2024-09-24 10:11:51,634 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=4.73 vs. limit=15.0 2024-09-24 10:12:13,621 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-24 10:12:16,767 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=478272.6666666667, ans=0.025 2024-09-24 10:12:21,545 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=478272.6666666667, ans=0.1 2024-09-24 10:12:26,379 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=478272.6666666667, ans=0.2 2024-09-24 10:12:33,338 INFO [train.py:1198] (3/4) Epoch 27, batch 1200, loss[loss=0.2356, ctc_loss=0.1594, cr_loss=0.3807, over 16775.00 frames. ], tot_loss[loss=0.2039, ctc_loss=0.1338, cr_loss=0.3504, over 3340292.23 frames. ], batch size: 61, lr: 4.44e-03, grad_scale: 16.0 2024-09-24 10:12:37,345 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=10.91 vs. limit=22.5 2024-09-24 10:12:51,853 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=7.73 vs. limit=15.0 2024-09-24 10:13:15,384 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=478412.6666666667, ans=0.2 2024-09-24 10:13:32,437 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.125e+02 1.237e+02 1.313e+02 1.395e+02 2.791e+02, threshold=2.627e+02, percent-clipped=1.0 2024-09-24 10:13:53,065 INFO [train.py:1198] (3/4) Epoch 27, batch 1250, loss[loss=0.2096, ctc_loss=0.1404, cr_loss=0.3459, over 17145.00 frames. ], tot_loss[loss=0.2036, ctc_loss=0.1336, cr_loss=0.3499, over 3343931.45 frames. ], batch size: 48, lr: 4.44e-03, grad_scale: 16.0 2024-09-24 10:13:53,331 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=478552.6666666667, ans=0.2 2024-09-24 10:13:56,903 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.26 vs. 
limit=15.0 2024-09-24 10:14:20,431 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.27 vs. limit=15.0 2024-09-24 10:14:34,626 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=478646.0, ans=0.125 2024-09-24 10:14:42,002 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=478646.0, ans=0.125 2024-09-24 10:15:21,234 INFO [train.py:1198] (3/4) Epoch 27, batch 1300, loss[loss=0.1837, ctc_loss=0.1164, cr_loss=0.3368, over 16288.00 frames. ], tot_loss[loss=0.203, ctc_loss=0.1331, cr_loss=0.3493, over 3353194.86 frames. ], batch size: 36, lr: 4.44e-03, grad_scale: 16.0 2024-09-24 10:15:32,829 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-24 10:15:42,462 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=478832.6666666667, ans=0.125 2024-09-24 10:16:06,814 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=478879.3333333333, ans=0.0 2024-09-24 10:16:08,465 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=478926.0, ans=0.0 2024-09-24 10:16:22,436 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.134e+02 1.264e+02 1.365e+02 1.488e+02 1.950e+02, threshold=2.729e+02, percent-clipped=0.0 2024-09-24 10:16:29,294 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=478972.6666666667, ans=0.0 2024-09-24 10:16:41,761 INFO [train.py:1198] (3/4) Epoch 27, batch 1350, loss[loss=0.1904, ctc_loss=0.1254, cr_loss=0.3249, over 17176.00 frames. ], tot_loss[loss=0.203, ctc_loss=0.1331, cr_loss=0.3497, over 3352044.14 frames. ], batch size: 45, lr: 4.44e-03, grad_scale: 8.0 2024-09-24 10:16:59,742 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-24 10:17:15,807 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-24 10:17:18,823 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=479112.6666666667, ans=0.0 2024-09-24 10:17:18,924 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=479112.6666666667, ans=0.125 2024-09-24 10:17:24,398 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=5.93 vs. limit=15.0 2024-09-24 10:17:50,279 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=479206.0, ans=0.125 2024-09-24 10:17:50,714 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=4.49 vs. limit=15.0 2024-09-24 10:18:04,480 INFO [train.py:1198] (3/4) Epoch 27, batch 1400, loss[loss=0.205, ctc_loss=0.1339, cr_loss=0.3553, over 16921.00 frames. ], tot_loss[loss=0.2023, ctc_loss=0.1326, cr_loss=0.3483, over 3356322.35 frames. 
], batch size: 58, lr: 4.44e-03, grad_scale: 8.0 2024-09-24 10:18:54,831 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-24 10:18:55,004 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=479392.6666666667, ans=0.125 2024-09-24 10:18:58,832 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.81 vs. limit=6.0 2024-09-24 10:19:04,183 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=479392.6666666667, ans=0.0 2024-09-24 10:19:08,018 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.080e+02 1.257e+02 1.359e+02 1.482e+02 2.377e+02, threshold=2.717e+02, percent-clipped=0.0 2024-09-24 10:19:08,320 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=479392.6666666667, ans=0.125 2024-09-24 10:19:08,942 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=8.06 vs. limit=15.0 2024-09-24 10:19:22,669 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=479439.3333333333, ans=0.0 2024-09-24 10:19:27,148 INFO [train.py:1198] (3/4) Epoch 27, batch 1450, loss[loss=0.1907, ctc_loss=0.1262, cr_loss=0.3225, over 17227.00 frames. ], tot_loss[loss=0.2036, ctc_loss=0.1335, cr_loss=0.3501, over 3350402.92 frames. ], batch size: 50, lr: 4.43e-03, grad_scale: 8.0 2024-09-24 10:19:53,802 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=479532.6666666667, ans=0.1 2024-09-24 10:20:13,317 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=9.63 vs. limit=22.5 2024-09-24 10:20:23,807 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=479626.0, ans=0.1 2024-09-24 10:20:25,386 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=479626.0, ans=0.0 2024-09-24 10:20:30,205 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=479626.0, ans=0.125 2024-09-24 10:20:43,826 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.83 vs. limit=6.0 2024-09-24 10:20:52,655 INFO [train.py:1198] (3/4) Epoch 27, batch 1500, loss[loss=0.223, ctc_loss=0.1468, cr_loss=0.3811, over 15959.00 frames. ], tot_loss[loss=0.2037, ctc_loss=0.1337, cr_loss=0.3502, over 3352952.62 frames. ], batch size: 74, lr: 4.43e-03, grad_scale: 8.0 2024-09-24 10:21:11,468 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=7.68 vs. 
limit=12.0 2024-09-24 10:21:14,481 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=479766.0, ans=0.125 2024-09-24 10:21:30,296 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=479812.6666666667, ans=0.0 2024-09-24 10:21:35,219 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=479812.6666666667, ans=0.2 2024-09-24 10:21:43,093 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.const_attention_rate, batch_count=479859.3333333333, ans=0.025 2024-09-24 10:21:54,048 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.112e+02 1.289e+02 1.371e+02 1.496e+02 2.046e+02, threshold=2.742e+02, percent-clipped=0.0 2024-09-24 10:22:13,081 INFO [train.py:1198] (3/4) Epoch 27, batch 1550, loss[loss=0.2111, ctc_loss=0.1388, cr_loss=0.3612, over 17349.00 frames. ], tot_loss[loss=0.2048, ctc_loss=0.1346, cr_loss=0.3512, over 3349881.00 frames. ], batch size: 48, lr: 4.43e-03, grad_scale: 8.0 2024-09-24 10:22:27,593 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=479999.3333333333, ans=0.125 2024-09-24 10:22:54,248 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=480046.0, ans=0.1 2024-09-24 10:23:36,297 INFO [train.py:1198] (3/4) Epoch 27, batch 1600, loss[loss=0.1885, ctc_loss=0.1211, cr_loss=0.3374, over 17245.00 frames. ], tot_loss[loss=0.2039, ctc_loss=0.1339, cr_loss=0.3501, over 3353705.23 frames. ], batch size: 44, lr: 4.43e-03, grad_scale: 16.0 2024-09-24 10:23:41,394 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-09-24 10:24:26,989 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=480326.0, ans=0.1 2024-09-24 10:24:41,898 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.022e+02 1.244e+02 1.330e+02 1.456e+02 2.026e+02, threshold=2.660e+02, percent-clipped=0.0 2024-09-24 10:24:55,571 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.48 vs. limit=6.0 2024-09-24 10:25:03,407 INFO [train.py:1198] (3/4) Epoch 27, batch 1650, loss[loss=0.1821, ctc_loss=0.1199, cr_loss=0.3111, over 17173.00 frames. ], tot_loss[loss=0.2049, ctc_loss=0.1346, cr_loss=0.3519, over 3364276.44 frames. 
], batch size: 45, lr: 4.43e-03, grad_scale: 16.0 2024-09-24 10:25:21,451 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=480466.0, ans=0.125 2024-09-24 10:25:29,367 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=480466.0, ans=0.125 2024-09-24 10:25:59,884 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=480559.3333333333, ans=0.07 2024-09-24 10:26:11,154 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=480606.0, ans=0.1 2024-09-24 10:26:23,699 INFO [train.py:1198] (3/4) Epoch 27, batch 1700, loss[loss=0.2009, ctc_loss=0.1298, cr_loss=0.3558, over 17204.00 frames. ], tot_loss[loss=0.2049, ctc_loss=0.1344, cr_loss=0.3525, over 3368838.74 frames. ], batch size: 47, lr: 4.43e-03, grad_scale: 16.0 2024-09-24 10:26:36,984 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=480652.6666666667, ans=0.2 2024-09-24 10:26:39,246 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.65 vs. limit=6.0 2024-09-24 10:26:49,523 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=480699.3333333333, ans=0.125 2024-09-24 10:26:51,667 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=7.54 vs. limit=15.0 2024-09-24 10:26:57,745 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=480746.0, ans=0.125 2024-09-24 10:26:59,435 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=480746.0, ans=0.1 2024-09-24 10:27:15,060 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=480792.6666666667, ans=0.0 2024-09-24 10:27:26,753 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.079e+02 1.231e+02 1.318e+02 1.430e+02 1.905e+02, threshold=2.636e+02, percent-clipped=0.0 2024-09-24 10:27:35,889 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=6.75 vs. limit=15.0 2024-09-24 10:27:44,587 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=480886.0, ans=0.2 2024-09-24 10:27:45,876 INFO [train.py:1198] (3/4) Epoch 27, batch 1750, loss[loss=0.1924, ctc_loss=0.1254, cr_loss=0.3352, over 17258.00 frames. ], tot_loss[loss=0.2048, ctc_loss=0.1342, cr_loss=0.3527, over 3373505.76 frames. ], batch size: 44, lr: 4.43e-03, grad_scale: 16.0 2024-09-24 10:27:56,242 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=10.14 vs. limit=15.0 2024-09-24 10:28:02,572 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=12.62 vs. 
limit=22.5 2024-09-24 10:28:03,886 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.const_attention_rate, batch_count=480932.6666666667, ans=0.025 2024-09-24 10:28:05,477 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=480932.6666666667, ans=0.2 2024-09-24 10:28:23,570 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=16.78 vs. limit=22.5 2024-09-24 10:28:31,481 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=9.45 vs. limit=15.0 2024-09-24 10:28:53,750 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=481072.6666666667, ans=0.07 2024-09-24 10:28:55,657 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.02 vs. limit=15.0 2024-09-24 10:29:08,835 INFO [train.py:1198] (3/4) Epoch 27, batch 1800, loss[loss=0.2092, ctc_loss=0.1369, cr_loss=0.3617, over 17245.00 frames. ], tot_loss[loss=0.2047, ctc_loss=0.1341, cr_loss=0.3529, over 3379898.62 frames. ], batch size: 47, lr: 4.43e-03, grad_scale: 16.0 2024-09-24 10:29:15,523 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=481119.3333333333, ans=0.2 2024-09-24 10:30:14,559 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.138e+02 1.286e+02 1.379e+02 1.526e+02 2.061e+02, threshold=2.757e+02, percent-clipped=0.0 2024-09-24 10:30:16,902 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=3.45 vs. limit=15.0 2024-09-24 10:30:33,714 INFO [train.py:1198] (3/4) Epoch 27, batch 1850, loss[loss=0.1922, ctc_loss=0.1278, cr_loss=0.3217, over 16063.00 frames. ], tot_loss[loss=0.2039, ctc_loss=0.1337, cr_loss=0.3512, over 3366662.98 frames. ], batch size: 74, lr: 4.43e-03, grad_scale: 16.0 2024-09-24 10:30:37,314 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=481352.6666666667, ans=0.125 2024-09-24 10:30:40,538 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=481352.6666666667, ans=0.0 2024-09-24 10:31:33,315 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.min_positive, batch_count=481492.6666666667, ans=0.025 2024-09-24 10:31:38,051 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.min_positive, batch_count=481539.3333333333, ans=0.05 2024-09-24 10:31:53,596 INFO [train.py:1198] (3/4) Epoch 27, batch 1900, loss[loss=0.1478, ctc_loss=0.09658, cr_loss=0.2563, over 17025.00 frames. ], tot_loss[loss=0.2028, ctc_loss=0.1329, cr_loss=0.3496, over 3375238.78 frames. 
], batch size: 39, lr: 4.43e-03, grad_scale: 16.0 2024-09-24 10:32:02,242 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer_ff3.min_abs, batch_count=481586.0, ans=0.2 2024-09-24 10:32:28,749 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-24 10:32:48,053 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=481726.0, ans=0.0 2024-09-24 10:32:54,467 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=481726.0, ans=0.2 2024-09-24 10:32:57,447 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.096e+02 1.257e+02 1.350e+02 1.464e+02 2.291e+02, threshold=2.700e+02, percent-clipped=0.0 2024-09-24 10:33:03,967 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=481772.6666666667, ans=0.1 2024-09-24 10:33:16,472 INFO [train.py:1198] (3/4) Epoch 27, batch 1950, loss[loss=0.1962, ctc_loss=0.1293, cr_loss=0.3344, over 17260.00 frames. ], tot_loss[loss=0.2029, ctc_loss=0.1329, cr_loss=0.3497, over 3378982.24 frames. ], batch size: 44, lr: 4.42e-03, grad_scale: 16.0 2024-09-24 10:33:20,007 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=481819.3333333333, ans=0.1 2024-09-24 10:33:21,985 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=5.19 vs. limit=15.0 2024-09-24 10:33:36,335 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=481866.0, ans=0.2 2024-09-24 10:33:50,582 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=481912.6666666667, ans=0.1 2024-09-24 10:33:52,348 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=481912.6666666667, ans=0.2 2024-09-24 10:33:52,370 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=481912.6666666667, ans=0.125 2024-09-24 10:34:07,821 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=481959.3333333333, ans=0.2 2024-09-24 10:34:14,201 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=481959.3333333333, ans=0.125 2024-09-24 10:34:34,633 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=482006.0, ans=0.1 2024-09-24 10:34:42,091 INFO [train.py:1198] (3/4) Epoch 27, batch 2000, loss[loss=0.1873, ctc_loss=0.1211, cr_loss=0.331, over 17266.00 frames. ], tot_loss[loss=0.2033, ctc_loss=0.1333, cr_loss=0.3498, over 3364710.04 frames. ], batch size: 42, lr: 4.42e-03, grad_scale: 32.0 2024-09-24 10:35:08,904 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=482099.3333333333, ans=0.0 2024-09-24 10:35:31,512 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=8.09 vs. 
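limit=15.0

The ScheduledFloat entries above record hyperparameters (dropout probabilities, skip rates, balancer targets) whose logged value ("ans") is a function of batch_count. A minimal sketch of such a schedule, assuming piecewise-linear interpolation between (batch_count, value) breakpoints; the class name echoes the log tag, but the breakpoints and code are illustrative, not the actual scaling.py implementation:

```python
# Minimal sketch of a batch-count-keyed schedule like the "ScheduledFloat:
# name=..., batch_count=..., ans=..." entries above; assumes piecewise-linear
# interpolation between breakpoints. The breakpoints here are made up.
class ScheduledFloat:
    def __init__(self, *points):
        self.points = sorted(points)  # (batch_count, value) breakpoints
        self.batch_count = 0.0

    def __float__(self):
        x, pts = self.batch_count, self.points
        if x <= pts[0][0]:
            return pts[0][1]
        if x >= pts[-1][0]:
            return pts[-1][1]
        for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
            if x0 <= x <= x1:
                return y0 + (y1 - y0) * (x - x0) / (x1 - x0)

dropout_p = ScheduledFloat((0.0, 0.3), (20000.0, 0.1))
dropout_p.batch_count = 481772.67  # far past the last breakpoint
print(float(dropout_p))            # 0.1, matching the logged ans=0.1
```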
2024-09-24 10:35:32,145 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=482192.6666666667, ans=0.1 2024-09-24 10:35:46,430 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.048e+02 1.275e+02 1.345e+02 1.450e+02 1.969e+02, threshold=2.691e+02, percent-clipped=0.0 2024-09-24 10:35:50,032 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=482239.3333333333, ans=0.125 2024-09-24 10:35:53,248 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=482239.3333333333, ans=0.125 2024-09-24 10:36:04,169 INFO [train.py:1198] (3/4) Epoch 27, batch 2050, loss[loss=0.1612, ctc_loss=0.1012, cr_loss=0.3, over 16325.00 frames. ], tot_loss[loss=0.2036, ctc_loss=0.1336, cr_loss=0.3502, over 3365039.10 frames. ], batch size: 36, lr: 4.42e-03, grad_scale: 16.0 2024-09-24 10:36:14,059 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-24 10:36:23,741 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=482332.6666666667, ans=0.0 2024-09-24 10:37:03,680 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=482426.0, ans=0.5 2024-09-24 10:37:16,700 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=482472.6666666667, ans=0.0 2024-09-24 10:37:27,022 INFO [train.py:1198] (3/4) Epoch 27, batch 2100, loss[loss=0.2323, ctc_loss=0.1569, cr_loss=0.3772, over 17218.00 frames. ], tot_loss[loss=0.2044, ctc_loss=0.1341, cr_loss=0.3517, over 3369185.28 frames. ], batch size: 50, lr: 4.42e-03, grad_scale: 16.0 2024-09-24 10:37:42,119 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-24 10:38:00,073 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=9.04 vs. limit=15.0 2024-09-24 10:38:02,660 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=482612.6666666667, ans=0.0 2024-09-24 10:38:21,775 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=482659.3333333333, ans=0.125 2024-09-24 10:38:29,421 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.096e+02 1.227e+02 1.308e+02 1.433e+02 1.787e+02, threshold=2.617e+02, percent-clipped=0.0 2024-09-24 10:38:47,147 INFO [train.py:1198] (3/4) Epoch 27, batch 2150, loss[loss=0.1895, ctc_loss=0.124, cr_loss=0.3276, over 17293.00 frames. ], tot_loss[loss=0.2051, ctc_loss=0.1346, cr_loss=0.3524, over 3363639.05 frames.
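], batch size: 46, lr: 4.42e-03, grad_scale: 16.0

The recurring optim.py warnings report the distribution of recent gradient norms, and in every one of them the threshold is Clipping_scale times the median quartile up to rounding (e.g. threshold=2.691e+02 ≈ 2.0 × 1.345e+02 just above), so the clipping rule can be read off the log. A sketch under that assumption; the class and window size are hypothetical, not the actual optim.py code:

```python
# Sketch of the clipping rule implied by the warnings above; assumes
# threshold = clipping_scale * median of recent gradient norms, which matches
# every "Clipping_scale=2.0, ... threshold=..." line in this log.
from collections import deque

import torch

class GradNormClipper:
    def __init__(self, clipping_scale: float = 2.0, window: int = 128):
        self.clipping_scale = clipping_scale
        self.norms = deque(maxlen=window)  # recent total gradient norms

    def clip_(self, params) -> float:
        grads = [p.grad for p in params if p.grad is not None]
        norm = torch.norm(torch.stack([g.norm() for g in grads])).item()
        self.norms.append(norm)
        median = sorted(self.norms)[len(self.norms) // 2]
        threshold = self.clipping_scale * median
        if norm > threshold:
            for g in grads:
                g.mul_(threshold / norm)  # scale grads down to the threshold
        return norm

    def quartiles(self):
        # the five numbers printed in the warning: min/25%/50%/75%/max
        s = sorted(self.norms)
        return [s[round(q * (len(s) - 1))] for q in (0.0, 0.25, 0.5, 0.75, 1.0)]
```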
2024-09-24 10:38:50,792 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=482752.6666666667, ans=0.2 2024-09-24 10:39:04,500 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=482799.3333333333, ans=0.125 2024-09-24 10:39:13,970 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=482799.3333333333, ans=0.0 2024-09-24 10:40:14,296 INFO [train.py:1198] (3/4) Epoch 27, batch 2200, loss[loss=0.2259, ctc_loss=0.1531, cr_loss=0.3642, over 17004.00 frames. ], tot_loss[loss=0.2047, ctc_loss=0.1344, cr_loss=0.3516, over 3366509.22 frames. ], batch size: 56, lr: 4.42e-03, grad_scale: 16.0 2024-09-24 10:41:16,370 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.076e+02 1.255e+02 1.303e+02 1.378e+02 1.644e+02, threshold=2.606e+02, percent-clipped=0.0 2024-09-24 10:41:16,654 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=483172.6666666667, ans=0.1 2024-09-24 10:41:34,088 INFO [train.py:1198] (3/4) Epoch 27, batch 2250, loss[loss=0.1952, ctc_loss=0.1281, cr_loss=0.3357, over 17315.00 frames. ], tot_loss[loss=0.2035, ctc_loss=0.1334, cr_loss=0.3504, over 3364492.69 frames. ], batch size: 51, lr: 4.42e-03, grad_scale: 16.0 2024-09-24 10:41:47,607 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=483219.3333333333, ans=0.125 2024-09-24 10:41:55,891 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=16.97 vs. limit=22.5 2024-09-24 10:42:15,301 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=6.07 vs. limit=15.0 2024-09-24 10:42:24,058 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.61 vs. limit=15.0 2024-09-24 10:42:56,648 INFO [train.py:1198] (3/4) Epoch 27, batch 2300, loss[loss=0.2065, ctc_loss=0.1364, cr_loss=0.3508, over 17211.00 frames. ], tot_loss[loss=0.2036, ctc_loss=0.1335, cr_loss=0.3505, over 3362616.97 frames. ], batch size: 47, lr: 4.42e-03, grad_scale: 16.0 2024-09-24 10:43:14,492 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=483499.3333333333, ans=0.1 2024-09-24 10:43:48,088 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=6.60 vs. limit=15.0 2024-09-24 10:43:58,587 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.117e+02 1.254e+02 1.343e+02 1.462e+02 2.563e+02, threshold=2.686e+02, percent-clipped=0.0 2024-09-24 10:44:18,920 INFO [train.py:1198] (3/4) Epoch 27, batch 2350, loss[loss=0.2017, ctc_loss=0.1313, cr_loss=0.3524, over 17367.00 frames. ], tot_loss[loss=0.2038, ctc_loss=0.1336, cr_loss=0.351, over 3368193.85 frames.
], batch size: 48, lr: 4.42e-03, grad_scale: 16.0 2024-09-24 10:44:39,320 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.min_positive, batch_count=483732.6666666667, ans=0.025 2024-09-24 10:44:43,980 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=483732.6666666667, ans=0.2 2024-09-24 10:44:53,462 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=483779.3333333333, ans=0.125 2024-09-24 10:45:32,839 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=483872.6666666667, ans=0.1 2024-09-24 10:45:43,654 INFO [train.py:1198] (3/4) Epoch 27, batch 2400, loss[loss=0.2072, ctc_loss=0.1386, cr_loss=0.3433, over 17006.00 frames. ], tot_loss[loss=0.2025, ctc_loss=0.1326, cr_loss=0.3496, over 3372917.99 frames. ], batch size: 53, lr: 4.41e-03, grad_scale: 32.0 2024-09-24 10:46:07,613 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=483966.0, ans=0.1 2024-09-24 10:46:26,754 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=484012.6666666667, ans=0.125 2024-09-24 10:46:45,619 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.113e+02 1.266e+02 1.332e+02 1.424e+02 3.115e+02, threshold=2.664e+02, percent-clipped=1.0 2024-09-24 10:47:03,209 INFO [train.py:1198] (3/4) Epoch 27, batch 2450, loss[loss=0.1727, ctc_loss=0.1074, cr_loss=0.3263, over 16941.00 frames. ], tot_loss[loss=0.2017, ctc_loss=0.132, cr_loss=0.3485, over 3377904.10 frames. ], batch size: 42, lr: 4.41e-03, grad_scale: 32.0 2024-09-24 10:47:11,499 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=484152.6666666667, ans=0.2 2024-09-24 10:47:12,988 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=484152.6666666667, ans=0.1 2024-09-24 10:47:57,258 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=484292.6666666667, ans=0.125 2024-09-24 10:48:00,604 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=484292.6666666667, ans=0.0 2024-09-24 10:48:02,143 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=484292.6666666667, ans=0.125 2024-09-24 10:48:25,695 INFO [train.py:1198] (3/4) Epoch 27, batch 2500, loss[loss=0.1736, ctc_loss=0.1101, cr_loss=0.3172, over 17089.00 frames. ], tot_loss[loss=0.2015, ctc_loss=0.1318, cr_loss=0.3485, over 3381606.80 frames. ], batch size: 43, lr: 4.41e-03, grad_scale: 32.0 2024-09-24 10:48:44,149 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.71 vs. 
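limit=15.0

Each Whitening line compares a per-module statistic against a limit (metric=12.71 vs. limit=15.0 for the feed_forward3 output just above). The exact scaling.py definition is not shown in the log; the following formulation is offered purely as an assumption about what such a metric could measure, namely how non-uniform the eigenvalue spectrum of the feature covariance is:

```python
# One plausible definition of the logged whitening metric (an assumption,
# not the actual scaling.py code). It is 1.0 for perfectly "white" features
# and num_channels when all variance collapses into a single direction.
import torch

def whitening_metric(x: torch.Tensor, num_groups: int = 1) -> float:
    # x: (num_frames, num_channels), split into channel groups as in the log
    g = x.shape[-1] // num_groups
    metrics = []
    for xg in x.split(g, dim=-1):
        xg = xg - xg.mean(dim=0)
        cov = (xg.T @ xg) / xg.shape[0]
        # sum(eig^2) == ||cov||_F^2 and sum(eig) == trace(cov) for a symmetric
        # PSD covariance, so no eigendecomposition is needed
        metrics.append(g * (cov ** 2).sum() / cov.diagonal().sum() ** 2)
    return torch.stack(metrics).mean().item()

x = torch.randn(2000, 384) * torch.linspace(0.1, 2.0, 384)  # non-white input
print(whitening_metric(x))  # compared against the "limit"; a penalty would
                            # apply only while metric exceeds the limit
```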
2024-09-24 10:48:51,698 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=484432.6666666667, ans=0.2 2024-09-24 10:49:13,145 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=484479.3333333333, ans=0.025 2024-09-24 10:49:13,902 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=8.46 vs. limit=15.0 2024-09-24 10:49:21,187 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=484526.0, ans=0.95 2024-09-24 10:49:30,023 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=10.08 vs. limit=15.0 2024-09-24 10:49:30,486 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.109e+02 1.304e+02 1.383e+02 1.470e+02 1.966e+02, threshold=2.767e+02, percent-clipped=0.0 2024-09-24 10:49:47,994 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=484572.6666666667, ans=0.0 2024-09-24 10:49:50,885 INFO [train.py:1198] (3/4) Epoch 27, batch 2550, loss[loss=0.1845, ctc_loss=0.1214, cr_loss=0.3153, over 17148.00 frames. ], tot_loss[loss=0.2025, ctc_loss=0.1325, cr_loss=0.3499, over 3379122.14 frames. ], batch size: 40, lr: 4.41e-03, grad_scale: 32.0 2024-09-24 10:50:09,623 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=484666.0, ans=0.07 2024-09-24 10:51:00,093 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=5.30 vs. limit=15.0 2024-09-24 10:51:13,300 INFO [train.py:1198] (3/4) Epoch 27, batch 2600, loss[loss=0.1702, ctc_loss=0.1073, cr_loss=0.3144, over 17291.00 frames. ], tot_loss[loss=0.2023, ctc_loss=0.1324, cr_loss=0.3494, over 3377266.65 frames. ], batch size: 42, lr: 4.41e-03, grad_scale: 32.0 2024-09-24 10:51:21,594 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=484852.6666666667, ans=0.125 2024-09-24 10:51:26,465 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=484852.6666666667, ans=0.125 2024-09-24 10:51:32,127 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=6.32 vs. limit=15.0 2024-09-24 10:51:51,101 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=5.43 vs.
limit=12.0 2024-09-24 10:51:52,271 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-24 10:51:56,975 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=484946.0, ans=0.1 2024-09-24 10:51:57,004 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=484946.0, ans=0.5 2024-09-24 10:52:00,064 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=484992.6666666667, ans=0.2 2024-09-24 10:52:08,195 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=484992.6666666667, ans=0.2 2024-09-24 10:52:15,857 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.081e+02 1.258e+02 1.335e+02 1.464e+02 4.634e+02, threshold=2.669e+02, percent-clipped=1.0 2024-09-24 10:52:35,185 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=10.72 vs. limit=15.0 2024-09-24 10:52:36,181 INFO [train.py:1198] (3/4) Epoch 27, batch 2650, loss[loss=0.1919, ctc_loss=0.1235, cr_loss=0.3423, over 17178.00 frames. ], tot_loss[loss=0.2033, ctc_loss=0.1331, cr_loss=0.3508, over 3374369.69 frames. ], batch size: 45, lr: 4.41e-03, grad_scale: 32.0 2024-09-24 10:53:06,832 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=485179.3333333333, ans=0.0 2024-09-24 10:53:10,059 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=485179.3333333333, ans=0.0 2024-09-24 10:53:15,060 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-09-24 10:53:16,361 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=485179.3333333333, ans=0.2 2024-09-24 10:53:32,596 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=11.31 vs. limit=22.5 2024-09-24 10:53:35,462 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=485226.0, ans=0.2 2024-09-24 10:53:45,031 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.min_abs, batch_count=485272.6666666667, ans=0.5 2024-09-24 10:53:55,758 INFO [train.py:1198] (3/4) Epoch 27, batch 2700, loss[loss=0.1598, ctc_loss=0.1009, cr_loss=0.2944, over 17252.00 frames. ], tot_loss[loss=0.2041, ctc_loss=0.1338, cr_loss=0.3515, over 3366706.99 frames. ], batch size: 42, lr: 4.41e-03, grad_scale: 32.0 2024-09-24 10:53:57,514 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=485319.3333333333, ans=0.1 2024-09-24 10:54:42,868 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=3.67 vs. 
limit=12.0 2024-09-24 10:54:47,027 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=485412.6666666667, ans=0.125 2024-09-24 10:55:03,801 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=485459.3333333333, ans=0.1 2024-09-24 10:55:08,209 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.085e+02 1.258e+02 1.322e+02 1.410e+02 2.487e+02, threshold=2.644e+02, percent-clipped=0.0 2024-09-24 10:55:25,587 INFO [train.py:1198] (3/4) Epoch 27, batch 2750, loss[loss=0.2168, ctc_loss=0.1444, cr_loss=0.3616, over 17002.00 frames. ], tot_loss[loss=0.2034, ctc_loss=0.1333, cr_loss=0.3503, over 3364219.75 frames. ], batch size: 53, lr: 4.41e-03, grad_scale: 32.0 2024-09-24 10:55:30,646 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=485552.6666666667, ans=0.0 2024-09-24 10:55:32,318 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=485552.6666666667, ans=0.2 2024-09-24 10:55:57,689 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=485646.0, ans=0.125 2024-09-24 10:56:02,488 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=485646.0, ans=0.125 2024-09-24 10:56:05,565 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=485646.0, ans=0.125 2024-09-24 10:56:35,917 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=485739.3333333333, ans=0.0 2024-09-24 10:56:45,261 INFO [train.py:1198] (3/4) Epoch 27, batch 2800, loss[loss=0.1749, ctc_loss=0.1147, cr_loss=0.3011, over 17202.00 frames. ], tot_loss[loss=0.2042, ctc_loss=0.134, cr_loss=0.351, over 3346124.41 frames. ], batch size: 41, lr: 4.41e-03, grad_scale: 32.0 2024-09-24 10:56:52,244 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=485786.0, ans=0.125 2024-09-24 10:57:08,050 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=485832.6666666667, ans=0.0 2024-09-24 10:57:17,639 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=14.68 vs. limit=22.5 2024-09-24 10:57:49,127 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.const_attention_rate, batch_count=485926.0, ans=0.025 2024-09-24 10:57:50,318 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.088e+02 1.262e+02 1.376e+02 1.500e+02 2.364e+02, threshold=2.751e+02, percent-clipped=0.0 2024-09-24 10:58:01,695 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=485972.6666666667, ans=0.2 2024-09-24 10:58:07,805 INFO [train.py:1198] (3/4) Epoch 27, batch 2850, loss[loss=0.2191, ctc_loss=0.1428, cr_loss=0.3816, over 16865.00 frames. ], tot_loss[loss=0.2046, ctc_loss=0.1343, cr_loss=0.3517, over 3351227.57 frames. 
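], batch size: 58, lr: 4.41e-03, grad_scale: 32.0

The grad_scale field in the batch lines is the dynamic fp16 loss scale: it sits at 8.0 early in this epoch and doubles through 16.0 to 32.0 as overflow-free batches accumulate, the usual grow-on-success, back-off-on-overflow behaviour of AMP loss scaling. A generic torch.cuda.amp pattern showing where such a value comes from (a sketch, not the project's actual train.py):

```python
# Generic dynamic-loss-scaling pattern consistent with the logged grad_scale
# values; model/optimizer/loss_fn are placeholders, not icefall's code.
import torch

scaler = torch.cuda.amp.GradScaler(init_scale=2.0, growth_factor=2.0,
                                   backoff_factor=0.5, growth_interval=2000)

def train_step(model, optimizer, features, targets, loss_fn):
    optimizer.zero_grad()
    with torch.cuda.amp.autocast(dtype=torch.float16):
        loss = loss_fn(model(features), targets)
    scaler.scale(loss).backward()  # backward through the scaled loss
    scaler.step(optimizer)         # unscales grads; skips the step on overflow
    scaler.update()                # doubles the scale after enough good steps,
    return loss.detach()           # halves it whenever grads overflow
```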
2024-09-24 10:58:33,538 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=486066.0, ans=0.1 2024-09-24 10:58:48,122 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-24 10:58:49,454 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=486112.6666666667, ans=0.125 2024-09-24 10:59:14,645 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=486206.0, ans=0.2 2024-09-24 10:59:16,398 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.min_abs, batch_count=486206.0, ans=0.5 2024-09-24 10:59:28,476 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=486206.0, ans=0.125 2024-09-24 10:59:32,996 INFO [train.py:1198] (3/4) Epoch 27, batch 2900, loss[loss=0.2235, ctc_loss=0.1512, cr_loss=0.3616, over 17298.00 frames. ], tot_loss[loss=0.2045, ctc_loss=0.1342, cr_loss=0.3517, over 3351071.52 frames. ], batch size: 51, lr: 4.40e-03, grad_scale: 32.0 2024-09-24 10:59:41,454 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=486252.6666666667, ans=0.125 2024-09-24 10:59:45,197 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=15.25 vs. limit=15.0 2024-09-24 10:59:58,196 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=486299.3333333333, ans=0.125 2024-09-24 11:00:01,384 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=486299.3333333333, ans=0.0 2024-09-24 11:00:08,391 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=5.66 vs. limit=15.0 2024-09-24 11:00:34,134 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=7.78 vs. limit=15.0 2024-09-24 11:00:37,814 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.059e+02 1.250e+02 1.339e+02 1.437e+02 2.331e+02, threshold=2.678e+02, percent-clipped=0.0 2024-09-24 11:00:55,810 INFO [train.py:1198] (3/4) Epoch 27, batch 2950, loss[loss=0.1795, ctc_loss=0.1175, cr_loss=0.3101, over 17214.00 frames. ], tot_loss[loss=0.2032, ctc_loss=0.1333, cr_loss=0.3498, over 3348507.82 frames. ], batch size: 41, lr: 4.40e-03, grad_scale: 32.0 2024-09-24 11:01:16,798 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=486532.6666666667, ans=0.2 2024-09-24 11:01:18,471 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=486532.6666666667, ans=0.04949747468305833 2024-09-24 11:02:14,942 INFO [train.py:1198] (3/4) Epoch 27, batch 3000, loss[loss=0.2258, ctc_loss=0.1483, cr_loss=0.3879, over 15807.00 frames. ], tot_loss[loss=0.204, ctc_loss=0.1338, cr_loss=0.3508, over 3351885.05 frames.
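], batch size: 74, lr: 4.40e-03, grad_scale: 32.0

Every loss[...] / tot_loss[...] pair in this log satisfies loss = ctc_loss + 0.2 · cr_loss (for batch 3000 just above: 0.1483 + 0.2 × 0.3879 ≈ 0.2258), and the validation entries below show cr_loss ≈ 0, so the validation total equals the CTC loss. A sketch of that bookkeeping, with tot_loss as a frame-weighted running average; the scale 0.2 is read off the logged numbers and the class itself is illustrative:

```python
# Sketch of the loss combination and running average implied by the log;
# cr_loss_scale=0.2 fits every logged loss/ctc_loss/cr_loss triple.
class LossTracker:
    def __init__(self, cr_loss_scale: float = 0.2):
        self.cr_loss_scale = cr_loss_scale
        self.tot_frames = 0.0
        self.tot_loss = 0.0

    def update(self, ctc_loss: float, cr_loss: float, num_frames: float) -> float:
        loss = ctc_loss + self.cr_loss_scale * cr_loss
        total = self.tot_loss * self.tot_frames + loss * num_frames
        self.tot_frames += num_frames
        self.tot_loss = total / self.tot_frames  # frame-weighted running average
        return loss

tracker = LossTracker()
print(tracker.update(ctc_loss=0.1483, cr_loss=0.3879, num_frames=15807))  # ~0.2258
```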
2024-09-24 11:02:14,942 INFO [train.py:1221] (3/4) Computing validation loss 2024-09-24 11:02:30,459 INFO [train.py:1230] (3/4) Epoch 27, validation: loss=0.03681, ctc_loss=0.03681, cr_loss=8.353e-15, over 944034.00 frames. 2024-09-24 11:02:30,460 INFO [train.py:1231] (3/4) Maximum memory allocated so far is 21238MB 2024-09-24 11:02:46,629 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer_na.min_abs, batch_count=486766.0, ans=0.02 2024-09-24 11:02:48,217 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=486766.0, ans=0.0 2024-09-24 11:03:10,172 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=486812.6666666667, ans=0.0 2024-09-24 11:03:11,796 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=486812.6666666667, ans=0.2 2024-09-24 11:03:19,695 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=2.653e-03 2024-09-24 11:03:22,675 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=486859.3333333333, ans=0.0 2024-09-24 11:03:32,006 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.087e+02 1.263e+02 1.339e+02 1.435e+02 2.051e+02, threshold=2.678e+02, percent-clipped=0.0 2024-09-24 11:03:33,844 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=486906.0, ans=0.125 2024-09-24 11:03:44,826 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=486906.0, ans=0.0 2024-09-24 11:03:49,184 INFO [train.py:1198] (3/4) Epoch 27, batch 3050, loss[loss=0.2001, ctc_loss=0.1319, cr_loss=0.341, over 17024.00 frames. ], tot_loss[loss=0.2021, ctc_loss=0.1324, cr_loss=0.3482, over 3352615.55 frames. ], batch size: 56, lr: 4.40e-03, grad_scale: 32.0 2024-09-24 11:04:00,358 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=486952.6666666667, ans=0.125 2024-09-24 11:04:03,564 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=486999.3333333333, ans=0.1 2024-09-24 11:04:14,669 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-09-24 11:04:18,446 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=4.20 vs. limit=15.0 2024-09-24 11:04:20,827 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=487046.0, ans=0.0 2024-09-24 11:05:00,253 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=487139.3333333333, ans=0.125 2024-09-24 11:05:05,141 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=6.03 vs. limit=12.0 2024-09-24 11:05:07,555 INFO [train.py:1198] (3/4) Epoch 27, batch 3100, loss[loss=0.2341, ctc_loss=0.1571, cr_loss=0.3851, over 16073.00 frames. ], tot_loss[loss=0.2025, ctc_loss=0.1328, cr_loss=0.3485, over 3357837.05 frames.
], batch size: 74, lr: 4.40e-03, grad_scale: 32.0 2024-09-24 11:05:17,663 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=487186.0, ans=0.2 2024-09-24 11:05:22,443 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=487232.6666666667, ans=0.1 2024-09-24 11:05:23,916 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=487232.6666666667, ans=0.125 2024-09-24 11:05:25,699 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=487232.6666666667, ans=0.125 2024-09-24 11:05:31,314 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=487232.6666666667, ans=0.125 2024-09-24 11:05:39,005 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-24 11:05:55,429 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=6.50 vs. limit=15.0 2024-09-24 11:06:11,764 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.114e+02 1.251e+02 1.335e+02 1.455e+02 2.261e+02, threshold=2.670e+02, percent-clipped=0.0 2024-09-24 11:06:13,659 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=487372.6666666667, ans=0.125 2024-09-24 11:06:15,212 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=487372.6666666667, ans=0.0 2024-09-24 11:06:18,215 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=487372.6666666667, ans=0.0 2024-09-24 11:06:28,980 INFO [train.py:1198] (3/4) Epoch 27, batch 3150, loss[loss=0.2252, ctc_loss=0.1505, cr_loss=0.3735, over 17354.00 frames. ], tot_loss[loss=0.202, ctc_loss=0.1325, cr_loss=0.3478, over 3361968.68 frames. ], batch size: 48, lr: 4.40e-03, grad_scale: 32.0 2024-09-24 11:06:40,283 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=487419.3333333333, ans=0.1 2024-09-24 11:06:58,156 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=487466.0, ans=0.125 2024-09-24 11:07:01,558 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=6.14 vs. 
limit=15.0 2024-09-24 11:07:02,801 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=487512.6666666667, ans=0.0 2024-09-24 11:07:29,899 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=487559.3333333333, ans=0.1 2024-09-24 11:07:33,003 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=487559.3333333333, ans=0.125 2024-09-24 11:07:43,878 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=487606.0, ans=0.0 2024-09-24 11:07:51,273 INFO [train.py:1198] (3/4) Epoch 27, batch 3200, loss[loss=0.1748, ctc_loss=0.1147, cr_loss=0.3006, over 17072.00 frames. ], tot_loss[loss=0.2024, ctc_loss=0.1327, cr_loss=0.3484, over 3362705.31 frames. ], batch size: 43, lr: 4.40e-03, grad_scale: 32.0 2024-09-24 11:08:07,399 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.const_attention_rate, batch_count=487699.3333333333, ans=0.025 2024-09-24 11:08:16,812 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=487699.3333333333, ans=0.015 2024-09-24 11:08:27,807 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=487746.0, ans=0.0 2024-09-24 11:08:40,389 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=487792.6666666667, ans=0.0 2024-09-24 11:08:52,485 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.142e+02 1.262e+02 1.380e+02 1.502e+02 1.892e+02, threshold=2.760e+02, percent-clipped=0.0 2024-09-24 11:09:09,784 INFO [train.py:1198] (3/4) Epoch 27, batch 3250, loss[loss=0.2382, ctc_loss=0.1581, cr_loss=0.4007, over 17008.00 frames. ], tot_loss[loss=0.2025, ctc_loss=0.1328, cr_loss=0.3482, over 3355847.67 frames. ], batch size: 53, lr: 4.40e-03, grad_scale: 32.0 2024-09-24 11:09:32,658 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=4.62 vs. limit=10.0 2024-09-24 11:09:39,905 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=487979.3333333333, ans=0.1 2024-09-24 11:10:03,136 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=488026.0, ans=0.0 2024-09-24 11:10:27,843 INFO [train.py:1198] (3/4) Epoch 27, batch 3300, loss[loss=0.1938, ctc_loss=0.1279, cr_loss=0.3292, over 17009.00 frames. ], tot_loss[loss=0.2028, ctc_loss=0.1331, cr_loss=0.3484, over 3359533.78 frames. ], batch size: 51, lr: 4.40e-03, grad_scale: 32.0 2024-09-24 11:10:28,572 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=5.41 vs. 
limit=15.0 2024-09-24 11:11:05,880 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=488212.6666666667, ans=0.07 2024-09-24 11:11:08,978 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=488212.6666666667, ans=0.0 2024-09-24 11:11:18,318 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=488259.3333333333, ans=0.0 2024-09-24 11:11:24,460 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=488259.3333333333, ans=0.125 2024-09-24 11:11:24,654 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=488259.3333333333, ans=0.1 2024-09-24 11:11:28,981 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.096e+02 1.251e+02 1.338e+02 1.445e+02 2.480e+02, threshold=2.676e+02, percent-clipped=0.0 2024-09-24 11:11:42,094 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=488306.0, ans=0.1 2024-09-24 11:11:42,156 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=488306.0, ans=0.125 2024-09-24 11:11:46,445 INFO [train.py:1198] (3/4) Epoch 27, batch 3350, loss[loss=0.1882, ctc_loss=0.1208, cr_loss=0.337, over 17069.00 frames. ], tot_loss[loss=0.2019, ctc_loss=0.1325, cr_loss=0.3471, over 3349458.76 frames. ], batch size: 43, lr: 4.39e-03, grad_scale: 32.0 2024-09-24 11:12:24,362 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=488446.0, ans=0.0 2024-09-24 11:12:34,352 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=488492.6666666667, ans=0.125 2024-09-24 11:12:48,423 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=488492.6666666667, ans=0.0 2024-09-24 11:12:51,437 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=488539.3333333333, ans=0.125 2024-09-24 11:13:02,515 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=488539.3333333333, ans=0.2 2024-09-24 11:13:06,176 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.01 vs. limit=15.0 2024-09-24 11:13:07,097 INFO [train.py:1198] (3/4) Epoch 27, batch 3400, loss[loss=0.199, ctc_loss=0.1327, cr_loss=0.3317, over 17038.00 frames. ], tot_loss[loss=0.2029, ctc_loss=0.1331, cr_loss=0.349, over 3357361.08 frames. ], batch size: 56, lr: 4.39e-03, grad_scale: 32.0 2024-09-24 11:13:11,081 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.55 vs. limit=6.0 2024-09-24 11:13:12,690 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.50 vs. 
limit=6.0 2024-09-24 11:13:23,050 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=488632.6666666667, ans=0.125 2024-09-24 11:13:41,971 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=488679.3333333333, ans=0.0 2024-09-24 11:13:52,969 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=488726.0, ans=0.125 2024-09-24 11:14:08,009 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.079e+02 1.290e+02 1.371e+02 1.502e+02 1.850e+02, threshold=2.743e+02, percent-clipped=0.0 2024-09-24 11:14:16,136 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=488772.6666666667, ans=0.2 2024-09-24 11:14:25,089 INFO [train.py:1198] (3/4) Epoch 27, batch 3450, loss[loss=0.1601, ctc_loss=0.1035, cr_loss=0.2825, over 16314.00 frames. ], tot_loss[loss=0.2037, ctc_loss=0.1337, cr_loss=0.3504, over 3363980.98 frames. ], batch size: 36, lr: 4.39e-03, grad_scale: 32.0 2024-09-24 11:14:31,696 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=488819.3333333333, ans=0.0 2024-09-24 11:15:26,224 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=489006.0, ans=0.09899494936611666 2024-09-24 11:15:26,289 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=489006.0, ans=0.07 2024-09-24 11:15:34,366 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=489006.0, ans=0.1 2024-09-24 11:15:38,979 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.const_attention_rate, batch_count=489006.0, ans=0.025 2024-09-24 11:15:44,984 INFO [train.py:1198] (3/4) Epoch 27, batch 3500, loss[loss=0.237, ctc_loss=0.1599, cr_loss=0.3857, over 15941.00 frames. ], tot_loss[loss=0.2054, ctc_loss=0.1349, cr_loss=0.3528, over 3369985.16 frames. ], batch size: 74, lr: 4.39e-03, grad_scale: 32.0 2024-09-24 11:15:48,569 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=489052.6666666667, ans=0.0 2024-09-24 11:15:58,968 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.77 vs. 
limit=10.0
2024-09-24 11:15:59,995 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=489099.3333333333, ans=0.125
2024-09-24 11:16:04,430 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=489099.3333333333, ans=0.1
2024-09-24 11:16:18,729 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=489146.0, ans=0.125
2024-09-24 11:16:21,908 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=489146.0, ans=0.125
2024-09-24 11:16:21,940 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=489146.0, ans=0.2
2024-09-24 11:16:23,350 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=489146.0, ans=0.1
2024-09-24 11:16:38,996 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=489192.6666666667, ans=0.0
2024-09-24 11:16:46,472 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.188e+02 1.271e+02 1.363e+02 1.500e+02 3.531e+02, threshold=2.727e+02, percent-clipped=1.0
2024-09-24 11:16:57,273 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1.whitening_limit, batch_count=489239.3333333333, ans=10.0
2024-09-24 11:17:05,049 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=5.66 vs. limit=15.0
2024-09-24 11:17:05,742 INFO [train.py:1198] (3/4) Epoch 27, batch 3550, loss[loss=0.1852, ctc_loss=0.1187, cr_loss=0.3324, over 17284.00 frames. ], tot_loss[loss=0.2051, ctc_loss=0.1346, cr_loss=0.3525, over 3362343.98 frames. ], batch size: 46, lr: 4.39e-03, grad_scale: 32.0
2024-09-24 11:17:24,120 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=11.83 vs. limit=15.0
2024-09-24 11:17:57,999 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=489426.0, ans=0.1
2024-09-24 11:18:15,152 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.min_positive, batch_count=489472.6666666667, ans=0.05
2024-09-24 11:18:22,895 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=489472.6666666667, ans=0.125
2024-09-24 11:18:25,763 INFO [train.py:1198] (3/4) Epoch 27, batch 3600, loss[loss=0.2064, ctc_loss=0.1401, cr_loss=0.3317, over 15949.00 frames. ], tot_loss[loss=0.204, ctc_loss=0.1337, cr_loss=0.3512, over 3365947.95 frames. ], batch size: 74, lr: 4.39e-03, grad_scale: 32.0
2024-09-24 11:18:49,551 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=489566.0, ans=0.1
2024-09-24 11:19:02,028 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=489612.6666666667, ans=0.125
2024-09-24 11:19:02,166 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=489612.6666666667, ans=0.125
2024-09-24 11:19:17,746 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=489659.3333333333, ans=0.125
2024-09-24 11:19:26,810 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.098e+02 1.270e+02 1.354e+02 1.435e+02 1.974e+02, threshold=2.707e+02, percent-clipped=0.0
2024-09-24 11:19:44,360 INFO [train.py:1198] (3/4) Epoch 27, batch 3650, loss[loss=0.1793, ctc_loss=0.1161, cr_loss=0.3156, over 17259.00 frames. ], tot_loss[loss=0.2031, ctc_loss=0.133, cr_loss=0.3505, over 3369608.05 frames. ], batch size: 42, lr: 4.39e-03, grad_scale: 32.0
2024-09-24 11:19:45,345 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.80 vs. limit=22.5
2024-09-24 11:20:16,271 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=489846.0, ans=0.0
2024-09-24 11:20:16,385 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=489846.0, ans=0.0
2024-09-24 11:20:19,548 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=489846.0, ans=0.0
2024-09-24 11:20:22,738 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=489846.0, ans=0.1
2024-09-24 11:20:38,332 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=489892.6666666667, ans=0.0
2024-09-24 11:21:03,533 INFO [train.py:1198] (3/4) Epoch 27, batch 3700, loss[loss=0.1992, ctc_loss=0.1294, cr_loss=0.3494, over 17017.00 frames. ], tot_loss[loss=0.2029, ctc_loss=0.1328, cr_loss=0.3501, over 3366431.26 frames. ], batch size: 51, lr: 4.39e-03, grad_scale: 32.0
2024-09-24 11:21:06,952 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=489986.0, ans=0.09899494936611666
2024-09-24 11:21:11,730 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=489986.0, ans=0.125
2024-09-24 11:21:43,466 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=4.37 vs. limit=15.0
2024-09-24 11:22:05,031 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.056e+02 1.302e+02 1.450e+02 1.561e+02 3.629e+02, threshold=2.900e+02, percent-clipped=2.0
2024-09-24 11:22:05,317 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=490172.6666666667, ans=0.125
2024-09-24 11:22:13,355 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.06 vs. limit=6.0
2024-09-24 11:22:16,663 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.70 vs. limit=22.5
2024-09-24 11:22:17,469 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=490172.6666666667, ans=0.0
2024-09-24 11:22:19,124 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=490172.6666666667, ans=0.1
2024-09-24 11:22:21,922 INFO [train.py:1198] (3/4) Epoch 27, batch 3750, loss[loss=0.1922, ctc_loss=0.126, cr_loss=0.3307, over 17148.00 frames. ], tot_loss[loss=0.2037, ctc_loss=0.1335, cr_loss=0.3507, over 3364943.28 frames. ], batch size: 48, lr: 4.39e-03, grad_scale: 32.0
2024-09-24 11:22:34,918 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=490219.3333333333, ans=0.2
2024-09-24 11:22:44,149 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=490266.0, ans=0.0
2024-09-24 11:22:56,612 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=490312.6666666667, ans=0.0
2024-09-24 11:23:26,825 INFO [scaling.py:1024] (3/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.28 vs. limit=5.0
2024-09-24 11:23:39,568 INFO [train.py:1198] (3/4) Epoch 27, batch 3800, loss[loss=0.2028, ctc_loss=0.1318, cr_loss=0.3554, over 17006.00 frames. ], tot_loss[loss=0.2042, ctc_loss=0.134, cr_loss=0.3511, over 3346411.76 frames. ], batch size: 39, lr: 4.39e-03, grad_scale: 32.0
2024-09-24 11:23:53,262 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.87 vs. limit=22.5
2024-09-24 11:24:17,981 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=490546.0, ans=0.0
2024-09-24 11:24:41,923 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.133e+02 1.297e+02 1.423e+02 1.568e+02 2.554e+02, threshold=2.846e+02, percent-clipped=0.0
2024-09-24 11:24:59,152 INFO [train.py:1198] (3/4) Epoch 27, batch 3850, loss[loss=0.2327, ctc_loss=0.1584, cr_loss=0.3718, over 11562.00 frames. ], tot_loss[loss=0.2069, ctc_loss=0.1362, cr_loss=0.3537, over 3301045.93 frames. ], batch size: 124, lr: 4.38e-03, grad_scale: 32.0
2024-09-24 11:25:04,188 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=490686.0, ans=0.0
2024-09-24 11:25:09,619 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.91 vs. limit=15.0
2024-09-24 11:25:13,295 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-24 11:25:34,713 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=490779.3333333333, ans=0.125
2024-09-24 11:27:00,329 INFO [train.py:1198] (3/4) Epoch 28, batch 0, loss[loss=0.2296, ctc_loss=0.1529, cr_loss=0.3839, over 16981.00 frames. ], tot_loss[loss=0.2296, ctc_loss=0.1529, cr_loss=0.3839, over 16981.00 frames. ], batch size: 53, lr: 4.30e-03, grad_scale: 32.0
2024-09-24 11:27:00,330 INFO [train.py:1221] (3/4) Computing validation loss
2024-09-24 11:27:15,942 INFO [train.py:1230] (3/4) Epoch 28, validation: loss=0.03666, ctc_loss=0.03666, cr_loss=9.126e-15, over 944034.00 frames.
2024-09-24 11:27:15,943 INFO [train.py:1231] (3/4) Maximum memory allocated so far is 21238MB
2024-09-24 11:27:41,533 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.const_attention_rate, batch_count=490947.3333333333, ans=0.025
2024-09-24 11:27:59,374 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=490994.0, ans=0.0
2024-09-24 11:28:13,415 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=491040.6666666667, ans=0.0
2024-09-24 11:28:27,521 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.102e+02 1.316e+02 1.494e+02 1.657e+02 3.455e+02, threshold=2.987e+02, percent-clipped=1.0
2024-09-24 11:28:38,881 INFO [train.py:1198] (3/4) Epoch 28, batch 50, loss[loss=0.1813, ctc_loss=0.1177, cr_loss=0.3178, over 17136.00 frames. ], tot_loss[loss=0.2045, ctc_loss=0.1342, cr_loss=0.3515, over 760940.10 frames. ], batch size: 40, lr: 4.30e-03, grad_scale: 32.0
2024-09-24 11:28:53,410 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=491180.6666666667, ans=0.125
2024-09-24 11:28:53,427 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=491180.6666666667, ans=0.125
2024-09-24 11:29:11,016 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=491227.3333333333, ans=0.125
2024-09-24 11:29:28,386 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=491274.0, ans=0.125
2024-09-24 11:29:28,539 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=491274.0, ans=0.04949747468305833
2024-09-24 11:29:30,039 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=491274.0, ans=0.2
2024-09-24 11:29:49,362 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=491320.6666666667, ans=0.125
2024-09-24 11:29:58,611 INFO [train.py:1198] (3/4) Epoch 28, batch 100, loss[loss=0.1721, ctc_loss=0.1109, cr_loss=0.3059, over 16751.00 frames. ], tot_loss[loss=0.2013, ctc_loss=0.1318, cr_loss=0.347, over 1344808.19 frames. ], batch size: 37, lr: 4.30e-03, grad_scale: 32.0
2024-09-24 11:30:16,174 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=491414.0, ans=0.125
2024-09-24 11:30:22,693 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=491414.0, ans=0.125
2024-09-24 11:30:22,718 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=491414.0, ans=0.125
2024-09-24 11:30:24,294 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=491414.0, ans=0.125
2024-09-24 11:30:36,839 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=491460.6666666667, ans=0.125
2024-09-24 11:30:44,812 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=491460.6666666667, ans=0.1
2024-09-24 11:30:56,376 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.15 vs. limit=6.0
2024-09-24 11:31:06,826 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=491554.0, ans=0.0
2024-09-24 11:31:11,240 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.069e+02 1.221e+02 1.303e+02 1.406e+02 2.036e+02, threshold=2.607e+02, percent-clipped=0.0
2024-09-24 11:31:13,417 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=14.19 vs. limit=22.5
2024-09-24 11:31:20,725 INFO [train.py:1198] (3/4) Epoch 28, batch 150, loss[loss=0.1997, ctc_loss=0.129, cr_loss=0.3536, over 17222.00 frames. ], tot_loss[loss=0.2035, ctc_loss=0.1334, cr_loss=0.3505, over 1788933.29 frames. ], batch size: 47, lr: 4.30e-03, grad_scale: 32.0
2024-09-24 11:31:22,679 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=491600.6666666667, ans=0.125
2024-09-24 11:31:43,111 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=491647.3333333333, ans=0.125
2024-09-24 11:31:51,368 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=13.03 vs. limit=22.5
2024-09-24 11:31:55,905 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=491694.0, ans=0.1
2024-09-24 11:32:16,732 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=491740.6666666667, ans=0.1
2024-09-24 11:32:48,352 INFO [train.py:1198] (3/4) Epoch 28, batch 200, loss[loss=0.1983, ctc_loss=0.1289, cr_loss=0.3471, over 17295.00 frames. ], tot_loss[loss=0.2033, ctc_loss=0.1331, cr_loss=0.3506, over 2142395.68 frames. ], batch size: 51, lr: 4.30e-03, grad_scale: 16.0
2024-09-24 11:32:48,582 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=491834.0, ans=0.1
2024-09-24 11:33:12,610 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=491880.6666666667, ans=0.0
2024-09-24 11:33:15,673 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=491880.6666666667, ans=0.0
2024-09-24 11:33:23,524 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=491927.3333333333, ans=0.2
2024-09-24 11:34:00,088 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.107e+02 1.241e+02 1.331e+02 1.443e+02 1.741e+02, threshold=2.662e+02, percent-clipped=0.0
2024-09-24 11:34:03,562 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=492020.6666666667, ans=0.125
2024-09-24 11:34:08,077 INFO [train.py:1198] (3/4) Epoch 28, batch 250, loss[loss=0.1964, ctc_loss=0.1281, cr_loss=0.3417, over 17102.00 frames. ], tot_loss[loss=0.2035, ctc_loss=0.1333, cr_loss=0.351, over 2410169.95 frames. ], batch size: 49, lr: 4.30e-03, grad_scale: 16.0
2024-09-24 11:34:16,154 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=492067.3333333333, ans=0.0
2024-09-24 11:34:22,564 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=492114.0, ans=0.125
2024-09-24 11:34:32,333 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.const_attention_rate, batch_count=492114.0, ans=0.025
2024-09-24 11:35:10,128 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=492207.3333333333, ans=0.125
2024-09-24 11:35:22,879 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=492254.0, ans=0.125
2024-09-24 11:35:30,772 INFO [train.py:1198] (3/4) Epoch 28, batch 300, loss[loss=0.1994, ctc_loss=0.1341, cr_loss=0.3265, over 17309.00 frames. ], tot_loss[loss=0.2016, ctc_loss=0.132, cr_loss=0.3482, over 2624669.33 frames. ], batch size: 51, lr: 4.30e-03, grad_scale: 16.0
2024-09-24 11:35:46,982 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=492347.3333333333, ans=0.0
2024-09-24 11:36:04,627 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=492394.0, ans=0.1
2024-09-24 11:36:14,402 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=492394.0, ans=0.125
2024-09-24 11:36:19,082 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=492440.6666666667, ans=0.04949747468305833
2024-09-24 11:36:27,188 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=492440.6666666667, ans=0.035
2024-09-24 11:36:28,908 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.const_attention_rate, batch_count=492440.6666666667, ans=0.025
2024-09-24 11:36:42,613 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.097e+02 1.243e+02 1.313e+02 1.459e+02 2.703e+02, threshold=2.626e+02, percent-clipped=1.0
2024-09-24 11:36:49,295 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=492534.0, ans=0.125
2024-09-24 11:36:50,620 INFO [train.py:1198] (3/4) Epoch 28, batch 350, loss[loss=0.1904, ctc_loss=0.1229, cr_loss=0.3371, over 17295.00 frames. ], tot_loss[loss=0.2021, ctc_loss=0.1324, cr_loss=0.3488, over 2790058.26 frames. ], batch size: 46, lr: 4.30e-03, grad_scale: 16.0
2024-09-24 11:36:50,916 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=492534.0, ans=0.2
2024-09-24 11:36:54,161 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=492534.0, ans=0.125
2024-09-24 11:37:13,450 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=492580.6666666667, ans=0.0
2024-09-24 11:37:14,879 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=492580.6666666667, ans=0.1
2024-09-24 11:37:15,868 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=11.26 vs. limit=15.0
2024-09-24 11:37:19,815 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=492580.6666666667, ans=0.125
2024-09-24 11:37:38,896 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=492627.3333333333, ans=0.125
2024-09-24 11:37:43,879 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=492627.3333333333, ans=0.0
2024-09-24 11:37:47,398 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=10.87 vs. limit=15.0
2024-09-24 11:38:02,795 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=492720.6666666667, ans=0.0
2024-09-24 11:38:15,671 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=492720.6666666667, ans=0.125
2024-09-24 11:38:18,685 INFO [train.py:1198] (3/4) Epoch 28, batch 400, loss[loss=0.2191, ctc_loss=0.1459, cr_loss=0.366, over 16528.00 frames. ], tot_loss[loss=0.2028, ctc_loss=0.1329, cr_loss=0.3495, over 2904457.99 frames. ], batch size: 66, lr: 4.29e-03, grad_scale: 32.0
2024-09-24 11:38:20,581 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=492767.3333333333, ans=0.125
2024-09-24 11:38:28,510 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=492767.3333333333, ans=0.125
2024-09-24 11:38:32,547 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=8.51 vs. limit=15.0
2024-09-24 11:38:40,390 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=3.75 vs. limit=10.0
2024-09-24 11:38:46,510 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.31 vs. limit=6.0
2024-09-24 11:39:30,617 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.086e+02 1.218e+02 1.284e+02 1.430e+02 3.216e+02, threshold=2.568e+02, percent-clipped=1.0
2024-09-24 11:39:38,503 INFO [train.py:1198] (3/4) Epoch 28, batch 450, loss[loss=0.184, ctc_loss=0.1193, cr_loss=0.3235, over 17085.00 frames. ], tot_loss[loss=0.2018, ctc_loss=0.1321, cr_loss=0.3486, over 3012056.04 frames. ], batch size: 43, lr: 4.29e-03, grad_scale: 32.0
2024-09-24 11:39:49,840 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=493000.6666666667, ans=0.2
2024-09-24 11:40:27,806 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=493140.6666666667, ans=0.0
2024-09-24 11:40:33,090 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=5.46 vs. limit=12.0
2024-09-24 11:40:43,801 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-24 11:40:54,735 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=493187.3333333333, ans=0.0
2024-09-24 11:41:00,831 INFO [train.py:1198] (3/4) Epoch 28, batch 500, loss[loss=0.176, ctc_loss=0.1144, cr_loss=0.3082, over 17254.00 frames. ], tot_loss[loss=0.2014, ctc_loss=0.1318, cr_loss=0.3479, over 3090430.03 frames. ], batch size: 42, lr: 4.29e-03, grad_scale: 16.0
2024-09-24 11:41:03,129 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.whiten.whitening_limit, batch_count=493234.0, ans=12.0
2024-09-24 11:41:34,493 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=493327.3333333333, ans=0.0
2024-09-24 11:41:45,661 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=493327.3333333333, ans=0.125
2024-09-24 11:42:11,040 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=493420.6666666667, ans=0.1
2024-09-24 11:42:21,662 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.063e+02 1.292e+02 1.373e+02 1.533e+02 1.928e+02, threshold=2.747e+02, percent-clipped=0.0
2024-09-24 11:42:23,611 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=493420.6666666667, ans=0.125
2024-09-24 11:42:27,937 INFO [train.py:1198] (3/4) Epoch 28, batch 550, loss[loss=0.1866, ctc_loss=0.1223, cr_loss=0.3215, over 17282.00 frames. ], tot_loss[loss=0.2015, ctc_loss=0.132, cr_loss=0.3473, over 3143096.79 frames. ], batch size: 46, lr: 4.29e-03, grad_scale: 16.0
2024-09-24 11:43:01,863 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=493560.6666666667, ans=0.0
2024-09-24 11:43:03,387 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=493560.6666666667, ans=10.0
2024-09-24 11:43:47,659 INFO [train.py:1198] (3/4) Epoch 28, batch 600, loss[loss=0.2066, ctc_loss=0.136, cr_loss=0.3529, over 17297.00 frames. ], tot_loss[loss=0.2019, ctc_loss=0.1322, cr_loss=0.3484, over 3198803.45 frames. ], batch size: 49, lr: 4.29e-03, grad_scale: 16.0
2024-09-24 11:43:54,878 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=10.60 vs. limit=15.0
2024-09-24 11:43:56,118 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=493700.6666666667, ans=0.1
2024-09-24 11:44:14,035 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=493747.3333333333, ans=0.09899494936611666
2024-09-24 11:44:42,832 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=493840.6666666667, ans=0.0
2024-09-24 11:44:43,738 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=9.18 vs. limit=15.0
2024-09-24 11:44:48,602 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=9.86 vs. limit=15.0
2024-09-24 11:44:52,727 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=493887.3333333333, ans=0.125
2024-09-24 11:44:57,631 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=493887.3333333333, ans=0.09899494936611666
2024-09-24 11:45:04,824 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.115e+02 1.264e+02 1.373e+02 1.490e+02 2.458e+02, threshold=2.746e+02, percent-clipped=0.0
2024-09-24 11:45:09,868 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=493934.0, ans=0.125
2024-09-24 11:45:11,169 INFO [train.py:1198] (3/4) Epoch 28, batch 650, loss[loss=0.2311, ctc_loss=0.1549, cr_loss=0.3812, over 14880.00 frames. ], tot_loss[loss=0.2028, ctc_loss=0.1329, cr_loss=0.3498, over 3237436.34 frames. ], batch size: 89, lr: 4.29e-03, grad_scale: 16.0
2024-09-24 11:45:13,103 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=493934.0, ans=0.2
2024-09-24 11:46:13,859 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=494120.6666666667, ans=0.0
2024-09-24 11:46:31,268 INFO [train.py:1198] (3/4) Epoch 28, batch 700, loss[loss=0.2133, ctc_loss=0.1399, cr_loss=0.3671, over 17240.00 frames. ], tot_loss[loss=0.2036, ctc_loss=0.1335, cr_loss=0.3504, over 3256037.18 frames. ], batch size: 55, lr: 4.29e-03, grad_scale: 16.0
2024-09-24 11:46:34,761 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=494167.3333333333, ans=0.125
2024-09-24 11:46:54,953 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=494214.0, ans=0.07
2024-09-24 11:47:13,585 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=12.60 vs. limit=22.5
2024-09-24 11:47:24,799 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.50 vs. limit=15.0
2024-09-24 11:47:50,990 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=494354.0, ans=0.1
2024-09-24 11:47:51,453 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=16.01 vs. limit=22.5
2024-09-24 11:47:52,209 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.091e+02 1.257e+02 1.354e+02 1.480e+02 2.179e+02, threshold=2.707e+02, percent-clipped=0.0
2024-09-24 11:47:52,452 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=494354.0, ans=0.2
2024-09-24 11:47:58,823 INFO [train.py:1198] (3/4) Epoch 28, batch 750, loss[loss=0.2287, ctc_loss=0.149, cr_loss=0.3985, over 17226.00 frames. ], tot_loss[loss=0.2043, ctc_loss=0.134, cr_loss=0.3515, over 3279666.75 frames. ], batch size: 55, lr: 4.29e-03, grad_scale: 16.0
2024-09-24 11:48:02,370 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=494400.6666666667, ans=0.0
2024-09-24 11:48:31,136 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-24 11:48:32,773 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=494494.0, ans=0.1
2024-09-24 11:49:09,325 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=494587.3333333333, ans=0.125
2024-09-24 11:49:18,699 INFO [train.py:1198] (3/4) Epoch 28, batch 800, loss[loss=0.2002, ctc_loss=0.1305, cr_loss=0.3485, over 17075.00 frames. ], tot_loss[loss=0.2043, ctc_loss=0.1339, cr_loss=0.3518, over 3297111.96 frames. ], batch size: 43, lr: 4.29e-03, grad_scale: 32.0
2024-09-24 11:49:19,076 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=494634.0, ans=0.125
2024-09-24 11:49:35,038 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=494680.6666666667, ans=0.125
2024-09-24 11:49:50,628 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=494727.3333333333, ans=0.0
2024-09-24 11:49:59,352 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=494727.3333333333, ans=0.2
2024-09-24 11:50:07,216 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=494774.0, ans=0.125
2024-09-24 11:50:10,507 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=494774.0, ans=0.2
2024-09-24 11:50:10,525 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=494774.0, ans=0.04949747468305833
2024-09-24 11:50:23,399 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.11 vs. limit=15.0
2024-09-24 11:50:26,099 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=494820.6666666667, ans=0.0
2024-09-24 11:50:33,715 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.140e+02 1.239e+02 1.301e+02 1.424e+02 2.595e+02, threshold=2.602e+02, percent-clipped=0.0
2024-09-24 11:50:35,712 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=494820.6666666667, ans=0.125
2024-09-24 11:50:40,257 INFO [train.py:1198] (3/4) Epoch 28, batch 850, loss[loss=0.1753, ctc_loss=0.1157, cr_loss=0.298, over 17124.00 frames. ], tot_loss[loss=0.2032, ctc_loss=0.1331, cr_loss=0.3502, over 3309641.61 frames. ], batch size: 40, lr: 4.29e-03, grad_scale: 32.0
2024-09-24 11:50:46,984 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=494867.3333333333, ans=0.0
2024-09-24 11:50:56,513 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=494914.0, ans=0.125
2024-09-24 11:51:12,387 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=494960.6666666667, ans=0.04949747468305833
2024-09-24 11:51:21,710 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=494960.6666666667, ans=10.0
2024-09-24 11:51:45,493 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=495054.0, ans=0.1
2024-09-24 11:51:45,641 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=495054.0, ans=0.125
2024-09-24 11:52:04,456 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=16.06 vs. limit=22.5
2024-09-24 11:52:04,922 INFO [train.py:1198] (3/4) Epoch 28, batch 900, loss[loss=0.2259, ctc_loss=0.1499, cr_loss=0.3804, over 17006.00 frames. ], tot_loss[loss=0.2044, ctc_loss=0.134, cr_loss=0.3522, over 3317334.08 frames. ], batch size: 53, lr: 4.28e-03, grad_scale: 16.0
2024-09-24 11:52:11,536 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=495100.6666666667, ans=0.125
2024-09-24 11:52:24,488 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=495147.3333333333, ans=0.0
2024-09-24 11:53:01,470 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=495240.6666666667, ans=0.0
2024-09-24 11:53:08,227 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=10.38 vs. limit=15.0
2024-09-24 11:53:20,249 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.134e+02 1.272e+02 1.349e+02 1.497e+02 2.522e+02, threshold=2.699e+02, percent-clipped=0.0
2024-09-24 11:53:25,044 INFO [train.py:1198] (3/4) Epoch 28, batch 950, loss[loss=0.1606, ctc_loss=0.102, cr_loss=0.2931, over 17104.00 frames. ], tot_loss[loss=0.204, ctc_loss=0.1338, cr_loss=0.3513, over 3326716.69 frames. ], batch size: 40, lr: 4.28e-03, grad_scale: 16.0
2024-09-24 11:53:34,915 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=495334.0, ans=0.1
2024-09-24 11:53:39,707 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=495380.6666666667, ans=0.125
2024-09-24 11:53:44,638 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=495380.6666666667, ans=0.125
2024-09-24 11:54:02,456 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=495427.3333333333, ans=0.125
2024-09-24 11:54:27,941 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=495520.6666666667, ans=0.2
2024-09-24 11:54:47,631 INFO [train.py:1198] (3/4) Epoch 28, batch 1000, loss[loss=0.2439, ctc_loss=0.1618, cr_loss=0.4107, over 17209.00 frames. ], tot_loss[loss=0.2036, ctc_loss=0.1334, cr_loss=0.351, over 3335277.61 frames. ], batch size: 55, lr: 4.28e-03, grad_scale: 16.0
2024-09-24 11:55:11,922 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=495614.0, ans=0.04949747468305833
2024-09-24 11:55:18,801 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.40 vs. limit=15.0
2024-09-24 11:55:26,505 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=495660.6666666667, ans=0.125
2024-09-24 11:55:32,871 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=495660.6666666667, ans=0.0
2024-09-24 11:55:34,366 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=495707.3333333333, ans=0.125
2024-09-24 11:55:42,509 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=495707.3333333333, ans=0.0
2024-09-24 11:56:02,812 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.044e+02 1.229e+02 1.298e+02 1.405e+02 1.761e+02, threshold=2.595e+02, percent-clipped=0.0
2024-09-24 11:56:07,661 INFO [train.py:1198] (3/4) Epoch 28, batch 1050, loss[loss=0.2217, ctc_loss=0.1467, cr_loss=0.3753, over 17001.00 frames. ], tot_loss[loss=0.2048, ctc_loss=0.1342, cr_loss=0.3527, over 3337521.54 frames. ], batch size: 53, lr: 4.28e-03, grad_scale: 16.0
2024-09-24 11:56:09,674 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=495800.6666666667, ans=0.1
2024-09-24 11:56:36,845 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=495847.3333333333, ans=0.125
2024-09-24 11:57:08,152 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=495940.6666666667, ans=0.1
2024-09-24 11:57:20,958 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=495987.3333333333, ans=0.125
2024-09-24 11:57:35,042 INFO [train.py:1198] (3/4) Epoch 28, batch 1100, loss[loss=0.2118, ctc_loss=0.1382, cr_loss=0.3678, over 17024.00 frames. ], tot_loss[loss=0.2045, ctc_loss=0.1341, cr_loss=0.3518, over 3334384.41 frames. ], batch size: 44, lr: 4.28e-03, grad_scale: 16.0
2024-09-24 11:57:57,513 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=496080.6666666667, ans=0.125
2024-09-24 11:58:03,851 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=496080.6666666667, ans=0.1
2024-09-24 11:58:19,831 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=496127.3333333333, ans=0.125
2024-09-24 11:58:50,154 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.108e+02 1.267e+02 1.365e+02 1.472e+02 3.756e+02, threshold=2.729e+02, percent-clipped=1.0
2024-09-24 11:58:54,889 INFO [train.py:1198] (3/4) Epoch 28, batch 1150, loss[loss=0.2007, ctc_loss=0.1309, cr_loss=0.3492, over 16873.00 frames. ], tot_loss[loss=0.2037, ctc_loss=0.1336, cr_loss=0.3505, over 3337706.86 frames. ], batch size: 58, lr: 4.28e-03, grad_scale: 16.0
2024-09-24 11:59:00,039 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=496267.3333333333, ans=0.125
2024-09-24 11:59:00,116 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=496267.3333333333, ans=0.1
2024-09-24 11:59:06,517 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=496267.3333333333, ans=0.125
2024-09-24 11:59:08,167 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=496267.3333333333, ans=0.125
2024-09-24 11:59:33,564 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=496360.6666666667, ans=0.0
2024-09-24 11:59:50,383 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.const_attention_rate, batch_count=496407.3333333333, ans=0.025
2024-09-24 12:00:01,386 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=496454.0, ans=0.0
2024-09-24 12:00:06,314 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=496454.0, ans=0.0
2024-09-24 12:00:11,882 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=13.73 vs. limit=22.5
2024-09-24 12:00:17,104 INFO [train.py:1198] (3/4) Epoch 28, batch 1200, loss[loss=0.2245, ctc_loss=0.1508, cr_loss=0.3685, over 17301.00 frames. ], tot_loss[loss=0.2035, ctc_loss=0.1334, cr_loss=0.3503, over 3340336.82 frames. ], batch size: 49, lr: 4.28e-03, grad_scale: 32.0
2024-09-24 12:00:47,785 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=496594.0, ans=0.1
2024-09-24 12:00:48,271 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=2.75 vs. limit=6.0
2024-09-24 12:00:50,891 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.const_attention_rate, batch_count=496594.0, ans=0.025
2024-09-24 12:01:25,050 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.05 vs. limit=6.0
2024-09-24 12:01:30,997 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=496687.3333333333, ans=0.05
2024-09-24 12:01:32,307 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.095e+02 1.289e+02 1.355e+02 1.454e+02 3.122e+02, threshold=2.710e+02, percent-clipped=1.0
2024-09-24 12:01:37,160 INFO [train.py:1198] (3/4) Epoch 28, batch 1250, loss[loss=0.1828, ctc_loss=0.1186, cr_loss=0.3211, over 17067.00 frames. ], tot_loss[loss=0.2039, ctc_loss=0.1337, cr_loss=0.3507, over 3347703.32 frames. ], batch size: 39, lr: 4.28e-03, grad_scale: 32.0
2024-09-24 12:01:55,124 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=11.09 vs. limit=15.0
2024-09-24 12:01:56,476 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=7.00 vs. limit=15.0
2024-09-24 12:02:52,798 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=496920.6666666667, ans=0.0
2024-09-24 12:03:03,733 INFO [train.py:1198] (3/4) Epoch 28, batch 1300, loss[loss=0.2393, ctc_loss=0.1609, cr_loss=0.3921, over 16553.00 frames. ], tot_loss[loss=0.2038, ctc_loss=0.1338, cr_loss=0.3502, over 3347692.18 frames. ], batch size: 66, lr: 4.28e-03, grad_scale: 32.0
2024-09-24 12:03:16,742 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=496967.3333333333, ans=0.125
2024-09-24 12:03:24,574 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=497014.0, ans=0.035
2024-09-24 12:03:43,854 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2024-09-24 12:04:18,679 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.141e+02 1.268e+02 1.342e+02 1.439e+02 2.491e+02, threshold=2.683e+02, percent-clipped=0.0
2024-09-24 12:04:20,649 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=497154.0, ans=0.125
2024-09-24 12:04:23,560 INFO [train.py:1198] (3/4) Epoch 28, batch 1350, loss[loss=0.1785, ctc_loss=0.1128, cr_loss=0.3283, over 16968.00 frames. ], tot_loss[loss=0.2033, ctc_loss=0.1335, cr_loss=0.3493, over 3329581.83 frames. ], batch size: 42, lr: 4.28e-03, grad_scale: 32.0
2024-09-24 12:04:27,192 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=497200.6666666667, ans=0.125
2024-09-24 12:05:13,025 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=497340.6666666667, ans=0.2
2024-09-24 12:05:41,564 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=497387.3333333333, ans=0.125
2024-09-24 12:05:45,901 INFO [train.py:1198] (3/4) Epoch 28, batch 1400, loss[loss=0.2327, ctc_loss=0.1546, cr_loss=0.3905, over 15988.00 frames. ], tot_loss[loss=0.2029, ctc_loss=0.1331, cr_loss=0.349, over 3335285.80 frames. ], batch size: 74, lr: 4.27e-03, grad_scale: 16.0
2024-09-24 12:05:57,734 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=497434.0, ans=0.125
2024-09-24 12:06:13,545 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=497480.6666666667, ans=0.125
2024-09-24 12:06:23,161 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=497527.3333333333, ans=0.125
2024-09-24 12:06:27,758 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=497527.3333333333, ans=0.125
2024-09-24 12:06:38,445 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=497574.0, ans=0.125
2024-09-24 12:07:00,299 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=497620.6666666667, ans=0.0
2024-09-24 12:07:06,849 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=497620.6666666667, ans=0.2
2024-09-24 12:07:08,054 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.131e+02 1.287e+02 1.409e+02 1.543e+02 2.036e+02, threshold=2.818e+02, percent-clipped=0.0
2024-09-24 12:07:11,200 INFO [train.py:1198] (3/4) Epoch 28, batch 1450, loss[loss=0.2279, ctc_loss=0.147, cr_loss=0.4045, over 16900.00 frames. ], tot_loss[loss=0.2019, ctc_loss=0.1323, cr_loss=0.348, over 3353623.42 frames. ], batch size: 58, lr: 4.27e-03, grad_scale: 16.0
2024-09-24 12:07:32,296 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=497714.0, ans=0.2
2024-09-24 12:07:33,840 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=497714.0, ans=0.0
2024-09-24 12:07:45,868 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=4.64 vs. limit=12.0
2024-09-24 12:07:59,364 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=497807.3333333333, ans=0.0
2024-09-24 12:08:18,320 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=497854.0, ans=0.125
2024-09-24 12:08:31,014 INFO [train.py:1198] (3/4) Epoch 28, batch 1500, loss[loss=0.2065, ctc_loss=0.1382, cr_loss=0.3412, over 15910.00 frames. ], tot_loss[loss=0.2014, ctc_loss=0.1319, cr_loss=0.3476, over 3358261.01 frames. ], batch size: 74, lr: 4.27e-03, grad_scale: 16.0
2024-09-24 12:08:53,760 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=497947.3333333333, ans=0.1
2024-09-24 12:09:00,447 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=497947.3333333333, ans=0.0
2024-09-24 12:09:05,337 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.const_attention_rate, batch_count=497994.0, ans=0.025
2024-09-24 12:09:15,909 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=9.10 vs. limit=10.0
2024-09-24 12:09:29,275 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=498040.6666666667, ans=0.0
2024-09-24 12:09:39,965 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=498087.3333333333, ans=0.09899494936611666
2024-09-24 12:09:41,573 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=498087.3333333333, ans=0.125
2024-09-24 12:09:50,698 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.155e+02 1.264e+02 1.369e+02 1.478e+02 2.054e+02, threshold=2.738e+02, percent-clipped=0.0
2024-09-24 12:09:53,890 INFO [train.py:1198] (3/4) Epoch 28, batch 1550, loss[loss=0.236, ctc_loss=0.157, cr_loss=0.3952, over 16582.00 frames. ], tot_loss[loss=0.2016, ctc_loss=0.132, cr_loss=0.3478, over 3351324.57 frames. ], batch size: 66, lr: 4.27e-03, grad_scale: 16.0
2024-09-24 12:09:57,443 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=498134.0, ans=0.0
2024-09-24 12:09:57,455 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=498134.0, ans=0.125
2024-09-24 12:10:00,656 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.const_attention_rate, batch_count=498134.0, ans=0.025
2024-09-24 12:10:40,732 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=498274.0, ans=0.04949747468305833
2024-09-24 12:10:40,738 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=498274.0, ans=0.1
2024-09-24 12:11:13,887 INFO [train.py:1198] (3/4) Epoch 28, batch 1600, loss[loss=0.2369, ctc_loss=0.1557, cr_loss=0.4059, over 17024.00 frames. ], tot_loss[loss=0.2025, ctc_loss=0.1327, cr_loss=0.349, over 3342970.22 frames. ], batch size: 52, lr: 4.27e-03, grad_scale: 32.0
2024-09-24 12:11:40,582 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=498414.0, ans=0.125
2024-09-24 12:11:56,945 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=498460.6666666667, ans=0.125
2024-09-24 12:12:11,223 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=498507.3333333333, ans=0.0
2024-09-24 12:12:38,198 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.128e+02 1.239e+02 1.329e+02 1.427e+02 2.041e+02, threshold=2.658e+02, percent-clipped=0.0
2024-09-24 12:12:41,513 INFO [train.py:1198] (3/4) Epoch 28, batch 1650, loss[loss=0.2246, ctc_loss=0.1454, cr_loss=0.3961, over 17011.00 frames. ], tot_loss[loss=0.2037, ctc_loss=0.1335, cr_loss=0.3509, over 3343947.23 frames. ], batch size: 53, lr: 4.27e-03, grad_scale: 32.0
2024-09-24 12:13:08,770 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=498647.3333333333, ans=0.05
2024-09-24 12:13:25,247 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.whiten.whitening_limit, batch_count=498694.0, ans=15.0
2024-09-24 12:13:26,278 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=498694.0, ans=0.0
2024-09-24 12:14:00,725 INFO [train.py:1198] (3/4) Epoch 28, batch 1700, loss[loss=0.2386, ctc_loss=0.1602, cr_loss=0.392, over 15914.00 frames. ], tot_loss[loss=0.2031, ctc_loss=0.1331, cr_loss=0.35, over 3335007.84 frames. ], batch size: 74, lr: 4.27e-03, grad_scale: 32.0
2024-09-24 12:14:01,111 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=498834.0, ans=0.0
2024-09-24 12:14:08,111 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=6.43 vs. limit=15.0
2024-09-24 12:14:10,812 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=498834.0, ans=0.07
2024-09-24 12:14:30,763 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=498880.6666666667, ans=0.125
2024-09-24 12:14:49,743 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=498974.0, ans=0.1
2024-09-24 12:14:57,512 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=498974.0, ans=0.2
2024-09-24 12:14:59,429 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=498974.0, ans=0.125
2024-09-24 12:15:10,390 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=499020.6666666667, ans=0.125
2024-09-24 12:15:11,866 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=499020.6666666667, ans=0.0
2024-09-24 12:15:15,141 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=499020.6666666667, ans=0.125
2024-09-24 12:15:19,451 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.071e+02 1.240e+02 1.314e+02 1.418e+02 1.778e+02, threshold=2.628e+02, percent-clipped=0.0
2024-09-24 12:15:22,617 INFO [train.py:1198] (3/4) Epoch 28, batch 1750, loss[loss=0.1885, ctc_loss=0.1188, cr_loss=0.3484, over 17278.00 frames. ], tot_loss[loss=0.2025, ctc_loss=0.1326, cr_loss=0.3498, over 3342784.03 frames. ], batch size: 42, lr: 4.27e-03, grad_scale: 32.0
2024-09-24 12:15:30,006 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=5.38 vs. limit=15.0
2024-09-24 12:16:46,338 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-09-24 12:16:47,617 INFO [train.py:1198] (3/4) Epoch 28, batch 1800, loss[loss=0.1903, ctc_loss=0.1242, cr_loss=0.3308, over 16934.00 frames. ], tot_loss[loss=0.2031, ctc_loss=0.1331, cr_loss=0.3499, over 3334705.42 frames. ], batch size: 58, lr: 4.27e-03, grad_scale: 32.0
2024-09-24 12:16:49,421 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=499300.6666666667, ans=0.2
2024-09-24 12:17:03,830 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=499347.3333333333, ans=0.0
2024-09-24 12:17:48,309 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=499440.6666666667, ans=0.125
2024-09-24 12:18:03,604 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.125e+02 1.272e+02 1.347e+02 1.424e+02 2.124e+02, threshold=2.693e+02, percent-clipped=0.0
2024-09-24 12:18:05,537 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=499534.0, ans=0.0
2024-09-24 12:18:05,666 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=499534.0, ans=0.2
2024-09-24 12:18:06,863 INFO [train.py:1198] (3/4) Epoch 28, batch 1850, loss[loss=0.2086, ctc_loss=0.1363, cr_loss=0.3615, over 16762.00 frames. ], tot_loss[loss=0.2037, ctc_loss=0.1334, cr_loss=0.3515, over 3343621.54 frames. ], batch size: 61, lr: 4.27e-03, grad_scale: 32.0
2024-09-24 12:18:12,023 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=499534.0, ans=0.09899494936611666
2024-09-24 12:18:18,408 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=499534.0, ans=0.125
2024-09-24 12:18:59,727 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=499674.0, ans=0.125
2024-09-24 12:19:03,156 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=499674.0, ans=0.125
2024-09-24 12:19:17,506 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=499720.6666666667, ans=0.0
2024-09-24 12:19:25,463 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.30 vs. limit=6.0
2024-09-24 12:19:29,629 INFO [train.py:1198] (3/4) Epoch 28, batch 1900, loss[loss=0.1828, ctc_loss=0.1172, cr_loss=0.3282, over 17303.00 frames. ], tot_loss[loss=0.204, ctc_loss=0.1336, cr_loss=0.3518, over 3341331.40 frames. ], batch size: 51, lr: 4.26e-03, grad_scale: 32.0
2024-09-24 12:19:38,459 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=11.19 vs. limit=15.0
2024-09-24 12:19:43,076 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=499767.3333333333, ans=0.95
2024-09-24 12:20:18,156 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=499907.3333333333, ans=0.2
2024-09-24 12:20:32,367 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=499954.0, ans=0.1
2024-09-24 12:20:46,567 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.102e+02 1.256e+02 1.349e+02 1.450e+02 2.234e+02, threshold=2.698e+02, percent-clipped=0.0
2024-09-24 12:20:49,744 INFO [train.py:1198] (3/4) Epoch 28, batch 1950, loss[loss=0.1997, ctc_loss=0.1307, cr_loss=0.3448, over 17309.00 frames. ], tot_loss[loss=0.204, ctc_loss=0.1336, cr_loss=0.3517, over 3345205.87 frames. ], batch size: 51, lr: 4.26e-03, grad_scale: 32.0
2024-09-24 12:20:58,034 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=500000.6666666667, ans=0.125
2024-09-24 12:21:01,397 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-09-24 12:21:05,914 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=500047.3333333333, ans=0.125
2024-09-24 12:21:20,518 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=500094.0, ans=0.125
2024-09-24 12:22:17,929 INFO [train.py:1198] (3/4) Epoch 28, batch 2000, loss[loss=0.2132, ctc_loss=0.1381, cr_loss=0.3753, over 17226.00 frames. ], tot_loss[loss=0.2035, ctc_loss=0.1334, cr_loss=0.3509, over 3342828.85 frames. ], batch size: 55, lr: 4.26e-03, grad_scale: 32.0
2024-09-24 12:22:19,824 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=500234.0, ans=0.2
2024-09-24 12:22:56,081 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.57 vs. limit=6.0
2024-09-24 12:23:08,218 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=500374.0, ans=0.1
2024-09-24 12:23:16,196 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=500374.0, ans=0.0
2024-09-24 12:23:18,246 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=9.57 vs. limit=22.5
2024-09-24 12:23:32,535 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=11.58 vs. limit=15.0
2024-09-24 12:23:35,004 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.103e+02 1.274e+02 1.350e+02 1.459e+02 2.226e+02, threshold=2.700e+02, percent-clipped=0.0
2024-09-24 12:23:38,215 INFO [train.py:1198] (3/4) Epoch 28, batch 2050, loss[loss=0.1861, ctc_loss=0.1219, cr_loss=0.3209, over 17144.00 frames. ], tot_loss[loss=0.2026, ctc_loss=0.1328, cr_loss=0.3494, over 3349027.20 frames. ], batch size: 48, lr: 4.26e-03, grad_scale: 32.0
2024-09-24 12:23:48,091 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=500467.3333333333, ans=0.125
2024-09-24 12:23:52,775 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=500514.0, ans=0.09899494936611666
2024-09-24 12:24:20,823 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=500560.6666666667, ans=0.1
2024-09-24 12:24:28,762 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=500607.3333333333, ans=0.125
2024-09-24 12:24:35,244 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=500607.3333333333, ans=0.1
2024-09-24 12:24:36,779 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=500607.3333333333, ans=0.0
2024-09-24 12:24:44,764 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=500654.0, ans=0.1
2024-09-24 12:24:44,824 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=500654.0, ans=0.0
2024-09-24 12:24:47,981 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=500654.0, ans=0.07
2024-09-24 12:24:52,811 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.min_positive, batch_count=500654.0, ans=0.025
2024-09-24 12:25:00,543 INFO [train.py:1198] (3/4) Epoch 28, batch 2100, loss[loss=0.2097, ctc_loss=0.1351, cr_loss=0.373, over 17200.00 frames. ], tot_loss[loss=0.2031, ctc_loss=0.133, cr_loss=0.3504, over 3350221.31 frames. ], batch size: 47, lr: 4.26e-03, grad_scale: 32.0
2024-09-24 12:25:07,386 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=500700.6666666667, ans=0.0
2024-09-24 12:25:13,522 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=500700.6666666667, ans=0.125
2024-09-24 12:25:26,384 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=500747.3333333333, ans=0.0
2024-09-24 12:25:41,901 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=500794.0, ans=0.125
2024-09-24 12:26:04,688 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.90 vs. limit=6.0
2024-09-24 12:26:06,629 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=6.52 vs. limit=15.0
2024-09-24 12:26:16,786 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.122e+02 1.293e+02 1.395e+02 1.538e+02 2.200e+02, threshold=2.790e+02, percent-clipped=0.0
2024-09-24 12:26:25,168 INFO [train.py:1198] (3/4) Epoch 28, batch 2150, loss[loss=0.2012, ctc_loss=0.1298, cr_loss=0.3572, over 17280.00 frames. ], tot_loss[loss=0.2027, ctc_loss=0.1328, cr_loss=0.3495, over 3351720.05 frames. ], batch size: 51, lr: 4.26e-03, grad_scale: 32.0
], tot_loss[loss=0.2027, ctc_loss=0.1328, cr_loss=0.3495, over 3351720.05 frames. ], batch size: 51, lr: 4.26e-03, grad_scale: 32.0 2024-09-24 12:26:30,220 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=500934.0, ans=0.0 2024-09-24 12:26:53,448 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-24 12:27:04,554 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=501027.3333333333, ans=0.0 2024-09-24 12:27:09,352 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=501027.3333333333, ans=0.0 2024-09-24 12:27:15,998 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=501074.0, ans=0.0 2024-09-24 12:27:30,692 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=13.56 vs. limit=22.5 2024-09-24 12:27:47,666 INFO [train.py:1198] (3/4) Epoch 28, batch 2200, loss[loss=0.1878, ctc_loss=0.1236, cr_loss=0.3208, over 17012.00 frames. ], tot_loss[loss=0.2037, ctc_loss=0.1335, cr_loss=0.3514, over 3358448.78 frames. ], batch size: 44, lr: 4.26e-03, grad_scale: 16.0 2024-09-24 12:27:59,536 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-24 12:28:04,191 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=501214.0, ans=0.1 2024-09-24 12:28:36,185 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=501307.3333333333, ans=0.125 2024-09-24 12:28:38,187 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=6.40 vs. limit=10.0 2024-09-24 12:29:06,264 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.156e+02 1.288e+02 1.410e+02 1.537e+02 2.040e+02, threshold=2.819e+02, percent-clipped=0.0 2024-09-24 12:29:10,555 INFO [train.py:1198] (3/4) Epoch 28, batch 2250, loss[loss=0.1693, ctc_loss=0.1085, cr_loss=0.3038, over 17262.00 frames. ], tot_loss[loss=0.2044, ctc_loss=0.1341, cr_loss=0.3515, over 3352992.96 frames. ], batch size: 42, lr: 4.26e-03, grad_scale: 16.0 2024-09-24 12:29:17,421 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=501400.6666666667, ans=0.125 2024-09-24 12:29:35,562 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=9.21 vs. 
limit=15.0 2024-09-24 12:29:41,534 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.const_attention_rate, batch_count=501494.0, ans=0.025 2024-09-24 12:29:49,518 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.const_attention_rate, batch_count=501494.0, ans=0.025 2024-09-24 12:29:51,184 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=501494.0, ans=0.0 2024-09-24 12:30:28,202 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.91 vs. limit=10.0 2024-09-24 12:30:30,752 INFO [train.py:1198] (3/4) Epoch 28, batch 2300, loss[loss=0.2234, ctc_loss=0.1542, cr_loss=0.3462, over 11915.00 frames. ], tot_loss[loss=0.2028, ctc_loss=0.133, cr_loss=0.3494, over 3352827.03 frames. ], batch size: 123, lr: 4.26e-03, grad_scale: 16.0 2024-09-24 12:31:09,468 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=501727.3333333333, ans=0.125 2024-09-24 12:31:56,847 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.108e+02 1.285e+02 1.365e+02 1.487e+02 3.397e+02, threshold=2.730e+02, percent-clipped=1.0 2024-09-24 12:31:58,417 INFO [train.py:1198] (3/4) Epoch 28, batch 2350, loss[loss=0.2033, ctc_loss=0.1326, cr_loss=0.3534, over 17216.00 frames. ], tot_loss[loss=0.2038, ctc_loss=0.1335, cr_loss=0.3513, over 3359555.18 frames. ], batch size: 47, lr: 4.26e-03, grad_scale: 16.0 2024-09-24 12:32:02,063 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=501867.3333333333, ans=0.1 2024-09-24 12:32:16,152 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.const_attention_rate, batch_count=501914.0, ans=0.025 2024-09-24 12:32:32,025 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=501960.6666666667, ans=0.0 2024-09-24 12:33:17,994 INFO [train.py:1198] (3/4) Epoch 28, batch 2400, loss[loss=0.2236, ctc_loss=0.1488, cr_loss=0.3739, over 17210.00 frames. ], tot_loss[loss=0.2035, ctc_loss=0.1333, cr_loss=0.351, over 3365481.74 frames. ], batch size: 55, lr: 4.25e-03, grad_scale: 32.0 2024-09-24 12:33:19,946 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=502100.6666666667, ans=0.1 2024-09-24 12:33:55,160 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=502194.0, ans=0.125 2024-09-24 12:34:27,881 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=4.61 vs. limit=10.0 2024-09-24 12:34:31,901 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=502287.3333333333, ans=0.125 2024-09-24 12:34:39,437 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.025e+02 1.258e+02 1.381e+02 1.475e+02 2.172e+02, threshold=2.761e+02, percent-clipped=0.0 2024-09-24 12:34:41,087 INFO [train.py:1198] (3/4) Epoch 28, batch 2450, loss[loss=0.1813, ctc_loss=0.1165, cr_loss=0.3242, over 17172.00 frames. ], tot_loss[loss=0.2034, ctc_loss=0.1332, cr_loss=0.3512, over 3371673.54 frames. 
], batch size: 41, lr: 4.25e-03, grad_scale: 32.0 2024-09-24 12:34:52,380 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=502334.0, ans=0.125 2024-09-24 12:34:55,624 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=502380.6666666667, ans=0.0 2024-09-24 12:35:11,946 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=9.56 vs. limit=15.0 2024-09-24 12:35:48,129 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=502520.6666666667, ans=0.1 2024-09-24 12:36:00,608 INFO [train.py:1198] (3/4) Epoch 28, batch 2500, loss[loss=0.1911, ctc_loss=0.123, cr_loss=0.3404, over 17168.00 frames. ], tot_loss[loss=0.2039, ctc_loss=0.1336, cr_loss=0.3518, over 3377976.94 frames. ], batch size: 45, lr: 4.25e-03, grad_scale: 16.0 2024-09-24 12:36:09,416 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.63 vs. limit=22.5 2024-09-24 12:36:18,853 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=502567.3333333333, ans=0.125 2024-09-24 12:36:27,944 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=4.79 vs. limit=12.0 2024-09-24 12:36:33,608 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=502614.0, ans=0.025 2024-09-24 12:37:19,038 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=502754.0, ans=0.1 2024-09-24 12:37:28,344 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.066e+02 1.269e+02 1.387e+02 1.478e+02 2.068e+02, threshold=2.774e+02, percent-clipped=0.0 2024-09-24 12:37:28,370 INFO [train.py:1198] (3/4) Epoch 28, batch 2550, loss[loss=0.2009, ctc_loss=0.1315, cr_loss=0.347, over 17249.00 frames. ], tot_loss[loss=0.2042, ctc_loss=0.1337, cr_loss=0.3521, over 3380526.75 frames. ], batch size: 50, lr: 4.25e-03, grad_scale: 16.0 2024-09-24 12:37:35,232 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=502800.6666666667, ans=0.0 2024-09-24 12:37:38,538 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=502800.6666666667, ans=0.125 2024-09-24 12:37:38,979 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=4.81 vs. limit=10.0 2024-09-24 12:37:46,245 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=502847.3333333333, ans=0.125 2024-09-24 12:38:07,436 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.77 vs. 
limit=15.0 2024-09-24 12:38:45,415 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=502987.3333333333, ans=0.125 2024-09-24 12:38:48,321 INFO [train.py:1198] (3/4) Epoch 28, batch 2600, loss[loss=0.2024, ctc_loss=0.1309, cr_loss=0.3572, over 17015.00 frames. ], tot_loss[loss=0.2035, ctc_loss=0.1333, cr_loss=0.3511, over 3380152.44 frames. ], batch size: 44, lr: 4.25e-03, grad_scale: 16.0 2024-09-24 12:38:50,239 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=503034.0, ans=0.125 2024-09-24 12:39:04,399 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-24 12:39:12,446 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=5.09 vs. limit=12.0 2024-09-24 12:39:52,040 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=503174.0, ans=0.0 2024-09-24 12:39:55,874 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=4.46 vs. limit=10.0 2024-09-24 12:40:09,694 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=503267.3333333333, ans=0.125 2024-09-24 12:40:10,913 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.117e+02 1.254e+02 1.337e+02 1.417e+02 2.017e+02, threshold=2.675e+02, percent-clipped=0.0 2024-09-24 12:40:10,938 INFO [train.py:1198] (3/4) Epoch 28, batch 2650, loss[loss=0.1691, ctc_loss=0.1095, cr_loss=0.2977, over 17167.00 frames. ], tot_loss[loss=0.203, ctc_loss=0.1329, cr_loss=0.3503, over 3377557.68 frames. ], batch size: 45, lr: 4.25e-03, grad_scale: 16.0 2024-09-24 12:40:22,287 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=503267.3333333333, ans=0.0 2024-09-24 12:40:37,087 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=9.85 vs. limit=12.0 2024-09-24 12:41:05,526 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=503407.3333333333, ans=0.05 2024-09-24 12:41:17,306 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=4.40 vs. limit=15.0 2024-09-24 12:41:26,387 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.max_abs, batch_count=503454.0, ans=10.0 2024-09-24 12:41:32,801 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=503454.0, ans=0.1 2024-09-24 12:41:38,427 INFO [train.py:1198] (3/4) Epoch 28, batch 2700, loss[loss=0.1863, ctc_loss=0.1202, cr_loss=0.3308, over 17202.00 frames. ], tot_loss[loss=0.2029, ctc_loss=0.1328, cr_loss=0.3504, over 3382990.25 frames. 
], batch size: 41, lr: 4.25e-03, grad_scale: 16.0 2024-09-24 12:41:40,463 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=503500.6666666667, ans=0.125 2024-09-24 12:41:53,243 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=503547.3333333333, ans=0.2 2024-09-24 12:42:33,213 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=503640.6666666667, ans=0.125 2024-09-24 12:42:58,695 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.145e+02 1.278e+02 1.354e+02 1.486e+02 3.287e+02, threshold=2.708e+02, percent-clipped=2.0 2024-09-24 12:42:58,721 INFO [train.py:1198] (3/4) Epoch 28, batch 2750, loss[loss=0.1679, ctc_loss=0.109, cr_loss=0.2948, over 17162.00 frames. ], tot_loss[loss=0.2034, ctc_loss=0.1332, cr_loss=0.351, over 3370484.48 frames. ], batch size: 41, lr: 4.25e-03, grad_scale: 16.0 2024-09-24 12:43:03,682 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=503734.0, ans=0.0 2024-09-24 12:43:32,019 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=503827.3333333333, ans=0.125 2024-09-24 12:43:32,108 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-24 12:43:35,301 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-24 12:43:43,254 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=503827.3333333333, ans=0.07 2024-09-24 12:44:03,639 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=503920.6666666667, ans=0.125 2024-09-24 12:44:03,733 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=503920.6666666667, ans=0.2 2024-09-24 12:44:20,599 INFO [train.py:1198] (3/4) Epoch 28, batch 2800, loss[loss=0.205, ctc_loss=0.136, cr_loss=0.3453, over 17035.00 frames. ], tot_loss[loss=0.2042, ctc_loss=0.1339, cr_loss=0.3518, over 3375888.21 frames. ], batch size: 44, lr: 4.25e-03, grad_scale: 32.0 2024-09-24 12:44:36,058 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=503967.3333333333, ans=0.1 2024-09-24 12:44:36,232 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten.whitening_limit, batch_count=503967.3333333333, ans=15.0 2024-09-24 12:44:48,965 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.47 vs. 
limit=15.0 2024-09-24 12:44:50,425 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.const_attention_rate, batch_count=504014.0, ans=0.025 2024-09-24 12:45:11,319 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=504107.3333333333, ans=0.0 2024-09-24 12:45:25,766 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=504154.0, ans=0.1 2024-09-24 12:45:37,123 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=504154.0, ans=0.0 2024-09-24 12:45:43,157 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.102e+02 1.267e+02 1.390e+02 1.555e+02 2.170e+02, threshold=2.781e+02, percent-clipped=0.0 2024-09-24 12:45:43,183 INFO [train.py:1198] (3/4) Epoch 28, batch 2850, loss[loss=0.1718, ctc_loss=0.1139, cr_loss=0.2896, over 17235.00 frames. ], tot_loss[loss=0.2034, ctc_loss=0.1333, cr_loss=0.3505, over 3380812.53 frames. ], batch size: 42, lr: 4.25e-03, grad_scale: 32.0 2024-09-24 12:46:25,352 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.const_attention_rate, batch_count=504294.0, ans=0.025 2024-09-24 12:46:45,599 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=504340.6666666667, ans=0.125 2024-09-24 12:47:00,021 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer_na.min_abs, batch_count=504387.3333333333, ans=0.02 2024-09-24 12:47:10,936 INFO [train.py:1198] (3/4) Epoch 28, batch 2900, loss[loss=0.2313, ctc_loss=0.1525, cr_loss=0.3939, over 16052.00 frames. ], tot_loss[loss=0.2025, ctc_loss=0.1327, cr_loss=0.3491, over 3371455.19 frames. ], batch size: 74, lr: 4.24e-03, grad_scale: 32.0 2024-09-24 12:47:27,103 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=504480.6666666667, ans=0.0 2024-09-24 12:48:01,068 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=504574.0, ans=0.1 2024-09-24 12:48:12,299 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=504574.0, ans=0.0 2024-09-24 12:48:20,370 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=504620.6666666667, ans=0.0 2024-09-24 12:48:31,272 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.101e+02 1.289e+02 1.394e+02 1.570e+02 3.133e+02, threshold=2.787e+02, percent-clipped=1.0 2024-09-24 12:48:31,297 INFO [train.py:1198] (3/4) Epoch 28, batch 2950, loss[loss=0.2397, ctc_loss=0.1576, cr_loss=0.4105, over 17209.00 frames. ], tot_loss[loss=0.2019, ctc_loss=0.1321, cr_loss=0.3487, over 3378189.34 frames. 
], batch size: 55, lr: 4.24e-03, grad_scale: 32.0 2024-09-24 12:48:33,206 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=504667.3333333333, ans=0.125 2024-09-24 12:48:36,627 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=504667.3333333333, ans=0.1 2024-09-24 12:48:36,811 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.19 vs. limit=10.0 2024-09-24 12:48:56,285 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=504714.0, ans=0.125 2024-09-24 12:49:04,339 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=504760.6666666667, ans=0.1 2024-09-24 12:49:42,483 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=504854.0, ans=0.0 2024-09-24 12:49:53,087 INFO [train.py:1198] (3/4) Epoch 28, batch 3000, loss[loss=0.2109, ctc_loss=0.1426, cr_loss=0.3416, over 16013.00 frames. ], tot_loss[loss=0.2024, ctc_loss=0.1325, cr_loss=0.3493, over 3369712.18 frames. ], batch size: 74, lr: 4.24e-03, grad_scale: 32.0 2024-09-24 12:49:53,088 INFO [train.py:1221] (3/4) Computing validation loss 2024-09-24 12:50:08,094 INFO [train.py:1230] (3/4) Epoch 28, validation: loss=0.03718, ctc_loss=0.03718, cr_loss=8.452e-15, over 944034.00 frames. 2024-09-24 12:50:08,094 INFO [train.py:1231] (3/4) Maximum memory allocated so far is 21238MB 2024-09-24 12:50:11,588 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=504900.6666666667, ans=0.07 2024-09-24 12:50:27,457 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=504947.3333333333, ans=0.125 2024-09-24 12:50:41,514 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=504994.0, ans=0.125 2024-09-24 12:50:50,977 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=504994.0, ans=0.0 2024-09-24 12:50:51,154 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.const_attention_rate, batch_count=504994.0, ans=0.025 2024-09-24 12:50:52,628 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=504994.0, ans=0.125 2024-09-24 12:50:58,932 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=505040.6666666667, ans=0.2 2024-09-24 12:51:06,627 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-24 12:51:08,240 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-09-24 12:51:12,933 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=505087.3333333333, ans=0.2 2024-09-24 12:51:18,985 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=505087.3333333333, ans=0.125 2024-09-24 
12:51:26,595 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.079e+02 1.294e+02 1.362e+02 1.471e+02 2.139e+02, threshold=2.723e+02, percent-clipped=0.0 2024-09-24 12:51:26,620 INFO [train.py:1198] (3/4) Epoch 28, batch 3050, loss[loss=0.2244, ctc_loss=0.1487, cr_loss=0.3785, over 17053.00 frames. ], tot_loss[loss=0.2012, ctc_loss=0.1316, cr_loss=0.3479, over 3368032.86 frames. ], batch size: 52, lr: 4.24e-03, grad_scale: 32.0 2024-09-24 12:51:41,915 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=9.74 vs. limit=15.0 2024-09-24 12:51:42,690 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=505180.6666666667, ans=0.0 2024-09-24 12:52:09,432 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten.whitening_limit, batch_count=505227.3333333333, ans=15.0 2024-09-24 12:52:10,629 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=505227.3333333333, ans=0.0 2024-09-24 12:52:22,899 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=505274.0, ans=0.125 2024-09-24 12:52:41,482 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=505320.6666666667, ans=0.07 2024-09-24 12:52:47,751 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=505367.3333333333, ans=0.125 2024-09-24 12:52:48,944 INFO [train.py:1198] (3/4) Epoch 28, batch 3100, loss[loss=0.1675, ctc_loss=0.1073, cr_loss=0.3011, over 17102.00 frames. ], tot_loss[loss=0.2013, ctc_loss=0.1317, cr_loss=0.3481, over 3367253.56 frames. ], batch size: 43, lr: 4.24e-03, grad_scale: 16.0 2024-09-24 12:52:56,190 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=8.87 vs. limit=15.0 2024-09-24 12:53:01,548 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=505367.3333333333, ans=0.2 2024-09-24 12:53:21,260 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=505460.6666666667, ans=0.0 2024-09-24 12:53:31,223 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=9.60 vs. limit=15.0 2024-09-24 12:53:44,754 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=505507.3333333333, ans=0.125 2024-09-24 12:54:09,361 INFO [train.py:1198] (3/4) Epoch 28, batch 3150, loss[loss=0.1953, ctc_loss=0.1268, cr_loss=0.3426, over 17293.00 frames. ], tot_loss[loss=0.2018, ctc_loss=0.1321, cr_loss=0.3484, over 3362291.88 frames. 
], batch size: 46, lr: 4.24e-03, grad_scale: 16.0 2024-09-24 12:54:10,885 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.088e+02 1.260e+02 1.340e+02 1.441e+02 3.228e+02, threshold=2.680e+02, percent-clipped=2.0 2024-09-24 12:54:17,352 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=505600.6666666667, ans=0.125 2024-09-24 12:54:54,873 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=505740.6666666667, ans=0.125 2024-09-24 12:55:12,032 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=505787.3333333333, ans=0.125 2024-09-24 12:55:16,599 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=505787.3333333333, ans=0.125 2024-09-24 12:55:18,214 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=505787.3333333333, ans=0.1 2024-09-24 12:55:21,119 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=505787.3333333333, ans=0.1 2024-09-24 12:55:27,147 INFO [train.py:1198] (3/4) Epoch 28, batch 3200, loss[loss=0.2113, ctc_loss=0.1418, cr_loss=0.3476, over 15975.00 frames. ], tot_loss[loss=0.2016, ctc_loss=0.132, cr_loss=0.3482, over 3364927.55 frames. ], batch size: 74, lr: 4.24e-03, grad_scale: 32.0 2024-09-24 12:55:31,850 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=505834.0, ans=0.0 2024-09-24 12:55:36,486 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=5.049e-03 2024-09-24 12:56:13,825 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=6.53 vs. limit=15.0 2024-09-24 12:56:45,234 INFO [train.py:1198] (3/4) Epoch 28, batch 3250, loss[loss=0.239, ctc_loss=0.1587, cr_loss=0.4018, over 17048.00 frames. ], tot_loss[loss=0.2036, ctc_loss=0.1335, cr_loss=0.3509, over 3358841.40 frames. ], batch size: 52, lr: 4.24e-03, grad_scale: 32.0 2024-09-24 12:56:46,881 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.036e+02 1.261e+02 1.336e+02 1.430e+02 2.422e+02, threshold=2.672e+02, percent-clipped=0.0 2024-09-24 12:57:09,600 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=11.51 vs. 
limit=22.5 2024-09-24 12:57:12,450 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=506114.0, ans=0.2 2024-09-24 12:57:18,526 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=506160.6666666667, ans=0.125 2024-09-24 12:57:23,372 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=506160.6666666667, ans=0.125 2024-09-24 12:57:26,483 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=506160.6666666667, ans=0.2 2024-09-24 12:57:52,106 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=6.38 vs. limit=15.0 2024-09-24 12:57:54,764 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=506254.0, ans=0.0 2024-09-24 12:58:04,269 INFO [train.py:1198] (3/4) Epoch 28, batch 3300, loss[loss=0.2165, ctc_loss=0.1423, cr_loss=0.3712, over 16699.00 frames. ], tot_loss[loss=0.2032, ctc_loss=0.1331, cr_loss=0.3506, over 3353547.42 frames. ], batch size: 61, lr: 4.24e-03, grad_scale: 32.0 2024-09-24 12:58:30,019 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=11.38 vs. limit=22.5 2024-09-24 12:58:31,596 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten.whitening_limit, batch_count=506347.3333333333, ans=15.0 2024-09-24 12:58:42,606 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=506394.0, ans=0.0 2024-09-24 12:58:51,049 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.66 vs. limit=22.5 2024-09-24 12:59:04,395 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=506440.6666666667, ans=0.1 2024-09-24 12:59:07,674 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=506487.3333333333, ans=0.2 2024-09-24 12:59:12,169 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.const_attention_rate, batch_count=506487.3333333333, ans=0.025 2024-09-24 12:59:24,428 INFO [train.py:1198] (3/4) Epoch 28, batch 3350, loss[loss=0.1898, ctc_loss=0.1193, cr_loss=0.3523, over 17099.00 frames. ], tot_loss[loss=0.2032, ctc_loss=0.133, cr_loss=0.3508, over 3353154.66 frames. 
], batch size: 40, lr: 4.24e-03, grad_scale: 32.0 2024-09-24 12:59:24,630 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=506534.0, ans=0.125 2024-09-24 12:59:25,943 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.130e+02 1.246e+02 1.321e+02 1.387e+02 1.674e+02, threshold=2.642e+02, percent-clipped=0.0 2024-09-24 12:59:29,562 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=506534.0, ans=0.1 2024-09-24 12:59:48,317 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.const_attention_rate, batch_count=506580.6666666667, ans=0.025 2024-09-24 13:00:13,514 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.82 vs. limit=15.0 2024-09-24 13:00:31,078 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=9.37 vs. limit=15.0 2024-09-24 13:00:34,902 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=506720.6666666667, ans=0.1 2024-09-24 13:00:42,381 INFO [train.py:1198] (3/4) Epoch 28, batch 3400, loss[loss=0.169, ctc_loss=0.1089, cr_loss=0.3008, over 17132.00 frames. ], tot_loss[loss=0.2021, ctc_loss=0.1322, cr_loss=0.3495, over 3357263.08 frames. ], batch size: 40, lr: 4.24e-03, grad_scale: 32.0 2024-09-24 13:00:45,711 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.max_abs, batch_count=506767.3333333333, ans=10.0 2024-09-24 13:01:00,062 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=506814.0, ans=0.125 2024-09-24 13:01:03,283 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=506814.0, ans=0.125 2024-09-24 13:01:11,114 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=506814.0, ans=0.0 2024-09-24 13:01:43,872 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=506954.0, ans=0.125 2024-09-24 13:02:00,541 INFO [train.py:1198] (3/4) Epoch 28, batch 3450, loss[loss=0.1984, ctc_loss=0.1292, cr_loss=0.3458, over 17270.00 frames. ], tot_loss[loss=0.203, ctc_loss=0.1329, cr_loss=0.3504, over 3356551.99 frames. ], batch size: 44, lr: 4.23e-03, grad_scale: 32.0 2024-09-24 13:02:02,018 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.111e+02 1.326e+02 1.438e+02 1.586e+02 2.934e+02, threshold=2.877e+02, percent-clipped=1.0 2024-09-24 13:02:04,479 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=10.36 vs. limit=15.0 2024-09-24 13:02:29,554 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.74 vs. 
limit=15.0 2024-09-24 13:02:38,442 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=507094.0, ans=0.2 2024-09-24 13:02:40,669 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=13.62 vs. limit=22.5 2024-09-24 13:02:46,554 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=507140.6666666667, ans=0.2 2024-09-24 13:02:53,154 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=507140.6666666667, ans=0.125 2024-09-24 13:03:07,310 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=507187.3333333333, ans=0.125 2024-09-24 13:03:25,066 INFO [train.py:1198] (3/4) Epoch 28, batch 3500, loss[loss=0.2345, ctc_loss=0.1539, cr_loss=0.4031, over 17217.00 frames. ], tot_loss[loss=0.2018, ctc_loss=0.132, cr_loss=0.3493, over 3354666.25 frames. ], batch size: 55, lr: 4.23e-03, grad_scale: 32.0 2024-09-24 13:03:34,836 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer_ff3.min_abs, batch_count=507234.0, ans=0.2 2024-09-24 13:03:36,205 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=507234.0, ans=0.125 2024-09-24 13:03:39,338 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=507280.6666666667, ans=0.0 2024-09-24 13:03:47,285 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=507280.6666666667, ans=0.125 2024-09-24 13:03:53,433 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=507280.6666666667, ans=0.125 2024-09-24 13:04:26,185 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=507420.6666666667, ans=0.0 2024-09-24 13:04:36,918 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=507420.6666666667, ans=0.0 2024-09-24 13:04:42,956 INFO [train.py:1198] (3/4) Epoch 28, batch 3550, loss[loss=0.2163, ctc_loss=0.1415, cr_loss=0.3739, over 15964.00 frames. ], tot_loss[loss=0.2009, ctc_loss=0.1313, cr_loss=0.3481, over 3362588.68 frames. ], batch size: 74, lr: 4.23e-03, grad_scale: 32.0 2024-09-24 13:04:44,467 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.084e+02 1.292e+02 1.411e+02 1.555e+02 1.879e+02, threshold=2.822e+02, percent-clipped=0.0 2024-09-24 13:04:47,906 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=507467.3333333333, ans=0.0 2024-09-24 13:04:53,230 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.44 vs. limit=15.0 2024-09-24 13:04:56,242 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=14.13 vs. 
limit=15.0 2024-09-24 13:05:03,900 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=507514.0, ans=0.1 2024-09-24 13:05:30,829 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=14.83 vs. limit=22.5 2024-09-24 13:05:59,565 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=507700.6666666667, ans=0.09899494936611666 2024-09-24 13:06:00,781 INFO [train.py:1198] (3/4) Epoch 28, batch 3600, loss[loss=0.205, ctc_loss=0.1342, cr_loss=0.3537, over 17227.00 frames. ], tot_loss[loss=0.2003, ctc_loss=0.1309, cr_loss=0.3474, over 3364429.28 frames. ], batch size: 47, lr: 4.23e-03, grad_scale: 32.0 2024-09-24 13:06:10,462 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=507700.6666666667, ans=0.0 2024-09-24 13:06:14,172 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=6.39 vs. limit=15.0 2024-09-24 13:06:24,924 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=507747.3333333333, ans=0.2 2024-09-24 13:06:26,515 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=507747.3333333333, ans=0.1 2024-09-24 13:06:40,459 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=507794.0, ans=0.1 2024-09-24 13:06:40,499 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=507794.0, ans=0.0 2024-09-24 13:06:57,557 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=507840.6666666667, ans=0.125 2024-09-24 13:07:07,068 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=507887.3333333333, ans=0.125 2024-09-24 13:07:18,641 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.47 vs. limit=12.0 2024-09-24 13:07:19,237 INFO [train.py:1198] (3/4) Epoch 28, batch 3650, loss[loss=0.1873, ctc_loss=0.1257, cr_loss=0.3078, over 17207.00 frames. ], tot_loss[loss=0.2009, ctc_loss=0.1313, cr_loss=0.3479, over 3357949.41 frames. ], batch size: 47, lr: 4.23e-03, grad_scale: 32.0 2024-09-24 13:07:20,721 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.115e+02 1.240e+02 1.329e+02 1.473e+02 2.274e+02, threshold=2.658e+02, percent-clipped=0.0 2024-09-24 13:07:27,203 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-24 13:07:32,057 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.const_attention_rate, batch_count=507934.0, ans=0.025 2024-09-24 13:07:59,367 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=12.44 vs. 
limit=22.5 2024-09-24 13:08:37,385 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=508120.6666666667, ans=0.125 2024-09-24 13:08:38,894 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=508167.3333333333, ans=0.125 2024-09-24 13:08:40,143 INFO [train.py:1198] (3/4) Epoch 28, batch 3700, loss[loss=0.2478, ctc_loss=0.1652, cr_loss=0.4134, over 17049.00 frames. ], tot_loss[loss=0.2013, ctc_loss=0.1317, cr_loss=0.3479, over 3346879.23 frames. ], batch size: 52, lr: 4.23e-03, grad_scale: 32.0 2024-09-24 13:08:56,220 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=508214.0, ans=0.125 2024-09-24 13:08:56,838 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=5.41 vs. limit=12.0 2024-09-24 13:08:57,854 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=508214.0, ans=0.0 2024-09-24 13:08:59,341 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=508214.0, ans=0.2 2024-09-24 13:09:37,753 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=10.27 vs. limit=15.0 2024-09-24 13:09:38,964 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=508307.3333333333, ans=0.0 2024-09-24 13:09:46,568 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=508354.0, ans=0.035 2024-09-24 13:09:49,736 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.const_attention_rate, batch_count=508354.0, ans=0.025 2024-09-24 13:09:58,946 INFO [train.py:1198] (3/4) Epoch 28, batch 3750, loss[loss=0.2353, ctc_loss=0.1572, cr_loss=0.3905, over 15247.00 frames. ], tot_loss[loss=0.2034, ctc_loss=0.1333, cr_loss=0.3504, over 3334647.32 frames. ], batch size: 89, lr: 4.23e-03, grad_scale: 32.0 2024-09-24 13:10:00,434 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.083e+02 1.248e+02 1.336e+02 1.422e+02 1.841e+02, threshold=2.673e+02, percent-clipped=0.0 2024-09-24 13:10:11,792 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=508400.6666666667, ans=0.0 2024-09-24 13:10:33,952 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=508494.0, ans=0.125 2024-09-24 13:10:35,525 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=508494.0, ans=0.125 2024-09-24 13:10:36,854 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=508494.0, ans=0.125 2024-09-24 13:11:03,701 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=508587.3333333333, ans=0.1 2024-09-24 13:11:16,582 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=508634.0, ans=0.0 2024-09-24 13:11:17,815 INFO [train.py:1198] (3/4) Epoch 28, batch 3800, loss[loss=0.1925, ctc_loss=0.1239, cr_loss=0.3429, over 17167.00 frames. 
], tot_loss[loss=0.2042, ctc_loss=0.134, cr_loss=0.3509, over 3318990.79 frames. ], batch size: 41, lr: 4.23e-03, grad_scale: 16.0 2024-09-24 13:11:40,363 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=508680.6666666667, ans=0.125 2024-09-24 13:11:41,838 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=508680.6666666667, ans=0.1 2024-09-24 13:11:51,324 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=508727.3333333333, ans=0.125 2024-09-24 13:12:24,375 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=508820.6666666667, ans=0.125 2024-09-24 13:12:37,378 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=508867.3333333333, ans=0.125 2024-09-24 13:12:38,538 INFO [train.py:1198] (3/4) Epoch 28, batch 3850, loss[loss=0.2398, ctc_loss=0.1666, cr_loss=0.366, over 11932.00 frames. ], tot_loss[loss=0.2051, ctc_loss=0.1348, cr_loss=0.3514, over 3278240.76 frames. ], batch size: 123, lr: 4.23e-03, grad_scale: 16.0 2024-09-24 13:12:42,159 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.091e+02 1.266e+02 1.354e+02 1.491e+02 2.044e+02, threshold=2.707e+02, percent-clipped=0.0 2024-09-24 13:12:44,055 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=508867.3333333333, ans=0.125 2024-09-24 13:12:45,683 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=508867.3333333333, ans=0.125 2024-09-24 13:12:51,873 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=508867.3333333333, ans=0.125 2024-09-24 13:12:56,867 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=508914.0, ans=0.2 2024-09-24 13:13:18,742 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.39 vs. limit=6.0 2024-09-24 13:13:42,955 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=509054.0, ans=0.125 2024-09-24 13:14:43,488 INFO [train.py:1198] (3/4) Epoch 29, batch 0, loss[loss=0.1955, ctc_loss=0.127, cr_loss=0.3426, over 17209.00 frames. ], tot_loss[loss=0.1955, ctc_loss=0.127, cr_loss=0.3426, over 17209.00 frames. ], batch size: 47, lr: 4.15e-03, grad_scale: 32.0 2024-09-24 13:14:43,489 INFO [train.py:1221] (3/4) Computing validation loss 2024-09-24 13:14:51,315 INFO [zipformer.py:1858] (3/4) name=encoder.encoders.0.layers.0.self_attn_weights, attn_weights_entropy = tensor([5.5717, 5.3220, 4.9915, 5.1376], device='cuda:3') 2024-09-24 13:14:58,951 INFO [train.py:1230] (3/4) Epoch 29, validation: loss=0.03615, ctc_loss=0.03615, cr_loss=9.405e-15, over 944034.00 frames. 2024-09-24 13:14:58,952 INFO [train.py:1231] (3/4) Maximum memory allocated so far is 21238MB 2024-09-24 13:15:33,653 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=4.47 vs. 
limit=15.0 2024-09-24 13:15:38,763 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=509175.3333333333, ans=0.0 2024-09-24 13:15:40,780 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.79 vs. limit=15.0 2024-09-24 13:16:21,072 INFO [train.py:1198] (3/4) Epoch 29, batch 50, loss[loss=0.2114, ctc_loss=0.1377, cr_loss=0.3687, over 17177.00 frames. ], tot_loss[loss=0.2, ctc_loss=0.1308, cr_loss=0.3462, over 758802.41 frames. ], batch size: 55, lr: 4.15e-03, grad_scale: 32.0 2024-09-24 13:16:24,727 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=509315.3333333333, ans=0.025 2024-09-24 13:16:30,903 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.124e+02 1.316e+02 1.466e+02 1.618e+02 2.901e+02, threshold=2.933e+02, percent-clipped=1.0 2024-09-24 13:16:32,881 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.min_positive, batch_count=509315.3333333333, ans=0.025 2024-09-24 13:16:37,645 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.const_attention_rate, batch_count=509362.0, ans=0.025 2024-09-24 13:16:42,379 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=509362.0, ans=0.0 2024-09-24 13:16:42,535 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=509362.0, ans=0.125 2024-09-24 13:17:17,930 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=509455.3333333333, ans=0.2 2024-09-24 13:17:43,260 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=509548.6666666667, ans=0.0 2024-09-24 13:17:44,486 INFO [train.py:1198] (3/4) Epoch 29, batch 100, loss[loss=0.1788, ctc_loss=0.1152, cr_loss=0.3178, over 17236.00 frames. ], tot_loss[loss=0.2027, ctc_loss=0.1325, cr_loss=0.3507, over 1345712.72 frames. ], batch size: 47, lr: 4.15e-03, grad_scale: 32.0 2024-09-24 13:18:04,284 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=509595.3333333333, ans=0.0 2024-09-24 13:18:23,063 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=4.68 vs. limit=15.0 2024-09-24 13:18:24,008 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=509642.0, ans=0.125 2024-09-24 13:18:30,394 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=509642.0, ans=0.125 2024-09-24 13:18:51,081 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=509735.3333333333, ans=0.125 2024-09-24 13:19:09,197 INFO [train.py:1198] (3/4) Epoch 29, batch 150, loss[loss=0.2353, ctc_loss=0.1564, cr_loss=0.3947, over 17228.00 frames. ], tot_loss[loss=0.2037, ctc_loss=0.1332, cr_loss=0.3525, over 1794520.98 frames. 
], batch size: 50, lr: 4.15e-03, grad_scale: 32.0 2024-09-24 13:19:18,661 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.009e+02 1.258e+02 1.349e+02 1.479e+02 2.103e+02, threshold=2.697e+02, percent-clipped=0.0 2024-09-24 13:19:44,400 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=509875.3333333333, ans=0.0 2024-09-24 13:19:47,949 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.29 vs. limit=6.0 2024-09-24 13:20:01,699 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=509922.0, ans=0.1 2024-09-24 13:20:23,291 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=4.81 vs. limit=10.0 2024-09-24 13:20:30,074 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=510015.3333333333, ans=0.0 2024-09-24 13:20:31,476 INFO [train.py:1198] (3/4) Epoch 29, batch 200, loss[loss=0.1708, ctc_loss=0.1086, cr_loss=0.3108, over 17084.00 frames. ], tot_loss[loss=0.2014, ctc_loss=0.1315, cr_loss=0.3491, over 2136228.81 frames. ], batch size: 40, lr: 4.15e-03, grad_scale: 32.0 2024-09-24 13:20:52,514 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=510062.0, ans=0.0 2024-09-24 13:21:03,598 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=510108.6666666667, ans=0.0 2024-09-24 13:21:06,819 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=510108.6666666667, ans=0.125 2024-09-24 13:21:17,956 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=510155.3333333333, ans=0.0 2024-09-24 13:21:49,936 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=510248.6666666667, ans=0.125 2024-09-24 13:21:51,166 INFO [train.py:1198] (3/4) Epoch 29, batch 250, loss[loss=0.2185, ctc_loss=0.1418, cr_loss=0.3835, over 17017.00 frames. ], tot_loss[loss=0.2013, ctc_loss=0.1316, cr_loss=0.3486, over 2411654.07 frames. ], batch size: 51, lr: 4.15e-03, grad_scale: 32.0 2024-09-24 13:21:53,546 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=9.81 vs. limit=15.0 2024-09-24 13:22:00,734 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.112e+02 1.248e+02 1.376e+02 1.477e+02 2.189e+02, threshold=2.751e+02, percent-clipped=0.0 2024-09-24 13:22:07,410 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=510295.3333333333, ans=0.125 2024-09-24 13:22:09,414 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=6.21 vs. limit=15.0 2024-09-24 13:22:20,483 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=3.55 vs. 
limit=12.0 2024-09-24 13:22:26,448 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=510342.0, ans=0.125 2024-09-24 13:23:09,010 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=510435.3333333333, ans=0.0 2024-09-24 13:23:16,528 INFO [train.py:1198] (3/4) Epoch 29, batch 300, loss[loss=0.1934, ctc_loss=0.1276, cr_loss=0.3292, over 17156.00 frames. ], tot_loss[loss=0.2017, ctc_loss=0.132, cr_loss=0.3486, over 2624133.25 frames. ], batch size: 48, lr: 4.14e-03, grad_scale: 16.0 2024-09-24 13:23:24,964 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=510482.0, ans=0.125 2024-09-24 13:23:26,547 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=510482.0, ans=0.2 2024-09-24 13:23:49,137 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=510575.3333333333, ans=0.0 2024-09-24 13:24:05,299 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=5.68 vs. limit=12.0 2024-09-24 13:24:07,914 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=510622.0, ans=0.125 2024-09-24 13:24:18,836 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=510622.0, ans=0.1 2024-09-24 13:24:19,006 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=510622.0, ans=0.125 2024-09-24 13:24:23,703 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=510668.6666666667, ans=0.1 2024-09-24 13:24:29,852 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=510668.6666666667, ans=0.0 2024-09-24 13:24:39,072 INFO [train.py:1198] (3/4) Epoch 29, batch 350, loss[loss=0.2049, ctc_loss=0.1337, cr_loss=0.3555, over 17005.00 frames. ], tot_loss[loss=0.2023, ctc_loss=0.1325, cr_loss=0.3493, over 2780494.47 frames. ], batch size: 56, lr: 4.14e-03, grad_scale: 16.0 2024-09-24 13:24:47,905 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=5.72 vs. limit=15.0 2024-09-24 13:24:48,927 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=510715.3333333333, ans=0.1 2024-09-24 13:24:50,236 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.082e+02 1.267e+02 1.359e+02 1.490e+02 2.133e+02, threshold=2.717e+02, percent-clipped=0.0 2024-09-24 13:25:42,349 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=510855.3333333333, ans=0.0 2024-09-24 13:25:44,684 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=10.55 vs. 
limit=15.0 2024-09-24 13:25:49,431 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten.whitening_limit, batch_count=510902.0, ans=15.0 2024-09-24 13:25:55,064 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=510902.0, ans=0.2 2024-09-24 13:26:01,409 INFO [train.py:1198] (3/4) Epoch 29, batch 400, loss[loss=0.2139, ctc_loss=0.1398, cr_loss=0.3702, over 17200.00 frames. ], tot_loss[loss=0.2017, ctc_loss=0.132, cr_loss=0.3485, over 2914632.32 frames. ], batch size: 55, lr: 4.14e-03, grad_scale: 32.0 2024-09-24 13:26:11,402 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=6.79 vs. limit=15.0 2024-09-24 13:26:36,762 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=511042.0, ans=0.0 2024-09-24 13:26:40,142 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.const_attention_rate, batch_count=511042.0, ans=0.025 2024-09-24 13:26:41,604 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=511042.0, ans=0.0 2024-09-24 13:27:16,746 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=511135.3333333333, ans=0.125 2024-09-24 13:27:18,273 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=511135.3333333333, ans=0.125 2024-09-24 13:27:21,140 INFO [train.py:1198] (3/4) Epoch 29, batch 450, loss[loss=0.2019, ctc_loss=0.1334, cr_loss=0.3429, over 17355.00 frames. ], tot_loss[loss=0.2003, ctc_loss=0.131, cr_loss=0.3467, over 3020383.65 frames. ], batch size: 48, lr: 4.14e-03, grad_scale: 32.0 2024-09-24 13:27:35,218 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.066e+02 1.269e+02 1.351e+02 1.464e+02 1.902e+02, threshold=2.702e+02, percent-clipped=0.0 2024-09-24 13:28:24,559 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=511322.0, ans=0.0 2024-09-24 13:28:31,677 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=5.26 vs. limit=15.0 2024-09-24 13:28:46,770 INFO [train.py:1198] (3/4) Epoch 29, batch 500, loss[loss=0.2096, ctc_loss=0.137, cr_loss=0.3627, over 17163.00 frames. ], tot_loss[loss=0.2002, ctc_loss=0.1309, cr_loss=0.3466, over 3102298.32 frames. ], batch size: 45, lr: 4.14e-03, grad_scale: 32.0 2024-09-24 13:29:42,686 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=11.88 vs. limit=15.0 2024-09-24 13:29:55,842 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=3.30 vs. 
limit=15.0 2024-09-24 13:29:56,666 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=511602.0, ans=0.125 2024-09-24 13:29:56,771 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=511602.0, ans=0.1 2024-09-24 13:30:07,930 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=511648.6666666667, ans=0.1 2024-09-24 13:30:09,201 INFO [train.py:1198] (3/4) Epoch 29, batch 550, loss[loss=0.2226, ctc_loss=0.1444, cr_loss=0.3907, over 17196.00 frames. ], tot_loss[loss=0.2011, ctc_loss=0.1315, cr_loss=0.348, over 3159697.63 frames. ], batch size: 55, lr: 4.14e-03, grad_scale: 32.0 2024-09-24 13:30:20,287 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.119e+02 1.249e+02 1.312e+02 1.438e+02 1.848e+02, threshold=2.623e+02, percent-clipped=0.0 2024-09-24 13:30:57,938 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=511788.6666666667, ans=0.0 2024-09-24 13:30:59,564 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=511788.6666666667, ans=0.125 2024-09-24 13:31:00,377 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.83 vs. limit=10.0 2024-09-24 13:31:06,590 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=3.70 vs. limit=15.0 2024-09-24 13:31:30,728 INFO [train.py:1198] (3/4) Epoch 29, batch 600, loss[loss=0.2069, ctc_loss=0.1375, cr_loss=0.347, over 17204.00 frames. ], tot_loss[loss=0.1998, ctc_loss=0.1306, cr_loss=0.3464, over 3207187.74 frames. ], batch size: 55, lr: 4.14e-03, grad_scale: 32.0 2024-09-24 13:31:45,431 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=511928.6666666667, ans=0.0 2024-09-24 13:31:50,433 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=4.54 vs. limit=15.0 2024-09-24 13:32:35,968 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=512068.6666666667, ans=0.1 2024-09-24 13:32:45,910 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=4.36 vs. limit=15.0 2024-09-24 13:32:50,422 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=512068.6666666667, ans=0.0 2024-09-24 13:32:53,244 INFO [train.py:1198] (3/4) Epoch 29, batch 650, loss[loss=0.2036, ctc_loss=0.1309, cr_loss=0.3633, over 17003.00 frames. ], tot_loss[loss=0.201, ctc_loss=0.1314, cr_loss=0.3479, over 3242616.49 frames. 
], batch size: 44, lr: 4.14e-03, grad_scale: 32.0 2024-09-24 13:32:59,888 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=512115.3333333333, ans=0.0 2024-09-24 13:33:04,450 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.092e+02 1.266e+02 1.354e+02 1.448e+02 2.374e+02, threshold=2.709e+02, percent-clipped=0.0 2024-09-24 13:33:07,973 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=512162.0, ans=0.125 2024-09-24 13:33:25,678 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=512162.0, ans=0.0 2024-09-24 13:33:27,093 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=512208.6666666667, ans=0.125 2024-09-24 13:33:45,018 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=512255.3333333333, ans=0.2 2024-09-24 13:33:54,614 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.21 vs. limit=15.0 2024-09-24 13:34:00,188 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=512255.3333333333, ans=0.1 2024-09-24 13:34:13,019 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=512302.0, ans=0.125 2024-09-24 13:34:19,041 INFO [train.py:1198] (3/4) Epoch 29, batch 700, loss[loss=0.1509, ctc_loss=0.09569, cr_loss=0.2761, over 17043.00 frames. ], tot_loss[loss=0.2013, ctc_loss=0.1316, cr_loss=0.3482, over 3276458.30 frames. ], batch size: 39, lr: 4.14e-03, grad_scale: 32.0 2024-09-24 13:34:27,366 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=512348.6666666667, ans=0.2 2024-09-24 13:34:44,839 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-09-24 13:34:59,659 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=14.88 vs. limit=22.5 2024-09-24 13:35:11,820 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=512488.6666666667, ans=0.125 2024-09-24 13:35:15,586 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=10.60 vs. limit=22.5 2024-09-24 13:35:41,601 INFO [train.py:1198] (3/4) Epoch 29, batch 750, loss[loss=0.1607, ctc_loss=0.103, cr_loss=0.2883, over 17253.00 frames. ], tot_loss[loss=0.2004, ctc_loss=0.131, cr_loss=0.3472, over 3297035.11 frames. 
], batch size: 42, lr: 4.14e-03, grad_scale: 32.0 2024-09-24 13:35:52,785 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.055e+02 1.251e+02 1.333e+02 1.428e+02 1.733e+02, threshold=2.666e+02, percent-clipped=0.0 2024-09-24 13:35:56,252 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=512628.6666666667, ans=0.2 2024-09-24 13:36:10,532 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=512628.6666666667, ans=0.125 2024-09-24 13:37:01,405 INFO [train.py:1198] (3/4) Epoch 29, batch 800, loss[loss=0.2, ctc_loss=0.1328, cr_loss=0.3359, over 17311.00 frames. ], tot_loss[loss=0.2002, ctc_loss=0.1309, cr_loss=0.3469, over 3320515.61 frames. ], batch size: 51, lr: 4.14e-03, grad_scale: 32.0 2024-09-24 13:37:01,648 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=512815.3333333333, ans=0.0 2024-09-24 13:37:01,780 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-24 13:37:03,362 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=512815.3333333333, ans=0.125 2024-09-24 13:37:04,842 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=512815.3333333333, ans=0.125 2024-09-24 13:37:18,005 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=512862.0, ans=0.2 2024-09-24 13:37:49,689 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=512908.6666666667, ans=0.2 2024-09-24 13:37:56,175 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=512955.3333333333, ans=0.125 2024-09-24 13:37:56,259 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=512955.3333333333, ans=0.125 2024-09-24 13:38:21,003 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=513002.0, ans=0.2 2024-09-24 13:38:27,138 INFO [train.py:1198] (3/4) Epoch 29, batch 850, loss[loss=0.2128, ctc_loss=0.1412, cr_loss=0.3576, over 17037.00 frames. ], tot_loss[loss=0.2011, ctc_loss=0.1315, cr_loss=0.3477, over 3323218.92 frames. ], batch size: 52, lr: 4.13e-03, grad_scale: 32.0 2024-09-24 13:38:38,340 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.097e+02 1.286e+02 1.364e+02 1.497e+02 3.898e+02, threshold=2.729e+02, percent-clipped=1.0 2024-09-24 13:39:11,737 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=513142.0, ans=0.0 2024-09-24 13:39:14,822 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.const_attention_rate, batch_count=513142.0, ans=0.025 2024-09-24 13:39:24,353 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=67.43 vs. limit=15.0 2024-09-24 13:39:26,122 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=4.32 vs. 
limit=15.0 2024-09-24 13:39:39,891 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=513235.3333333333, ans=0.125 2024-09-24 13:39:45,211 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=4.40 vs. limit=12.0 2024-09-24 13:39:49,256 INFO [train.py:1198] (3/4) Epoch 29, batch 900, loss[loss=0.2175, ctc_loss=0.143, cr_loss=0.3724, over 17064.00 frames. ], tot_loss[loss=0.2022, ctc_loss=0.1322, cr_loss=0.3496, over 3335673.56 frames. ], batch size: 46, lr: 4.13e-03, grad_scale: 32.0 2024-09-24 13:39:52,876 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=513282.0, ans=0.125 2024-09-24 13:40:24,285 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=513375.3333333333, ans=0.0 2024-09-24 13:40:32,505 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=513375.3333333333, ans=0.09899494936611666 2024-09-24 13:40:34,098 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=513375.3333333333, ans=0.125 2024-09-24 13:40:37,595 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=2.506e-03 2024-09-24 13:40:53,586 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=513422.0, ans=0.0 2024-09-24 13:41:02,003 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=5.36 vs. limit=12.0 2024-09-24 13:41:12,251 INFO [train.py:1198] (3/4) Epoch 29, batch 950, loss[loss=0.1875, ctc_loss=0.1232, cr_loss=0.3212, over 17064.00 frames. ], tot_loss[loss=0.2017, ctc_loss=0.1319, cr_loss=0.3494, over 3352114.74 frames. ], batch size: 46, lr: 4.13e-03, grad_scale: 32.0 2024-09-24 13:41:23,508 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.090e+02 1.274e+02 1.388e+02 1.487e+02 2.628e+02, threshold=2.776e+02, percent-clipped=0.0 2024-09-24 13:42:18,294 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=513702.0, ans=0.1 2024-09-24 13:42:18,298 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=513702.0, ans=0.2 2024-09-24 13:42:35,067 INFO [train.py:1198] (3/4) Epoch 29, batch 1000, loss[loss=0.1893, ctc_loss=0.1227, cr_loss=0.3331, over 17152.00 frames. ], tot_loss[loss=0.202, ctc_loss=0.1321, cr_loss=0.3495, over 3354923.51 frames. ], batch size: 45, lr: 4.13e-03, grad_scale: 16.0 2024-09-24 13:43:02,145 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=513795.3333333333, ans=0.125 2024-09-24 13:43:59,530 INFO [train.py:1198] (3/4) Epoch 29, batch 1050, loss[loss=0.1791, ctc_loss=0.1163, cr_loss=0.3135, over 17070.00 frames. ], tot_loss[loss=0.2018, ctc_loss=0.1318, cr_loss=0.3496, over 3365228.80 frames. ], batch size: 43, lr: 4.13e-03, grad_scale: 16.0 2024-09-24 13:44:06,811 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=5.50 vs. 
limit=15.0 2024-09-24 13:44:07,574 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=513982.0, ans=0.125 2024-09-24 13:44:12,020 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.104e+02 1.285e+02 1.349e+02 1.436e+02 3.120e+02, threshold=2.698e+02, percent-clipped=1.0 2024-09-24 13:44:56,294 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=14.14 vs. limit=22.5 2024-09-24 13:45:12,584 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=514168.6666666667, ans=0.0 2024-09-24 13:45:15,674 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=514168.6666666667, ans=0.1 2024-09-24 13:45:21,728 INFO [train.py:1198] (3/4) Epoch 29, batch 1100, loss[loss=0.1829, ctc_loss=0.1181, cr_loss=0.3243, over 17040.00 frames. ], tot_loss[loss=0.2009, ctc_loss=0.1313, cr_loss=0.348, over 3369281.55 frames. ], batch size: 51, lr: 4.13e-03, grad_scale: 16.0 2024-09-24 13:45:36,450 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=514262.0, ans=0.0 2024-09-24 13:45:52,673 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=514308.6666666667, ans=0.04949747468305833 2024-09-24 13:46:07,478 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=9.36 vs. limit=15.0 2024-09-24 13:46:23,352 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.const_attention_rate, batch_count=514355.3333333333, ans=0.025 2024-09-24 13:46:42,100 INFO [train.py:1198] (3/4) Epoch 29, batch 1150, loss[loss=0.2083, ctc_loss=0.1343, cr_loss=0.3698, over 17102.00 frames. ], tot_loss[loss=0.2001, ctc_loss=0.1307, cr_loss=0.347, over 3370821.99 frames. ], batch size: 43, lr: 4.13e-03, grad_scale: 16.0 2024-09-24 13:46:54,883 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.129e+02 1.257e+02 1.322e+02 1.401e+02 2.069e+02, threshold=2.644e+02, percent-clipped=0.0 2024-09-24 13:47:07,748 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=514495.3333333333, ans=0.0 2024-09-24 13:47:23,725 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=514542.0, ans=0.125 2024-09-24 13:47:53,291 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-24 13:48:04,034 INFO [train.py:1198] (3/4) Epoch 29, batch 1200, loss[loss=0.1567, ctc_loss=0.09784, cr_loss=0.2945, over 17107.00 frames. ], tot_loss[loss=0.2013, ctc_loss=0.1316, cr_loss=0.3485, over 3371199.63 frames. ], batch size: 40, lr: 4.13e-03, grad_scale: 16.0 2024-09-24 13:48:32,347 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=514728.6666666667, ans=0.125 2024-09-24 13:49:19,778 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=12.71 vs. 
limit=22.5 2024-09-24 13:49:28,543 INFO [train.py:1198] (3/4) Epoch 29, batch 1250, loss[loss=0.22, ctc_loss=0.1467, cr_loss=0.3667, over 17211.00 frames. ], tot_loss[loss=0.2013, ctc_loss=0.1316, cr_loss=0.3485, over 3366636.87 frames. ], batch size: 47, lr: 4.13e-03, grad_scale: 16.0 2024-09-24 13:49:36,904 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=514915.3333333333, ans=0.0 2024-09-24 13:49:40,034 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=514915.3333333333, ans=0.0 2024-09-24 13:49:42,755 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.062e+02 1.263e+02 1.323e+02 1.423e+02 2.408e+02, threshold=2.646e+02, percent-clipped=0.0 2024-09-24 13:49:57,691 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-09-24 13:50:14,425 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=515008.6666666667, ans=0.125 2024-09-24 13:50:50,684 INFO [train.py:1198] (3/4) Epoch 29, batch 1300, loss[loss=0.1795, ctc_loss=0.115, cr_loss=0.3225, over 16965.00 frames. ], tot_loss[loss=0.2011, ctc_loss=0.1314, cr_loss=0.3482, over 3377425.43 frames. ], batch size: 42, lr: 4.13e-03, grad_scale: 16.0 2024-09-24 13:50:59,116 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=515148.6666666667, ans=0.0 2024-09-24 13:51:03,796 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer_na.min_abs, batch_count=515148.6666666667, ans=0.02 2024-09-24 13:51:15,502 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=515195.3333333333, ans=0.0 2024-09-24 13:51:16,987 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=515195.3333333333, ans=0.125 2024-09-24 13:51:24,933 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.const_attention_rate, batch_count=515242.0, ans=0.025 2024-09-24 13:51:25,080 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=515242.0, ans=0.125 2024-09-24 13:51:45,921 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=515288.6666666667, ans=0.125 2024-09-24 13:52:10,988 INFO [train.py:1198] (3/4) Epoch 29, batch 1350, loss[loss=0.1943, ctc_loss=0.1261, cr_loss=0.3407, over 17179.00 frames. ], tot_loss[loss=0.1995, ctc_loss=0.1303, cr_loss=0.3459, over 3386315.49 frames. 
], batch size: 45, lr: 4.13e-03, grad_scale: 16.0 2024-09-24 13:52:19,294 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=515382.0, ans=0.1 2024-09-24 13:52:22,547 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=515382.0, ans=0.0 2024-09-24 13:52:25,431 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.100e+02 1.241e+02 1.323e+02 1.445e+02 2.039e+02, threshold=2.645e+02, percent-clipped=0.0 2024-09-24 13:52:36,550 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.92 vs. limit=15.0 2024-09-24 13:53:13,963 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=515522.0, ans=0.125 2024-09-24 13:53:35,818 INFO [train.py:1198] (3/4) Epoch 29, batch 1400, loss[loss=0.1808, ctc_loss=0.1155, cr_loss=0.3268, over 17024.00 frames. ], tot_loss[loss=0.2007, ctc_loss=0.1311, cr_loss=0.348, over 3372835.54 frames. ], batch size: 51, lr: 4.12e-03, grad_scale: 16.0 2024-09-24 13:53:47,868 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=15.38 vs. limit=22.5 2024-09-24 13:53:55,051 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=515662.0, ans=0.07 2024-09-24 13:54:02,424 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=515662.0, ans=0.1 2024-09-24 13:54:18,668 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=10.21 vs. limit=15.0 2024-09-24 13:54:37,300 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=515755.3333333333, ans=0.125 2024-09-24 13:54:57,874 INFO [train.py:1198] (3/4) Epoch 29, batch 1450, loss[loss=0.1956, ctc_loss=0.1285, cr_loss=0.3355, over 17299.00 frames. ], tot_loss[loss=0.2002, ctc_loss=0.1308, cr_loss=0.3473, over 3375530.74 frames. ], batch size: 46, lr: 4.12e-03, grad_scale: 8.0 2024-09-24 13:55:08,972 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-24 13:55:16,337 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.107e+02 1.288e+02 1.365e+02 1.470e+02 2.158e+02, threshold=2.730e+02, percent-clipped=0.0 2024-09-24 13:55:23,193 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=515895.3333333333, ans=0.125 2024-09-24 13:55:33,072 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten.whitening_limit, batch_count=515942.0, ans=22.5 2024-09-24 13:55:33,328 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.41 vs. 
limit=22.5 2024-09-24 13:55:38,921 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=515942.0, ans=0.2 2024-09-24 13:55:42,162 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=515942.0, ans=0.1 2024-09-24 13:55:54,857 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=515988.6666666667, ans=0.2 2024-09-24 13:55:55,384 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=3.44 vs. limit=12.0 2024-09-24 13:55:57,928 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=515988.6666666667, ans=0.0 2024-09-24 13:56:10,551 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=516035.3333333333, ans=0.125 2024-09-24 13:56:19,907 INFO [train.py:1198] (3/4) Epoch 29, batch 1500, loss[loss=0.182, ctc_loss=0.1187, cr_loss=0.3161, over 17292.00 frames. ], tot_loss[loss=0.2004, ctc_loss=0.131, cr_loss=0.347, over 3363168.62 frames. ], batch size: 51, lr: 4.12e-03, grad_scale: 8.0 2024-09-24 13:56:46,138 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=516128.6666666667, ans=0.125 2024-09-24 13:57:00,886 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=516175.3333333333, ans=0.125 2024-09-24 13:57:09,149 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.11 vs. limit=22.5 2024-09-24 13:57:24,222 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.const_attention_rate, batch_count=516268.6666666667, ans=0.025 2024-09-24 13:57:33,658 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=516268.6666666667, ans=0.125 2024-09-24 13:57:42,717 INFO [train.py:1198] (3/4) Epoch 29, batch 1550, loss[loss=0.2147, ctc_loss=0.1429, cr_loss=0.3591, over 17067.00 frames. ], tot_loss[loss=0.2003, ctc_loss=0.1309, cr_loss=0.3469, over 3361446.89 frames. ], batch size: 46, lr: 4.12e-03, grad_scale: 8.0 2024-09-24 13:57:56,301 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=4.45 vs. limit=10.0 2024-09-24 13:57:58,708 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.100e+02 1.272e+02 1.340e+02 1.430e+02 4.940e+02, threshold=2.681e+02, percent-clipped=1.0 2024-09-24 13:58:15,940 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=516408.6666666667, ans=0.2 2024-09-24 13:59:07,817 INFO [train.py:1198] (3/4) Epoch 29, batch 1600, loss[loss=0.1657, ctc_loss=0.1033, cr_loss=0.3121, over 16742.00 frames. ], tot_loss[loss=0.2005, ctc_loss=0.131, cr_loss=0.3474, over 3368059.62 frames. 
], batch size: 37, lr: 4.12e-03, grad_scale: 16.0 2024-09-24 13:59:22,524 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=516595.3333333333, ans=0.125 2024-09-24 13:59:27,464 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=516595.3333333333, ans=0.125 2024-09-24 14:00:15,908 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=516735.3333333333, ans=0.2 2024-09-24 14:00:28,737 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=516782.0, ans=0.125 2024-09-24 14:00:30,028 INFO [train.py:1198] (3/4) Epoch 29, batch 1650, loss[loss=0.1954, ctc_loss=0.1307, cr_loss=0.3233, over 17154.00 frames. ], tot_loss[loss=0.2004, ctc_loss=0.131, cr_loss=0.3473, over 3367665.61 frames. ], batch size: 48, lr: 4.12e-03, grad_scale: 16.0 2024-09-24 14:00:45,987 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.133e+02 1.269e+02 1.357e+02 1.497e+02 2.178e+02, threshold=2.713e+02, percent-clipped=0.0 2024-09-24 14:00:59,131 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=516828.6666666667, ans=0.125 2024-09-24 14:01:10,241 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=516875.3333333333, ans=0.0 2024-09-24 14:01:34,109 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=516968.6666666667, ans=0.1 2024-09-24 14:01:40,542 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=516968.6666666667, ans=0.2 2024-09-24 14:01:43,744 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=516968.6666666667, ans=0.1 2024-09-24 14:01:49,854 INFO [train.py:1198] (3/4) Epoch 29, batch 1700, loss[loss=0.1967, ctc_loss=0.1264, cr_loss=0.3514, over 17362.00 frames. ], tot_loss[loss=0.2006, ctc_loss=0.1311, cr_loss=0.3476, over 3366615.38 frames. 
], batch size: 48, lr: 4.12e-03, grad_scale: 16.0 2024-09-24 14:01:51,792 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=517015.3333333333, ans=0.125 2024-09-24 14:01:55,049 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=517015.3333333333, ans=0.125 2024-09-24 14:02:14,053 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=517062.0, ans=0.05 2024-09-24 14:02:14,161 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=517062.0, ans=0.125 2024-09-24 14:02:29,130 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=517108.6666666667, ans=0.125 2024-09-24 14:02:38,637 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=517155.3333333333, ans=0.125 2024-09-24 14:02:45,263 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=517155.3333333333, ans=0.0 2024-09-24 14:02:48,276 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=517155.3333333333, ans=0.125 2024-09-24 14:03:00,210 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.const_attention_rate, batch_count=517202.0, ans=0.025 2024-09-24 14:03:08,053 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=517202.0, ans=0.125 2024-09-24 14:03:14,199 INFO [train.py:1198] (3/4) Epoch 29, batch 1750, loss[loss=0.2072, ctc_loss=0.1352, cr_loss=0.3601, over 17359.00 frames. ], tot_loss[loss=0.2016, ctc_loss=0.1318, cr_loss=0.3491, over 3374927.91 frames. ], batch size: 48, lr: 4.12e-03, grad_scale: 16.0 2024-09-24 14:03:30,183 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.095e+02 1.263e+02 1.353e+02 1.492e+02 2.426e+02, threshold=2.707e+02, percent-clipped=0.0 2024-09-24 14:03:33,878 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=517295.3333333333, ans=0.0 2024-09-24 14:03:40,170 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=517295.3333333333, ans=0.125 2024-09-24 14:04:16,102 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=517388.6666666667, ans=0.125 2024-09-24 14:04:33,054 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=517435.3333333333, ans=0.125 2024-09-24 14:04:36,177 INFO [train.py:1198] (3/4) Epoch 29, batch 1800, loss[loss=0.1721, ctc_loss=0.1104, cr_loss=0.3085, over 17265.00 frames. ], tot_loss[loss=0.2023, ctc_loss=0.1322, cr_loss=0.3502, over 3369563.42 frames. 
], batch size: 42, lr: 4.12e-03, grad_scale: 16.0 2024-09-24 14:04:36,452 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=517482.0, ans=0.1 2024-09-24 14:04:38,126 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=517482.0, ans=0.125 2024-09-24 14:04:52,542 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=517528.6666666667, ans=0.125 2024-09-24 14:05:01,590 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=517528.6666666667, ans=0.0 2024-09-24 14:05:58,951 INFO [train.py:1198] (3/4) Epoch 29, batch 1850, loss[loss=0.1853, ctc_loss=0.1225, cr_loss=0.3145, over 16984.00 frames. ], tot_loss[loss=0.2024, ctc_loss=0.1323, cr_loss=0.3505, over 3366523.99 frames. ], batch size: 42, lr: 4.12e-03, grad_scale: 8.0 2024-09-24 14:05:59,896 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.12 vs. limit=15.0 2024-09-24 14:06:00,861 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=517715.3333333333, ans=0.125 2024-09-24 14:06:08,723 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=517715.3333333333, ans=0.125 2024-09-24 14:06:16,452 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.137e+02 1.270e+02 1.382e+02 1.482e+02 2.420e+02, threshold=2.764e+02, percent-clipped=0.0 2024-09-24 14:06:21,611 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=517762.0, ans=0.125 2024-09-24 14:06:32,819 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=517808.6666666667, ans=0.125 2024-09-24 14:06:45,618 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=517855.3333333333, ans=0.035 2024-09-24 14:07:01,563 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=517902.0, ans=0.125 2024-09-24 14:07:09,575 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=517902.0, ans=0.125 2024-09-24 14:07:12,776 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-24 14:07:12,789 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=517902.0, ans=0.125 2024-09-24 14:07:21,465 INFO [train.py:1198] (3/4) Epoch 29, batch 1900, loss[loss=0.2024, ctc_loss=0.1326, cr_loss=0.3489, over 17306.00 frames. ], tot_loss[loss=0.2023, ctc_loss=0.1323, cr_loss=0.3502, over 3356695.10 frames. 
], batch size: 49, lr: 4.12e-03, grad_scale: 8.0 2024-09-24 14:07:31,370 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.const_attention_rate, batch_count=517948.6666666667, ans=0.025 2024-09-24 14:07:53,468 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=517995.3333333333, ans=0.0 2024-09-24 14:08:20,042 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=518088.6666666667, ans=0.5 2024-09-24 14:08:33,521 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.26 vs. limit=15.0 2024-09-24 14:08:34,506 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=518135.3333333333, ans=0.125 2024-09-24 14:08:42,799 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=9.41 vs. limit=12.0 2024-09-24 14:08:43,831 INFO [train.py:1198] (3/4) Epoch 29, batch 1950, loss[loss=0.173, ctc_loss=0.1069, cr_loss=0.3303, over 17098.00 frames. ], tot_loss[loss=0.2029, ctc_loss=0.1327, cr_loss=0.3512, over 3354917.96 frames. ], batch size: 40, lr: 4.11e-03, grad_scale: 8.0 2024-09-24 14:09:03,915 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.107e+02 1.275e+02 1.365e+02 1.439e+02 2.080e+02, threshold=2.731e+02, percent-clipped=0.0 2024-09-24 14:09:28,316 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=518275.3333333333, ans=0.125 2024-09-24 14:10:08,970 INFO [train.py:1198] (3/4) Epoch 29, batch 2000, loss[loss=0.1863, ctc_loss=0.119, cr_loss=0.3365, over 17275.00 frames. ], tot_loss[loss=0.2029, ctc_loss=0.1327, cr_loss=0.3511, over 3353524.76 frames. ], batch size: 42, lr: 4.11e-03, grad_scale: 16.0 2024-09-24 14:10:15,748 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.const_attention_rate, batch_count=518415.3333333333, ans=0.025 2024-09-24 14:10:25,466 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=518462.0, ans=0.125 2024-09-24 14:11:09,902 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=518555.3333333333, ans=0.0 2024-09-24 14:11:28,976 INFO [train.py:1198] (3/4) Epoch 29, batch 2050, loss[loss=0.1997, ctc_loss=0.1291, cr_loss=0.3531, over 17151.00 frames. ], tot_loss[loss=0.2027, ctc_loss=0.1325, cr_loss=0.3509, over 3361083.60 frames. ], batch size: 48, lr: 4.11e-03, grad_scale: 16.0 2024-09-24 14:11:43,581 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=518695.3333333333, ans=0.1 2024-09-24 14:11:46,520 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.128e+02 1.263e+02 1.339e+02 1.463e+02 3.835e+02, threshold=2.678e+02, percent-clipped=1.0 2024-09-24 14:11:48,940 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=11.90 vs. limit=22.5 2024-09-24 14:12:20,402 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=3.98 vs. 
limit=15.0 2024-09-24 14:12:44,314 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.49 vs. limit=15.0 2024-09-24 14:12:54,007 INFO [train.py:1198] (3/4) Epoch 29, batch 2100, loss[loss=0.235, ctc_loss=0.1556, cr_loss=0.3971, over 17020.00 frames. ], tot_loss[loss=0.2037, ctc_loss=0.1334, cr_loss=0.3518, over 3355884.05 frames. ], batch size: 56, lr: 4.11e-03, grad_scale: 16.0 2024-09-24 14:12:59,952 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=8.04 vs. limit=22.5 2024-09-24 14:14:00,231 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=519068.6666666667, ans=0.125 2024-09-24 14:14:13,044 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=519068.6666666667, ans=0.125 2024-09-24 14:14:15,829 INFO [train.py:1198] (3/4) Epoch 29, batch 2150, loss[loss=0.2191, ctc_loss=0.1461, cr_loss=0.3652, over 17028.00 frames. ], tot_loss[loss=0.2034, ctc_loss=0.1331, cr_loss=0.3513, over 3357659.09 frames. ], batch size: 51, lr: 4.11e-03, grad_scale: 16.0 2024-09-24 14:14:30,779 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=519162.0, ans=0.1 2024-09-24 14:14:33,680 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.060e+02 1.281e+02 1.376e+02 1.508e+02 1.841e+02, threshold=2.752e+02, percent-clipped=0.0 2024-09-24 14:14:52,457 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=519208.6666666667, ans=0.0 2024-09-24 14:14:55,693 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=519208.6666666667, ans=0.125 2024-09-24 14:15:00,417 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=519208.6666666667, ans=0.125 2024-09-24 14:15:15,361 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.66 vs. limit=15.0 2024-09-24 14:15:17,376 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.68 vs. limit=15.0 2024-09-24 14:15:21,519 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=519302.0, ans=0.0 2024-09-24 14:15:38,900 INFO [train.py:1198] (3/4) Epoch 29, batch 2200, loss[loss=0.1937, ctc_loss=0.1276, cr_loss=0.3305, over 17021.00 frames. ], tot_loss[loss=0.2023, ctc_loss=0.1323, cr_loss=0.3501, over 3360977.48 frames. ], batch size: 44, lr: 4.11e-03, grad_scale: 16.0 2024-09-24 14:15:44,037 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=519348.6666666667, ans=0.125 2024-09-24 14:16:01,745 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.const_attention_rate, batch_count=519395.3333333333, ans=0.025 2024-09-24 14:16:39,290 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.82 vs. 
limit=15.0 2024-09-24 14:16:42,022 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=519535.3333333333, ans=0.1 2024-09-24 14:16:54,800 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=519535.3333333333, ans=0.0 2024-09-24 14:16:59,446 INFO [train.py:1198] (3/4) Epoch 29, batch 2250, loss[loss=0.1954, ctc_loss=0.1261, cr_loss=0.3464, over 17208.00 frames. ], tot_loss[loss=0.2017, ctc_loss=0.1318, cr_loss=0.3496, over 3363536.42 frames. ], batch size: 47, lr: 4.11e-03, grad_scale: 16.0 2024-09-24 14:17:19,627 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.090e+02 1.297e+02 1.378e+02 1.443e+02 1.753e+02, threshold=2.756e+02, percent-clipped=0.0 2024-09-24 14:17:53,129 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=4.97 vs. limit=15.0 2024-09-24 14:18:09,869 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=519768.6666666667, ans=0.0 2024-09-24 14:18:23,699 INFO [train.py:1198] (3/4) Epoch 29, batch 2300, loss[loss=0.1983, ctc_loss=0.1255, cr_loss=0.3638, over 17014.00 frames. ], tot_loss[loss=0.2003, ctc_loss=0.1307, cr_loss=0.3479, over 3369140.99 frames. ], batch size: 39, lr: 4.11e-03, grad_scale: 16.0 2024-09-24 14:18:25,492 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=519815.3333333333, ans=0.125 2024-09-24 14:18:48,058 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=519862.0, ans=0.2 2024-09-24 14:19:19,530 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=519955.3333333333, ans=0.125 2024-09-24 14:19:22,747 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=519955.3333333333, ans=0.0 2024-09-24 14:19:35,903 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=5.35 vs. limit=15.0 2024-09-24 14:19:40,109 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=520002.0, ans=0.0 2024-09-24 14:19:46,205 INFO [train.py:1198] (3/4) Epoch 29, batch 2350, loss[loss=0.1782, ctc_loss=0.1162, cr_loss=0.3103, over 17200.00 frames. ], tot_loss[loss=0.2008, ctc_loss=0.1312, cr_loss=0.3482, over 3354173.34 frames. 
], batch size: 41, lr: 4.11e-03, grad_scale: 16.0 2024-09-24 14:19:52,045 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=520048.6666666667, ans=0.1 2024-09-24 14:19:58,402 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=520048.6666666667, ans=0.125 2024-09-24 14:20:06,254 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.131e+02 1.272e+02 1.348e+02 1.492e+02 2.219e+02, threshold=2.696e+02, percent-clipped=0.0 2024-09-24 14:20:11,237 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=520095.3333333333, ans=0.0 2024-09-24 14:20:32,103 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=520142.0, ans=0.0 2024-09-24 14:20:56,524 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=520235.3333333333, ans=0.125 2024-09-24 14:20:57,967 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=520235.3333333333, ans=0.0 2024-09-24 14:21:08,867 INFO [train.py:1198] (3/4) Epoch 29, batch 2400, loss[loss=0.1693, ctc_loss=0.1086, cr_loss=0.3035, over 17162.00 frames. ], tot_loss[loss=0.1997, ctc_loss=0.1303, cr_loss=0.3467, over 3354349.83 frames. ], batch size: 45, lr: 4.11e-03, grad_scale: 32.0 2024-09-24 14:21:09,247 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=520282.0, ans=0.2 2024-09-24 14:21:12,878 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=5.74 vs. limit=15.0 2024-09-24 14:21:15,964 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=3.38 vs. limit=10.0 2024-09-24 14:21:20,518 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=520282.0, ans=0.125 2024-09-24 14:21:38,007 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=520328.6666666667, ans=0.2 2024-09-24 14:21:38,148 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=520328.6666666667, ans=0.1 2024-09-24 14:21:43,087 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=520375.3333333333, ans=0.2 2024-09-24 14:21:58,406 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=7.99 vs. limit=15.0 2024-09-24 14:22:04,267 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.31 vs. limit=15.0 2024-09-24 14:22:05,725 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.15 vs. limit=10.0 2024-09-24 14:22:12,980 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.93 vs. 
limit=15.0 2024-09-24 14:22:31,468 INFO [train.py:1198] (3/4) Epoch 29, batch 2450, loss[loss=0.1998, ctc_loss=0.1287, cr_loss=0.3552, over 16958.00 frames. ], tot_loss[loss=0.1994, ctc_loss=0.1301, cr_loss=0.3463, over 3354211.29 frames. ], batch size: 42, lr: 4.10e-03, grad_scale: 16.0 2024-09-24 14:22:36,519 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=520515.3333333333, ans=0.2 2024-09-24 14:22:53,222 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.118e+02 1.297e+02 1.387e+02 1.500e+02 2.305e+02, threshold=2.773e+02, percent-clipped=0.0 2024-09-24 14:23:07,864 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=520608.6666666667, ans=0.1 2024-09-24 14:23:11,241 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=520608.6666666667, ans=0.2 2024-09-24 14:23:23,933 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=520655.3333333333, ans=0.0 2024-09-24 14:23:30,754 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=16.32 vs. limit=22.5 2024-09-24 14:23:38,527 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten.whitening_limit, batch_count=520702.0, ans=22.5 2024-09-24 14:23:56,404 INFO [train.py:1198] (3/4) Epoch 29, batch 2500, loss[loss=0.1917, ctc_loss=0.1225, cr_loss=0.346, over 17148.00 frames. ], tot_loss[loss=0.2016, ctc_loss=0.1318, cr_loss=0.3491, over 3332731.76 frames. ], batch size: 45, lr: 4.10e-03, grad_scale: 16.0 2024-09-24 14:24:12,843 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-24 14:24:29,384 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=6.79 vs. limit=15.0 2024-09-24 14:24:54,656 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=8.91 vs. limit=15.0 2024-09-24 14:25:01,765 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=520935.3333333333, ans=0.125 2024-09-24 14:25:04,111 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=5.93 vs. limit=15.0 2024-09-24 14:25:06,449 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=520935.3333333333, ans=0.2 2024-09-24 14:25:06,518 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=520935.3333333333, ans=0.0 2024-09-24 14:25:09,804 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=520935.3333333333, ans=0.125 2024-09-24 14:25:18,998 INFO [train.py:1198] (3/4) Epoch 29, batch 2550, loss[loss=0.2055, ctc_loss=0.1359, cr_loss=0.3479, over 17297.00 frames. ], tot_loss[loss=0.2016, ctc_loss=0.1318, cr_loss=0.3487, over 3335133.06 frames. 
], batch size: 51, lr: 4.10e-03, grad_scale: 16.0 2024-09-24 14:25:27,351 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=520982.0, ans=0.2 2024-09-24 14:25:35,299 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=521028.6666666667, ans=0.125 2024-09-24 14:25:38,169 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.116e+02 1.280e+02 1.353e+02 1.438e+02 1.912e+02, threshold=2.707e+02, percent-clipped=0.0 2024-09-24 14:25:40,379 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=6.39 vs. limit=15.0 2024-09-24 14:25:41,551 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.max_positive, batch_count=521028.6666666667, ans=0.95 2024-09-24 14:25:43,328 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=521028.6666666667, ans=0.125 2024-09-24 14:25:57,333 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=521075.3333333333, ans=0.125 2024-09-24 14:26:02,071 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=521075.3333333333, ans=0.125 2024-09-24 14:26:06,910 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=521122.0, ans=0.0 2024-09-24 14:26:24,232 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=521168.6666666667, ans=0.125 2024-09-24 14:26:27,492 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=521168.6666666667, ans=0.0 2024-09-24 14:26:38,265 INFO [train.py:1198] (3/4) Epoch 29, batch 2600, loss[loss=0.2114, ctc_loss=0.1384, cr_loss=0.3652, over 16730.00 frames. ], tot_loss[loss=0.2021, ctc_loss=0.1323, cr_loss=0.3489, over 3337250.33 frames. ], batch size: 61, lr: 4.10e-03, grad_scale: 16.0 2024-09-24 14:26:55,168 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=14.05 vs. limit=15.0 2024-09-24 14:27:27,631 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=521355.3333333333, ans=0.2 2024-09-24 14:27:43,720 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=6.45 vs. limit=15.0 2024-09-24 14:28:03,409 INFO [train.py:1198] (3/4) Epoch 29, batch 2650, loss[loss=0.2466, ctc_loss=0.1652, cr_loss=0.4072, over 15064.00 frames. ], tot_loss[loss=0.2019, ctc_loss=0.1321, cr_loss=0.3492, over 3341441.39 frames. 
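
Note on the WARNING lines from optim.py:487: the five numbers are the min / 25% / median / 75% / max of recent gradient norms, and the logged threshold is consistent with Clipping_scale times the median (for example 2.0 x 1.348e+02 = 2.696e+02 above). The class below is a simplified stand-in for that kind of median-based clipping; the window size and bookkeeping are assumptions, not the optimizer's actual implementation.

import torch

class MedianGradClipper:
    """Clip gradients to clipping_scale x the running median gradient norm."""

    def __init__(self, clipping_scale=2.0, window=1024):
        self.clipping_scale = clipping_scale
        self.window = window
        self.history = []

    def clip_(self, parameters):
        grads = [p.grad for p in parameters if p.grad is not None]
        norm = torch.norm(torch.stack([g.norm() for g in grads])).item()
        self.history = (self.history + [norm])[-self.window:]
        h = torch.tensor(self.history)
        quartiles = torch.quantile(h, torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0]))
        threshold = self.clipping_scale * quartiles[2].item()  # scale x median
        if norm > threshold:
            # Rescale the gradients; batches landing here are what the
            # "percent-clipped" figure counts.
            for g in grads:
                g.mul_(threshold / norm)
        return norm, quartiles, threshold
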
], batch size: 89, lr: 4.10e-03, grad_scale: 16.0 2024-09-24 14:28:16,342 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=521448.6666666667, ans=0.0 2024-09-24 14:28:22,577 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.102e+02 1.267e+02 1.389e+02 1.480e+02 2.068e+02, threshold=2.779e+02, percent-clipped=0.0 2024-09-24 14:28:44,580 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=521542.0, ans=0.125 2024-09-24 14:29:13,550 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=521635.3333333333, ans=0.125 2024-09-24 14:29:25,645 INFO [train.py:1198] (3/4) Epoch 29, batch 2700, loss[loss=0.1953, ctc_loss=0.127, cr_loss=0.3414, over 17207.00 frames. ], tot_loss[loss=0.2017, ctc_loss=0.1319, cr_loss=0.349, over 3347477.76 frames. ], batch size: 47, lr: 4.10e-03, grad_scale: 16.0 2024-09-24 14:29:27,589 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=521682.0, ans=0.0 2024-09-24 14:29:34,077 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=521682.0, ans=0.2 2024-09-24 14:29:53,675 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=4.77 vs. limit=10.0 2024-09-24 14:29:53,676 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1.whitening_limit, batch_count=521728.6666666667, ans=10.0 2024-09-24 14:29:57,955 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=521728.6666666667, ans=0.1 2024-09-24 14:30:47,920 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=521915.3333333333, ans=0.125 2024-09-24 14:30:49,229 INFO [train.py:1198] (3/4) Epoch 29, batch 2750, loss[loss=0.228, ctc_loss=0.1488, cr_loss=0.3958, over 17013.00 frames. ], tot_loss[loss=0.2021, ctc_loss=0.1322, cr_loss=0.3495, over 3351283.02 frames. ], batch size: 52, lr: 4.10e-03, grad_scale: 16.0 2024-09-24 14:30:52,687 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=521915.3333333333, ans=0.1 2024-09-24 14:30:54,673 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=10.62 vs. 
limit=15.0 2024-09-24 14:30:57,320 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=521915.3333333333, ans=0.0 2024-09-24 14:31:00,529 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=521915.3333333333, ans=0.125 2024-09-24 14:31:02,165 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=521915.3333333333, ans=0.0 2024-09-24 14:31:08,250 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.072e+02 1.228e+02 1.306e+02 1.411e+02 3.014e+02, threshold=2.612e+02, percent-clipped=1.0 2024-09-24 14:31:10,172 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=521962.0, ans=0.125 2024-09-24 14:31:19,421 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=522008.6666666667, ans=0.125 2024-09-24 14:31:38,852 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=522055.3333333333, ans=0.125 2024-09-24 14:31:48,436 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=522055.3333333333, ans=0.0 2024-09-24 14:31:49,932 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=522055.3333333333, ans=0.125 2024-09-24 14:32:11,739 INFO [train.py:1198] (3/4) Epoch 29, batch 2800, loss[loss=0.2203, ctc_loss=0.1471, cr_loss=0.366, over 15148.00 frames. ], tot_loss[loss=0.202, ctc_loss=0.1321, cr_loss=0.3493, over 3347245.97 frames. ], batch size: 89, lr: 4.10e-03, grad_scale: 32.0 2024-09-24 14:32:23,067 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=522148.6666666667, ans=0.125 2024-09-24 14:32:24,828 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=522148.6666666667, ans=0.125 2024-09-24 14:32:32,625 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.min_positive, batch_count=522195.3333333333, ans=0.025 2024-09-24 14:32:37,125 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=522195.3333333333, ans=0.1 2024-09-24 14:32:53,056 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=522242.0, ans=0.0 2024-09-24 14:32:57,751 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=522242.0, ans=0.0 2024-09-24 14:33:13,619 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=522288.6666666667, ans=0.125 2024-09-24 14:33:23,392 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=522335.3333333333, ans=0.0 2024-09-24 14:33:25,390 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=6.12 vs. 
limit=15.0 2024-09-24 14:33:32,937 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=522382.0, ans=0.0 2024-09-24 14:33:34,160 INFO [train.py:1198] (3/4) Epoch 29, batch 2850, loss[loss=0.2003, ctc_loss=0.132, cr_loss=0.3413, over 17299.00 frames. ], tot_loss[loss=0.2013, ctc_loss=0.1316, cr_loss=0.3485, over 3344395.84 frames. ], batch size: 46, lr: 4.10e-03, grad_scale: 32.0 2024-09-24 14:33:57,877 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.088e+02 1.283e+02 1.399e+02 1.566e+02 1.810e+02, threshold=2.799e+02, percent-clipped=0.0 2024-09-24 14:34:24,069 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=522522.0, ans=0.0 2024-09-24 14:34:42,645 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=522568.6666666667, ans=0.0 2024-09-24 14:34:45,796 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=522568.6666666667, ans=0.125 2024-09-24 14:35:00,064 INFO [train.py:1198] (3/4) Epoch 29, batch 2900, loss[loss=0.1886, ctc_loss=0.1251, cr_loss=0.3175, over 17063.00 frames. ], tot_loss[loss=0.202, ctc_loss=0.132, cr_loss=0.3498, over 3349070.02 frames. ], batch size: 46, lr: 4.10e-03, grad_scale: 16.0 2024-09-24 14:35:31,804 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=522662.0, ans=0.1 2024-09-24 14:35:39,891 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=522708.6666666667, ans=0.0 2024-09-24 14:36:22,784 INFO [train.py:1198] (3/4) Epoch 29, batch 2950, loss[loss=0.192, ctc_loss=0.1232, cr_loss=0.3444, over 17316.00 frames. ], tot_loss[loss=0.2007, ctc_loss=0.1312, cr_loss=0.3479, over 3358339.49 frames. ], batch size: 51, lr: 4.10e-03, grad_scale: 16.0 2024-09-24 14:36:36,166 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=14.79 vs. limit=15.0 2024-09-24 14:36:40,691 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer_ff2.min_abs, batch_count=522895.3333333333, ans=0.1 2024-09-24 14:36:43,419 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.101e+02 1.270e+02 1.367e+02 1.488e+02 1.985e+02, threshold=2.734e+02, percent-clipped=0.0 2024-09-24 14:36:49,934 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=522895.3333333333, ans=0.125 2024-09-24 14:37:04,381 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=4.93 vs. limit=10.0 2024-09-24 14:37:30,244 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn2.whiten.whitening_limit, batch_count=523035.3333333333, ans=22.5 2024-09-24 14:37:43,501 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=523082.0, ans=0.0 2024-09-24 14:37:44,731 INFO [train.py:1198] (3/4) Epoch 29, batch 3000, loss[loss=0.2093, ctc_loss=0.1419, cr_loss=0.3369, over 17035.00 frames. ], tot_loss[loss=0.2011, ctc_loss=0.1314, cr_loss=0.3487, over 3356712.28 frames. 
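
Note on the Whitening lines (scaling.py:1024): each compares a per-module "metric" against a limit (10.0, 15.0, 22.5 and others here); the metric measures how far the module's activations are from white, i.e. how unevenly variance is spread across channel directions. The formula below is an assumed stand-in that equals 1.0 for perfectly white features and grows with eigenvalue spread; it is not necessarily the exact statistic computed in scaling.py.

import torch

def whiteness_metric(x):
    # x: (num_frames, num_channels) activations from one module
    x = x - x.mean(dim=0, keepdim=True)
    cov = (x.T @ x) / x.shape[0]
    eigs = torch.linalg.eigvalsh(cov)  # covariance eigenvalues, ascending
    return ((eigs ** 2).mean() / eigs.mean() ** 2).item()

# White features give ~1.0; uneven per-channel scales push the metric up,
# which is when a "metric=... vs. limit=..." violation would be logged.
x = torch.randn(4000, 384) * torch.linspace(0.25, 2.0, 384)
print(whiteness_metric(x))
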
], batch size: 51, lr: 4.09e-03, grad_scale: 16.0 2024-09-24 14:37:44,731 INFO [train.py:1221] (3/4) Computing validation loss 2024-09-24 14:38:00,320 INFO [train.py:1230] (3/4) Epoch 29, validation: loss=0.03658, ctc_loss=0.03658, cr_loss=8.731e-15, over 944034.00 frames. 2024-09-24 14:38:00,321 INFO [train.py:1231] (3/4) Maximum memory allocated so far is 21238MB 2024-09-24 14:39:02,018 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=523268.6666666667, ans=0.125 2024-09-24 14:39:05,230 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=523268.6666666667, ans=0.2 2024-09-24 14:39:13,125 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=523268.6666666667, ans=0.0 2024-09-24 14:39:19,060 INFO [train.py:1198] (3/4) Epoch 29, batch 3050, loss[loss=0.2157, ctc_loss=0.1419, cr_loss=0.3688, over 17214.00 frames. ], tot_loss[loss=0.2007, ctc_loss=0.1311, cr_loss=0.3479, over 3351252.03 frames. ], batch size: 47, lr: 4.09e-03, grad_scale: 16.0 2024-09-24 14:39:29,918 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=523315.3333333333, ans=0.125 2024-09-24 14:39:42,288 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.064e+02 1.250e+02 1.367e+02 1.506e+02 1.984e+02, threshold=2.734e+02, percent-clipped=0.0 2024-09-24 14:39:44,504 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=9.86 vs. limit=15.0 2024-09-24 14:40:03,026 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=523408.6666666667, ans=0.1 2024-09-24 14:40:23,426 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=523502.0, ans=0.125 2024-09-24 14:40:30,342 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=6.09 vs. limit=10.0 2024-09-24 14:40:31,370 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=523502.0, ans=0.125 2024-09-24 14:40:39,274 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=523548.6666666667, ans=0.2 2024-09-24 14:40:40,604 INFO [train.py:1198] (3/4) Epoch 29, batch 3100, loss[loss=0.1992, ctc_loss=0.1293, cr_loss=0.349, over 17322.00 frames. ], tot_loss[loss=0.2006, ctc_loss=0.131, cr_loss=0.3481, over 3362453.46 frames. ], batch size: 51, lr: 4.09e-03, grad_scale: 16.0 2024-09-24 14:40:40,812 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.const_attention_rate, batch_count=523548.6666666667, ans=0.025 2024-09-24 14:40:44,289 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=6.49 vs. 
limit=15.0 2024-09-24 14:41:20,593 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=523642.0, ans=0.125 2024-09-24 14:41:22,202 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=12.55 vs. limit=22.5 2024-09-24 14:41:46,607 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=4.83 vs. limit=12.0 2024-09-24 14:41:49,242 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=523735.3333333333, ans=0.1 2024-09-24 14:42:01,588 INFO [train.py:1198] (3/4) Epoch 29, batch 3150, loss[loss=0.2371, ctc_loss=0.1555, cr_loss=0.408, over 15020.00 frames. ], tot_loss[loss=0.2011, ctc_loss=0.1313, cr_loss=0.3489, over 3360950.29 frames. ], batch size: 89, lr: 4.09e-03, grad_scale: 16.0 2024-09-24 14:42:01,971 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=523782.0, ans=0.2 2024-09-24 14:42:06,510 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=523782.0, ans=0.125 2024-09-24 14:42:21,631 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.148e+02 1.246e+02 1.357e+02 1.527e+02 1.959e+02, threshold=2.714e+02, percent-clipped=0.0 2024-09-24 14:42:31,365 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=6.05 vs. limit=15.0 2024-09-24 14:42:52,763 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=523922.0, ans=0.2 2024-09-24 14:43:06,150 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=9.23 vs. limit=15.0 2024-09-24 14:43:18,253 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=524015.3333333333, ans=0.125 2024-09-24 14:43:19,441 INFO [train.py:1198] (3/4) Epoch 29, batch 3200, loss[loss=0.2056, ctc_loss=0.1314, cr_loss=0.3709, over 15949.00 frames. ], tot_loss[loss=0.2017, ctc_loss=0.1318, cr_loss=0.3497, over 3347283.34 frames. ], batch size: 74, lr: 4.09e-03, grad_scale: 32.0 2024-09-24 14:43:36,674 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=524062.0, ans=0.1 2024-09-24 14:43:58,635 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=6.16 vs. limit=15.0 2024-09-24 14:43:59,895 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.min_positive, batch_count=524108.6666666667, ans=0.05 2024-09-24 14:44:25,235 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=524202.0, ans=0.0 2024-09-24 14:44:35,946 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=524248.6666666667, ans=0.125 2024-09-24 14:44:37,184 INFO [train.py:1198] (3/4) Epoch 29, batch 3250, loss[loss=0.234, ctc_loss=0.1587, cr_loss=0.3765, over 14915.00 frames. 
], tot_loss[loss=0.2022, ctc_loss=0.1321, cr_loss=0.3505, over 3349553.69 frames. ], batch size: 88, lr: 4.09e-03, grad_scale: 32.0 2024-09-24 14:44:51,736 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=6.57 vs. limit=15.0 2024-09-24 14:44:56,130 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=524295.3333333334, ans=0.125 2024-09-24 14:44:57,322 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.092e+02 1.269e+02 1.350e+02 1.434e+02 1.655e+02, threshold=2.700e+02, percent-clipped=0.0 2024-09-24 14:45:00,841 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=524295.3333333334, ans=0.0 2024-09-24 14:45:52,715 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=524435.3333333334, ans=0.07 2024-09-24 14:45:55,503 INFO [train.py:1198] (3/4) Epoch 29, batch 3300, loss[loss=0.1739, ctc_loss=0.1092, cr_loss=0.3237, over 17074.00 frames. ], tot_loss[loss=0.2015, ctc_loss=0.1316, cr_loss=0.3496, over 3356010.62 frames. ], batch size: 43, lr: 4.09e-03, grad_scale: 32.0 2024-09-24 14:46:18,170 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=524528.6666666666, ans=0.0 2024-09-24 14:46:43,191 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=524622.0, ans=0.125 2024-09-24 14:46:44,717 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=524622.0, ans=0.125 2024-09-24 14:46:55,490 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=524622.0, ans=0.04949747468305833 2024-09-24 14:47:02,581 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.87 vs. limit=6.0 2024-09-24 14:47:15,324 INFO [train.py:1198] (3/4) Epoch 29, batch 3350, loss[loss=0.1908, ctc_loss=0.1229, cr_loss=0.3392, over 17066.00 frames. ], tot_loss[loss=0.2009, ctc_loss=0.1312, cr_loss=0.3484, over 3357020.04 frames. ], batch size: 46, lr: 4.09e-03, grad_scale: 32.0 2024-09-24 14:47:32,723 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.max_abs, batch_count=524762.0, ans=10.0 2024-09-24 14:47:35,614 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.103e+02 1.319e+02 1.369e+02 1.450e+02 1.942e+02, threshold=2.738e+02, percent-clipped=0.0 2024-09-24 14:47:37,658 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=524762.0, ans=0.0 2024-09-24 14:47:57,188 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.07 vs. limit=15.0 2024-09-24 14:48:15,498 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=524855.3333333334, ans=0.0 2024-09-24 14:48:35,633 INFO [train.py:1198] (3/4) Epoch 29, batch 3400, loss[loss=0.2227, ctc_loss=0.1461, cr_loss=0.3829, over 16817.00 frames. ], tot_loss[loss=0.2016, ctc_loss=0.1318, cr_loss=0.3491, over 3344720.17 frames. 
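
Note on the balancer entries: they constrain per-channel activation statistics. min_positive / max_positive bound the fraction of frames on which a channel is positive (e.g. 0.025 and 0.95 in entries above), min_abs / max_abs bound its mean magnitude, and prob is how often the check is applied. The function below is a small diagnostic in that spirit; the real module enforces the bounds by modifying gradients, which is omitted here, and the min_abs default is an arbitrary placeholder.

import torch

def balancer_violations(x, min_positive=0.025, max_positive=0.95,
                        min_abs=0.2, max_abs=10.0):
    # x: (num_frames, num_channels). Returns boolean masks over channels
    # whose statistics fall outside the configured bounds.
    frac_positive = (x > 0).float().mean(dim=0)
    mean_abs = x.abs().mean(dim=0)
    return {
        "too_rarely_positive": frac_positive < min_positive,
        "too_often_positive": frac_positive > max_positive,
        "too_small": mean_abs < min_abs,
        "too_large": mean_abs > max_abs,
    }
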
], batch size: 61, lr: 4.09e-03, grad_scale: 32.0 2024-09-24 14:48:40,768 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=524948.6666666666, ans=0.2 2024-09-24 14:48:53,105 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=524995.3333333334, ans=0.125 2024-09-24 14:48:59,153 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=524995.3333333334, ans=0.015 2024-09-24 14:48:59,396 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=524995.3333333334, ans=0.0 2024-09-24 14:49:02,368 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=524995.3333333334, ans=0.125 2024-09-24 14:49:07,243 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=525042.0, ans=0.125 2024-09-24 14:49:07,295 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=525042.0, ans=0.125 2024-09-24 14:49:24,123 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=525088.6666666666, ans=0.125 2024-09-24 14:49:27,268 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=525088.6666666666, ans=0.0 2024-09-24 14:49:55,471 INFO [train.py:1198] (3/4) Epoch 29, batch 3450, loss[loss=0.217, ctc_loss=0.142, cr_loss=0.3751, over 16891.00 frames. ], tot_loss[loss=0.2024, ctc_loss=0.1324, cr_loss=0.3502, over 3338853.43 frames. ], batch size: 58, lr: 4.09e-03, grad_scale: 32.0 2024-09-24 14:50:05,513 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=525182.0, ans=0.125 2024-09-24 14:50:12,157 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.68 vs. limit=15.0 2024-09-24 14:50:13,506 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=14.86 vs. limit=15.0 2024-09-24 14:50:15,930 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.128e+02 1.258e+02 1.341e+02 1.460e+02 2.344e+02, threshold=2.681e+02, percent-clipped=0.0 2024-09-24 14:50:19,229 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=525228.6666666666, ans=0.125 2024-09-24 14:50:20,968 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=525228.6666666666, ans=0.1 2024-09-24 14:50:54,009 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=525322.0, ans=0.1 2024-09-24 14:50:59,085 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=10.53 vs. limit=22.5 2024-09-24 14:51:14,075 INFO [train.py:1198] (3/4) Epoch 29, batch 3500, loss[loss=0.2081, ctc_loss=0.1345, cr_loss=0.3678, over 17174.00 frames. ], tot_loss[loss=0.2026, ctc_loss=0.1326, cr_loss=0.3502, over 3335588.82 frames. 
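
Note on the *_skip_rate and layerdrop_rate entries (e.g. bypass.skip_rate ans=0.04949747468305833 and encoder_embed.convnext.layerdrop_rate ans=0.015 above): these read as probabilities of stochastically bypassing a sub-module during training, a LayerDrop-style regularizer, with most rates scheduled down to 0.0 by this point in the run. A schematic residual wrapper under that reading:

import torch

def residual_with_skip(module, x, skip_rate, training):
    # With probability skip_rate, bypass the sub-module entirely so the
    # block reduces to the identity; otherwise take the residual path.
    if training and torch.rand(()).item() < skip_rate:
        return x
    return x + module(x)
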
], batch size: 45, lr: 4.09e-03, grad_scale: 32.0 2024-09-24 14:51:15,909 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=525415.3333333334, ans=0.0 2024-09-24 14:51:21,151 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.11 vs. limit=6.0 2024-09-24 14:52:02,290 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=525555.3333333334, ans=0.125 2024-09-24 14:52:08,156 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=525555.3333333334, ans=0.1 2024-09-24 14:52:19,036 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=10.87 vs. limit=15.0 2024-09-24 14:52:29,742 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.const_attention_rate, batch_count=525602.0, ans=0.025 2024-09-24 14:52:34,353 INFO [train.py:1198] (3/4) Epoch 29, batch 3550, loss[loss=0.1786, ctc_loss=0.1151, cr_loss=0.3173, over 17275.00 frames. ], tot_loss[loss=0.2027, ctc_loss=0.1326, cr_loss=0.3507, over 3347746.07 frames. ], batch size: 42, lr: 4.08e-03, grad_scale: 32.0 2024-09-24 14:52:54,493 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.066e+02 1.276e+02 1.353e+02 1.468e+02 1.866e+02, threshold=2.706e+02, percent-clipped=0.0 2024-09-24 14:52:56,207 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=525695.3333333334, ans=0.125 2024-09-24 14:53:52,069 INFO [train.py:1198] (3/4) Epoch 29, batch 3600, loss[loss=0.1854, ctc_loss=0.1173, cr_loss=0.3407, over 17097.00 frames. ], tot_loss[loss=0.2017, ctc_loss=0.1318, cr_loss=0.3497, over 3353793.83 frames. ], batch size: 49, lr: 4.08e-03, grad_scale: 32.0 2024-09-24 14:54:16,353 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=8.95 vs. limit=15.0 2024-09-24 14:54:16,882 INFO [scaling.py:1024] (3/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=6.74 vs. limit=8.0 2024-09-24 14:54:17,495 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=525928.6666666666, ans=0.125 2024-09-24 14:54:28,139 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=525975.3333333334, ans=0.125 2024-09-24 14:54:28,642 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=6.51 vs. limit=15.0 2024-09-24 14:55:08,913 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=526115.3333333334, ans=0.2 2024-09-24 14:55:10,107 INFO [train.py:1198] (3/4) Epoch 29, batch 3650, loss[loss=0.2313, ctc_loss=0.1532, cr_loss=0.3903, over 16031.00 frames. ], tot_loss[loss=0.2022, ctc_loss=0.132, cr_loss=0.3509, over 3356670.65 frames. ], batch size: 74, lr: 4.08e-03, grad_scale: 32.0 2024-09-24 14:55:26,743 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=9.46 vs. 
limit=22.5 2024-09-24 14:55:31,931 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.135e+02 1.251e+02 1.330e+02 1.491e+02 2.044e+02, threshold=2.661e+02, percent-clipped=0.0 2024-09-24 14:55:42,121 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.58 vs. limit=15.0 2024-09-24 14:55:50,125 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.min_abs, batch_count=526208.6666666666, ans=0.5 2024-09-24 14:56:02,683 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=526255.3333333334, ans=0.125 2024-09-24 14:56:05,941 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=526255.3333333334, ans=0.2 2024-09-24 14:56:16,564 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.75 vs. limit=15.0 2024-09-24 14:56:28,396 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=526302.0, ans=0.1 2024-09-24 14:56:31,285 INFO [train.py:1198] (3/4) Epoch 29, batch 3700, loss[loss=0.1682, ctc_loss=0.106, cr_loss=0.311, over 17040.00 frames. ], tot_loss[loss=0.2013, ctc_loss=0.1313, cr_loss=0.35, over 3356080.09 frames. ], batch size: 39, lr: 4.08e-03, grad_scale: 16.0 2024-09-24 14:56:33,005 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=526348.6666666666, ans=0.125 2024-09-24 14:56:37,771 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=526348.6666666666, ans=0.2 2024-09-24 14:57:05,905 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-09-24 14:57:29,791 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=526488.6666666666, ans=0.0 2024-09-24 14:57:51,172 INFO [train.py:1198] (3/4) Epoch 29, batch 3750, loss[loss=0.222, ctc_loss=0.1441, cr_loss=0.3897, over 17028.00 frames. ], tot_loss[loss=0.2005, ctc_loss=0.1307, cr_loss=0.3486, over 3355039.53 frames. ], batch size: 52, lr: 4.08e-03, grad_scale: 16.0 2024-09-24 14:58:13,097 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.112e+02 1.272e+02 1.350e+02 1.451e+02 2.081e+02, threshold=2.700e+02, percent-clipped=0.0 2024-09-24 14:58:32,284 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=526675.3333333334, ans=0.2 2024-09-24 14:58:37,607 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=3.73 vs. limit=12.0 2024-09-24 14:58:54,188 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=526768.6666666666, ans=0.1 2024-09-24 14:58:57,085 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=526768.6666666666, ans=0.2 2024-09-24 14:59:10,335 INFO [train.py:1198] (3/4) Epoch 29, batch 3800, loss[loss=0.1781, ctc_loss=0.114, cr_loss=0.3206, over 16755.00 frames. 
], tot_loss[loss=0.2025, ctc_loss=0.1323, cr_loss=0.3506, over 3327607.56 frames. ], batch size: 37, lr: 4.08e-03, grad_scale: 16.0 2024-09-24 14:59:10,565 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=526815.3333333334, ans=0.125 2024-09-24 14:59:25,285 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=6.14 vs. limit=15.0 2024-09-24 14:59:26,233 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=526862.0, ans=0.04949747468305833 2024-09-24 14:59:43,636 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=526908.6666666666, ans=0.125 2024-09-24 14:59:47,027 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.74 vs. limit=10.0 2024-09-24 15:00:01,590 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=7.52 vs. limit=15.0 2024-09-24 15:00:16,011 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=527002.0, ans=0.125 2024-09-24 15:00:20,541 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=527002.0, ans=0.1 2024-09-24 15:00:22,043 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=527002.0, ans=0.125 2024-09-24 15:00:27,868 INFO [train.py:1198] (3/4) Epoch 29, batch 3850, loss[loss=0.208, ctc_loss=0.1342, cr_loss=0.3689, over 17016.00 frames. ], tot_loss[loss=0.2047, ctc_loss=0.1341, cr_loss=0.3531, over 3298377.75 frames. ], batch size: 51, lr: 4.08e-03, grad_scale: 16.0 2024-09-24 15:00:32,946 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=527048.6666666666, ans=0.0 2024-09-24 15:00:49,021 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.095e+02 1.344e+02 1.460e+02 1.620e+02 2.302e+02, threshold=2.919e+02, percent-clipped=0.0 2024-09-24 15:01:30,932 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=527235.3333333334, ans=0.0 2024-09-24 15:02:28,432 INFO [train.py:1198] (3/4) Epoch 30, batch 0, loss[loss=0.1747, ctc_loss=0.1153, cr_loss=0.2969, over 17288.00 frames. ], tot_loss[loss=0.1747, ctc_loss=0.1153, cr_loss=0.2969, over 17288.00 frames. ], batch size: 46, lr: 4.01e-03, grad_scale: 32.0 2024-09-24 15:02:28,432 INFO [train.py:1221] (3/4) Computing validation loss 2024-09-24 15:02:37,759 INFO [zipformer.py:1858] (3/4) name=encoder.encoders.3.encoder.layers.2.self_attn_weights, attn_weights_entropy = tensor([2.3352, 2.7329, 2.6834, 2.7874, 2.5102, 2.4899, 2.8281, 2.8737], device='cuda:3') 2024-09-24 15:02:43,739 INFO [train.py:1230] (3/4) Epoch 30, validation: loss=0.0352, ctc_loss=0.0352, cr_loss=9.262e-15, over 944034.00 frames. 
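
Note on the "Computing validation loss" / "validation: loss=..." pairs: each corresponds to a periodic pass over the fixed dev set (the same 944034.00 frames every time), run without gradients. A minimal sketch of such a pass, with model, loader, and loss-function names as placeholders; the frame-weighted averaging is an assumption suggested by the "over ... frames" totals.

import torch

@torch.no_grad()
def validate(model, dev_loader, compute_loss, device):
    model.eval()
    total_loss, total_frames = 0.0, 0.0
    for batch in dev_loader:
        loss, num_frames = compute_loss(model, batch, device)
        total_loss += loss.item() * num_frames
        total_frames += num_frames
    model.train()
    return total_loss / total_frames  # frame-weighted average, as logged
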
2024-09-24 15:02:43,739 INFO [train.py:1231] (3/4) Maximum memory allocated so far is 21238MB 2024-09-24 15:03:11,459 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=527310.0, ans=0.1 2024-09-24 15:03:14,794 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=527310.0, ans=0.1 2024-09-24 15:03:32,522 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=527356.6666666666, ans=0.125 2024-09-24 15:03:52,935 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=527450.0, ans=0.125 2024-09-24 15:03:59,892 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=9.80 vs. limit=15.0 2024-09-24 15:04:05,878 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=527496.6666666666, ans=0.1 2024-09-24 15:04:07,012 INFO [train.py:1198] (3/4) Epoch 30, batch 50, loss[loss=0.2366, ctc_loss=0.1578, cr_loss=0.3944, over 14855.00 frames. ], tot_loss[loss=0.1984, ctc_loss=0.1293, cr_loss=0.3456, over 759100.58 frames. ], batch size: 89, lr: 4.01e-03, grad_scale: 32.0 2024-09-24 15:04:21,981 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.const_attention_rate, batch_count=527543.3333333334, ans=0.025 2024-09-24 15:04:24,283 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=5.01 vs. limit=10.0 2024-09-24 15:04:25,164 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=527543.3333333334, ans=0.09899494936611666 2024-09-24 15:04:35,822 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.064e+02 1.261e+02 1.395e+02 1.583e+02 2.602e+02, threshold=2.789e+02, percent-clipped=0.0 2024-09-24 15:04:47,077 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=527590.0, ans=0.2 2024-09-24 15:04:47,729 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=5.16 vs. limit=15.0 2024-09-24 15:05:17,701 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=527683.3333333334, ans=0.125 2024-09-24 15:05:21,047 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=527683.3333333334, ans=0.125 2024-09-24 15:05:30,396 INFO [train.py:1198] (3/4) Epoch 30, batch 100, loss[loss=0.2067, ctc_loss=0.1363, cr_loss=0.3519, over 17165.00 frames. ], tot_loss[loss=0.2015, ctc_loss=0.1318, cr_loss=0.3487, over 1336313.44 frames. 
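
Note on the attn_weights_entropy tensor printed during the validation pass above (zipformer.py:1858): it summarizes how peaked one layer's self-attention distributions are per head; lower entropy means attention concentrated on fewer frames. One way such per-head numbers can be computed, as an illustrative formula rather than the exact diagnostic:

import torch

def attention_entropy(attn_weights):
    # attn_weights: (num_heads, query_len, key_len), each row sums to 1.
    # Entropy of every attention row, then averaged over queries per head.
    ent = -(attn_weights * (attn_weights + 1e-20).log()).sum(dim=-1)
    return ent.mean(dim=-1)  # one value per head, as in the logged tensor

attn = torch.softmax(torch.randn(8, 50, 50), dim=-1)
print(attention_entropy(attn))
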
], batch size: 45, lr: 4.01e-03, grad_scale: 32.0 2024-09-24 15:05:53,213 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=527776.6666666666, ans=0.125 2024-09-24 15:06:18,317 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer_ff2.min_abs, batch_count=527870.0, ans=0.1 2024-09-24 15:06:37,548 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=527916.6666666666, ans=0.0 2024-09-24 15:06:55,915 INFO [train.py:1198] (3/4) Epoch 30, batch 150, loss[loss=0.1993, ctc_loss=0.1287, cr_loss=0.3528, over 17259.00 frames. ], tot_loss[loss=0.2032, ctc_loss=0.1329, cr_loss=0.3513, over 1768499.87 frames. ], batch size: 44, lr: 4.01e-03, grad_scale: 32.0 2024-09-24 15:07:02,454 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=527963.3333333334, ans=0.0 2024-09-24 15:07:05,581 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=527963.3333333334, ans=0.0 2024-09-24 15:07:14,045 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=5.26 vs. limit=12.0 2024-09-24 15:07:24,079 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.94 vs. limit=22.5 2024-09-24 15:07:24,703 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.139e+02 1.297e+02 1.402e+02 1.523e+02 2.631e+02, threshold=2.805e+02, percent-clipped=0.0 2024-09-24 15:07:34,754 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=528056.6666666666, ans=0.0 2024-09-24 15:07:36,232 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=528056.6666666666, ans=0.125 2024-09-24 15:07:41,828 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=4.75 vs. limit=15.0 2024-09-24 15:08:19,332 INFO [train.py:1198] (3/4) Epoch 30, batch 200, loss[loss=0.2495, ctc_loss=0.1674, cr_loss=0.4105, over 14893.00 frames. ], tot_loss[loss=0.2039, ctc_loss=0.1335, cr_loss=0.3519, over 2101706.50 frames. ], batch size: 89, lr: 4.01e-03, grad_scale: 32.0 2024-09-24 15:08:22,838 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=528196.6666666666, ans=0.0 2024-09-24 15:08:27,701 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=528196.6666666666, ans=0.125 2024-09-24 15:08:58,426 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=528290.0, ans=0.125 2024-09-24 15:09:24,059 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.max_positive, batch_count=528383.3333333334, ans=0.95 2024-09-24 15:09:32,471 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=2.64 vs. limit=6.0 2024-09-24 15:09:42,473 INFO [train.py:1198] (3/4) Epoch 30, batch 250, loss[loss=0.2377, ctc_loss=0.1582, cr_loss=0.3977, over 17222.00 frames. 
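
Note on the loss[...] records: the three numbers consistently satisfy loss = ctc_loss + 0.2 x cr_loss (for instance 0.1363 + 0.2 x 0.3519 = 0.2067 in the Epoch 30, batch 100 line above), i.e. a CTC term plus a scaled consistency-regularization term between two augmented views of each utterance. A schematic combination under that assumption; the masking and reduction details are placeholders, not taken from the recipe.

import torch.nn.functional as F

def cr_ctc_loss(log_probs_a, log_probs_b, targets, input_lens, target_lens,
                cr_loss_scale=0.2):
    # log_probs_*: (T, N, C) log-softmax outputs for two differently
    # augmented views of the same batch.
    ctc = 0.5 * (F.ctc_loss(log_probs_a, targets, input_lens, target_lens,
                            zero_infinity=True)
                 + F.ctc_loss(log_probs_b, targets, input_lens, target_lens,
                              zero_infinity=True))
    # Consistency term: symmetric KL between the two views' per-frame
    # output distributions.
    cr = 0.5 * (F.kl_div(log_probs_a, log_probs_b, log_target=True,
                         reduction="batchmean")
                + F.kl_div(log_probs_b, log_probs_a, log_target=True,
                           reduction="batchmean"))
    return ctc + cr_loss_scale * cr, ctc, cr
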
], tot_loss[loss=0.2036, ctc_loss=0.1333, cr_loss=0.3517, over 2378931.27 frames. ], batch size: 55, lr: 4.00e-03, grad_scale: 32.0 2024-09-24 15:09:49,207 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=528430.0, ans=0.0 2024-09-24 15:09:52,337 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=528430.0, ans=0.125 2024-09-24 15:09:55,570 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.max_positive, batch_count=528430.0, ans=0.95 2024-09-24 15:10:04,138 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=6.50 vs. limit=15.0 2024-09-24 15:10:09,972 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=528476.6666666666, ans=0.2 2024-09-24 15:10:11,297 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.080e+02 1.266e+02 1.350e+02 1.451e+02 1.821e+02, threshold=2.699e+02, percent-clipped=0.0 2024-09-24 15:10:35,854 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=528570.0, ans=0.125 2024-09-24 15:10:36,404 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=5.72 vs. limit=12.0 2024-09-24 15:10:39,241 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=528570.0, ans=0.0 2024-09-24 15:10:59,989 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=528616.6666666666, ans=0.0 2024-09-24 15:11:02,814 INFO [train.py:1198] (3/4) Epoch 30, batch 300, loss[loss=0.1943, ctc_loss=0.1278, cr_loss=0.3326, over 17290.00 frames. ], tot_loss[loss=0.203, ctc_loss=0.1329, cr_loss=0.3507, over 2603701.39 frames. ], batch size: 49, lr: 4.00e-03, grad_scale: 32.0 2024-09-24 15:11:07,997 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=528663.3333333334, ans=0.1 2024-09-24 15:11:08,730 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=4.68 vs. limit=15.0 2024-09-24 15:11:19,460 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-24 15:11:48,276 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=4.54 vs. limit=15.0 2024-09-24 15:11:53,671 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=528756.6666666666, ans=0.125 2024-09-24 15:11:57,104 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.97 vs. limit=15.0 2024-09-24 15:12:11,229 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=528850.0, ans=0.1 2024-09-24 15:12:28,506 INFO [train.py:1198] (3/4) Epoch 30, batch 350, loss[loss=0.2204, ctc_loss=0.1508, cr_loss=0.348, over 15407.00 frames. 
], tot_loss[loss=0.2032, ctc_loss=0.133, cr_loss=0.3508, over 2761803.99 frames. ], batch size: 89, lr: 4.00e-03, grad_scale: 32.0 2024-09-24 15:12:40,909 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=10.86 vs. limit=22.5 2024-09-24 15:12:41,584 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=528896.6666666666, ans=0.125 2024-09-24 15:13:00,084 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.077e+02 1.261e+02 1.365e+02 1.554e+02 1.989e+02, threshold=2.731e+02, percent-clipped=0.0 2024-09-24 15:13:14,791 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=528990.0, ans=0.07 2024-09-24 15:13:15,028 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1.whitening_limit, batch_count=528990.0, ans=10.0 2024-09-24 15:13:18,084 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=529036.6666666666, ans=0.125 2024-09-24 15:13:19,601 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=529036.6666666666, ans=0.125 2024-09-24 15:13:22,749 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=529036.6666666666, ans=0.125 2024-09-24 15:13:32,505 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=529036.6666666666, ans=0.2 2024-09-24 15:13:51,537 INFO [train.py:1198] (3/4) Epoch 30, batch 400, loss[loss=0.1985, ctc_loss=0.1284, cr_loss=0.3509, over 17206.00 frames. ], tot_loss[loss=0.2029, ctc_loss=0.133, cr_loss=0.3499, over 2885447.18 frames. ], batch size: 47, lr: 4.00e-03, grad_scale: 32.0 2024-09-24 15:14:30,734 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=529223.3333333334, ans=0.125 2024-09-24 15:14:33,878 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=529223.3333333334, ans=0.125 2024-09-24 15:14:38,670 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=529270.0, ans=0.1 2024-09-24 15:14:57,314 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=529316.6666666666, ans=0.125 2024-09-24 15:15:14,477 INFO [train.py:1198] (3/4) Epoch 30, batch 450, loss[loss=0.2098, ctc_loss=0.1382, cr_loss=0.358, over 17023.00 frames. ], tot_loss[loss=0.2022, ctc_loss=0.1324, cr_loss=0.3489, over 2987528.16 frames. ], batch size: 44, lr: 4.00e-03, grad_scale: 16.0 2024-09-24 15:15:21,566 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=9.65 vs. limit=15.0 2024-09-24 15:15:23,436 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten.whitening_limit, batch_count=529363.3333333334, ans=15.0 2024-09-24 15:15:28,092 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=7.69 vs. 
limit=15.0 2024-09-24 15:15:33,017 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=15.18 vs. limit=22.5 2024-09-24 15:15:34,280 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=14.58 vs. limit=15.0 2024-09-24 15:15:44,835 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.076e+02 1.260e+02 1.335e+02 1.450e+02 2.256e+02, threshold=2.670e+02, percent-clipped=0.0 2024-09-24 15:16:01,150 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=529503.3333333334, ans=0.0 2024-09-24 15:16:12,247 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=529503.3333333334, ans=0.0 2024-09-24 15:16:15,608 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=529503.3333333334, ans=0.125 2024-09-24 15:16:18,653 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=529550.0, ans=0.0 2024-09-24 15:16:34,282 INFO [train.py:1198] (3/4) Epoch 30, batch 500, loss[loss=0.1937, ctc_loss=0.1241, cr_loss=0.3481, over 16805.00 frames. ], tot_loss[loss=0.2014, ctc_loss=0.1317, cr_loss=0.3482, over 3073303.47 frames. ], batch size: 61, lr: 4.00e-03, grad_scale: 16.0 2024-09-24 15:17:04,099 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=529643.3333333334, ans=0.125 2024-09-24 15:17:08,851 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.const_attention_rate, batch_count=529643.3333333334, ans=0.025 2024-09-24 15:17:11,924 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=529690.0, ans=0.125 2024-09-24 15:17:19,981 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=529690.0, ans=0.125 2024-09-24 15:17:58,309 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=529783.3333333334, ans=0.0 2024-09-24 15:18:02,962 INFO [train.py:1198] (3/4) Epoch 30, batch 550, loss[loss=0.211, ctc_loss=0.1386, cr_loss=0.3616, over 17307.00 frames. ], tot_loss[loss=0.201, ctc_loss=0.1314, cr_loss=0.348, over 3141515.05 frames. 
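
Note on the grad_scale values in the loss lines (16.0 and 32.0 in this stretch): this is the float16 loss-scaling factor, grown after runs of successful steps and halved when an overflow is detected. With PyTorch's built-in scaler, one training step looks roughly like the sketch below; model, optimizer, and loss names are placeholders.

import torch

scaler = torch.cuda.amp.GradScaler()  # maintains the dynamic grad_scale

def fp16_step(model, optimizer, batch, compute_loss):
    optimizer.zero_grad()
    with torch.cuda.amp.autocast():    # run the forward pass in float16
        loss = compute_loss(model, batch)
    scaler.scale(loss).backward()      # scale up to avoid fp16 underflow
    scaler.step(optimizer)             # unscales grads; skips step on inf/nan
    scaler.update()                    # grows or shrinks the scale over time
    return loss.detach()
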
], batch size: 51, lr: 4.00e-03, grad_scale: 16.0 2024-09-24 15:18:11,372 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=529830.0, ans=0.0 2024-09-24 15:18:14,431 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=529830.0, ans=0.0 2024-09-24 15:18:19,395 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=529876.6666666666, ans=0.1 2024-09-24 15:18:33,369 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.121e+02 1.259e+02 1.342e+02 1.453e+02 2.055e+02, threshold=2.683e+02, percent-clipped=0.0 2024-09-24 15:18:40,105 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=529923.3333333334, ans=0.125 2024-09-24 15:18:51,264 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=529970.0, ans=0.125 2024-09-24 15:19:02,599 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=529970.0, ans=0.1 2024-09-24 15:19:07,298 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=530016.6666666666, ans=0.0 2024-09-24 15:19:13,972 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.const_attention_rate, batch_count=530016.6666666666, ans=0.025 2024-09-24 15:19:23,343 INFO [train.py:1198] (3/4) Epoch 30, batch 600, loss[loss=0.1621, ctc_loss=0.1017, cr_loss=0.302, over 16976.00 frames. ], tot_loss[loss=0.2018, ctc_loss=0.132, cr_loss=0.3489, over 3190932.93 frames. ], batch size: 42, lr: 4.00e-03, grad_scale: 16.0 2024-09-24 15:19:42,683 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=530110.0, ans=0.2 2024-09-24 15:19:42,696 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.min_positive, batch_count=530110.0, ans=0.025 2024-09-24 15:20:03,061 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=530156.6666666666, ans=0.1 2024-09-24 15:20:38,549 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.73 vs. limit=12.0 2024-09-24 15:20:45,718 INFO [train.py:1198] (3/4) Epoch 30, batch 650, loss[loss=0.2166, ctc_loss=0.145, cr_loss=0.3581, over 17040.00 frames. ], tot_loss[loss=0.1999, ctc_loss=0.1306, cr_loss=0.3466, over 3231428.16 frames. ], batch size: 52, lr: 4.00e-03, grad_scale: 16.0 2024-09-24 15:21:15,829 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.154e+02 1.275e+02 1.371e+02 1.445e+02 2.518e+02, threshold=2.741e+02, percent-clipped=0.0 2024-09-24 15:22:10,439 INFO [train.py:1198] (3/4) Epoch 30, batch 700, loss[loss=0.2022, ctc_loss=0.1302, cr_loss=0.36, over 17208.00 frames. ], tot_loss[loss=0.2009, ctc_loss=0.1313, cr_loss=0.3482, over 3261757.59 frames. 
], batch size: 47, lr: 4.00e-03, grad_scale: 16.0 2024-09-24 15:22:13,983 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=530530.0, ans=0.2 2024-09-24 15:22:17,661 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.45 vs. limit=6.0 2024-09-24 15:22:25,144 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=530576.6666666666, ans=0.125 2024-09-24 15:22:29,855 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=530576.6666666666, ans=0.0 2024-09-24 15:23:04,381 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=530670.0, ans=0.2 2024-09-24 15:23:09,311 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=530670.0, ans=0.2 2024-09-24 15:23:12,724 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=530670.0, ans=0.125 2024-09-24 15:23:25,399 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=530716.6666666666, ans=0.0 2024-09-24 15:23:30,130 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=530716.6666666666, ans=0.0 2024-09-24 15:23:33,172 INFO [train.py:1198] (3/4) Epoch 30, batch 750, loss[loss=0.2209, ctc_loss=0.1473, cr_loss=0.368, over 14790.00 frames. ], tot_loss[loss=0.2001, ctc_loss=0.1308, cr_loss=0.3465, over 3284420.88 frames. ], batch size: 88, lr: 4.00e-03, grad_scale: 16.0 2024-09-24 15:23:38,308 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=530763.3333333334, ans=0.125 2024-09-24 15:24:01,640 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=2.77 vs. limit=6.0 2024-09-24 15:24:04,038 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.101e+02 1.243e+02 1.346e+02 1.462e+02 2.306e+02, threshold=2.692e+02, percent-clipped=0.0 2024-09-24 15:24:14,042 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=530856.6666666666, ans=0.125 2024-09-24 15:24:44,822 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=11.51 vs. limit=15.0 2024-09-24 15:24:56,186 INFO [train.py:1198] (3/4) Epoch 30, batch 800, loss[loss=0.2307, ctc_loss=0.1597, cr_loss=0.3549, over 11735.00 frames. ], tot_loss[loss=0.199, ctc_loss=0.1298, cr_loss=0.3458, over 3307484.77 frames. 
], batch size: 123, lr: 3.99e-03, grad_scale: 16.0 2024-09-24 15:25:33,167 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=531090.0, ans=0.125 2024-09-24 15:25:39,579 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=531090.0, ans=0.1 2024-09-24 15:25:42,801 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=531136.6666666666, ans=0.0 2024-09-24 15:26:06,694 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=531183.3333333334, ans=0.1 2024-09-24 15:26:16,196 INFO [train.py:1198] (3/4) Epoch 30, batch 850, loss[loss=0.2052, ctc_loss=0.1358, cr_loss=0.3467, over 17317.00 frames. ], tot_loss[loss=0.2, ctc_loss=0.1306, cr_loss=0.347, over 3319676.35 frames. ], batch size: 51, lr: 3.99e-03, grad_scale: 16.0 2024-09-24 15:26:53,457 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.016e+02 1.252e+02 1.339e+02 1.409e+02 2.350e+02, threshold=2.679e+02, percent-clipped=0.0 2024-09-24 15:27:41,875 INFO [train.py:1198] (3/4) Epoch 30, batch 900, loss[loss=0.2057, ctc_loss=0.1335, cr_loss=0.3609, over 17307.00 frames. ], tot_loss[loss=0.2006, ctc_loss=0.1309, cr_loss=0.348, over 3327658.76 frames. ], batch size: 49, lr: 3.99e-03, grad_scale: 16.0 2024-09-24 15:27:45,239 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=531463.3333333334, ans=0.125 2024-09-24 15:27:48,728 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.98 vs. limit=10.0 2024-09-24 15:28:11,957 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.11 vs. limit=15.0 2024-09-24 15:28:16,472 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=531556.6666666666, ans=0.0 2024-09-24 15:28:19,693 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=531556.6666666666, ans=0.0 2024-09-24 15:28:26,380 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=16.42 vs. limit=22.5 2024-09-24 15:28:36,568 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=7.68 vs. limit=15.0 2024-09-24 15:29:04,216 INFO [train.py:1198] (3/4) Epoch 30, batch 950, loss[loss=0.1885, ctc_loss=0.1223, cr_loss=0.3312, over 17103.00 frames. ], tot_loss[loss=0.1996, ctc_loss=0.1302, cr_loss=0.3469, over 3338960.42 frames. 
], batch size: 49, lr: 3.99e-03, grad_scale: 16.0 2024-09-24 15:29:04,612 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=531696.6666666666, ans=0.125 2024-09-24 15:29:13,918 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=531696.6666666666, ans=0.0 2024-09-24 15:29:14,612 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=5.35 vs. limit=15.0 2024-09-24 15:29:26,682 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=531743.3333333334, ans=0.125 2024-09-24 15:29:35,837 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.143e+02 1.249e+02 1.333e+02 1.422e+02 1.714e+02, threshold=2.667e+02, percent-clipped=0.0 2024-09-24 15:29:45,760 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=531790.0, ans=0.125 2024-09-24 15:30:10,853 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=531883.3333333334, ans=0.0 2024-09-24 15:30:26,509 INFO [train.py:1198] (3/4) Epoch 30, batch 1000, loss[loss=0.2136, ctc_loss=0.1416, cr_loss=0.3599, over 17034.00 frames. ], tot_loss[loss=0.1994, ctc_loss=0.13, cr_loss=0.3467, over 3343771.74 frames. ], batch size: 52, lr: 3.99e-03, grad_scale: 16.0 2024-09-24 15:30:30,271 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.92 vs. limit=15.0 2024-09-24 15:30:36,587 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=531930.0, ans=0.0 2024-09-24 15:31:08,997 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=532023.3333333334, ans=0.125 2024-09-24 15:31:26,484 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-24 15:31:28,833 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=6.16 vs. limit=15.0 2024-09-24 15:31:31,264 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=2.571e-03 2024-09-24 15:31:31,367 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=532116.6666666666, ans=0.125 2024-09-24 15:31:51,973 INFO [train.py:1198] (3/4) Epoch 30, batch 1050, loss[loss=0.1794, ctc_loss=0.1155, cr_loss=0.3195, over 17030.00 frames. ], tot_loss[loss=0.1987, ctc_loss=0.1296, cr_loss=0.3454, over 3349028.63 frames. 
], batch size: 39, lr: 3.99e-03, grad_scale: 16.0 2024-09-24 15:32:13,235 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=532210.0, ans=0.125 2024-09-24 15:32:22,596 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.const_attention_rate, batch_count=532256.6666666666, ans=0.025 2024-09-24 15:32:23,922 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.138e+02 1.266e+02 1.377e+02 1.519e+02 2.640e+02, threshold=2.754e+02, percent-clipped=0.0 2024-09-24 15:32:44,839 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.const_attention_rate, batch_count=532303.3333333334, ans=0.025 2024-09-24 15:32:54,220 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=532350.0, ans=0.0 2024-09-24 15:33:09,591 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=532350.0, ans=0.0 2024-09-24 15:33:14,103 INFO [train.py:1198] (3/4) Epoch 30, batch 1100, loss[loss=0.1675, ctc_loss=0.1089, cr_loss=0.2927, over 17182.00 frames. ], tot_loss[loss=0.2002, ctc_loss=0.1307, cr_loss=0.3477, over 3347768.50 frames. ], batch size: 41, lr: 3.99e-03, grad_scale: 16.0 2024-09-24 15:33:15,936 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=532396.6666666666, ans=0.0 2024-09-24 15:33:17,606 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=532396.6666666666, ans=0.2 2024-09-24 15:33:19,115 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=532396.6666666666, ans=0.2 2024-09-24 15:33:33,988 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=3.77 vs. limit=10.0 2024-09-24 15:33:52,316 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=532490.0, ans=0.09899494936611666 2024-09-24 15:33:52,323 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=532490.0, ans=0.0 2024-09-24 15:34:16,984 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten.whitening_limit, batch_count=532583.3333333334, ans=15.0 2024-09-24 15:34:33,775 INFO [train.py:1198] (3/4) Epoch 30, batch 1150, loss[loss=0.1915, ctc_loss=0.1242, cr_loss=0.3367, over 17310.00 frames. ], tot_loss[loss=0.1991, ctc_loss=0.1299, cr_loss=0.346, over 3347190.59 frames. ], batch size: 51, lr: 3.99e-03, grad_scale: 16.0 2024-09-24 15:34:40,637 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=532630.0, ans=0.125 2024-09-24 15:34:43,623 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=532630.0, ans=0.2 2024-09-24 15:34:43,979 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=6.13 vs. 
limit=15.0 2024-09-24 15:34:46,742 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-24 15:35:09,706 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.070e+02 1.299e+02 1.402e+02 1.532e+02 2.058e+02, threshold=2.804e+02, percent-clipped=0.0 2024-09-24 15:35:10,002 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=532723.3333333334, ans=0.125 2024-09-24 15:35:19,785 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=532723.3333333334, ans=0.0 2024-09-24 15:35:32,251 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=532770.0, ans=0.125 2024-09-24 15:35:35,490 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=532770.0, ans=0.2 2024-09-24 15:35:40,306 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=532816.6666666666, ans=0.05 2024-09-24 15:35:43,488 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.max_positive, batch_count=532816.6666666666, ans=0.95 2024-09-24 15:35:56,126 INFO [train.py:1198] (3/4) Epoch 30, batch 1200, loss[loss=0.2437, ctc_loss=0.1629, cr_loss=0.4039, over 17039.00 frames. ], tot_loss[loss=0.1994, ctc_loss=0.1301, cr_loss=0.3463, over 3351436.04 frames. ], batch size: 52, lr: 3.99e-03, grad_scale: 16.0 2024-09-24 15:36:04,613 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=532863.3333333334, ans=0.125 2024-09-24 15:36:17,644 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=10.78 vs. limit=12.0 2024-09-24 15:36:28,905 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=532956.6666666666, ans=0.1 2024-09-24 15:36:45,696 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=533003.3333333334, ans=0.125 2024-09-24 15:37:18,584 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=533050.0, ans=0.125 2024-09-24 15:37:21,571 INFO [train.py:1198] (3/4) Epoch 30, batch 1250, loss[loss=0.2094, ctc_loss=0.134, cr_loss=0.3769, over 17251.00 frames. ], tot_loss[loss=0.2006, ctc_loss=0.131, cr_loss=0.3478, over 3346407.23 frames. ], batch size: 44, lr: 3.99e-03, grad_scale: 16.0 2024-09-24 15:37:55,976 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.min_abs, batch_count=533190.0, ans=0.5 2024-09-24 15:37:57,235 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.068e+02 1.280e+02 1.378e+02 1.490e+02 1.846e+02, threshold=2.756e+02, percent-clipped=0.0 2024-09-24 15:37:59,689 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=9.07 vs. 
limit=15.0 2024-09-24 15:38:21,487 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=533236.6666666666, ans=0.2 2024-09-24 15:38:34,121 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=533283.3333333334, ans=0.09899494936611666 2024-09-24 15:38:43,380 INFO [train.py:1198] (3/4) Epoch 30, batch 1300, loss[loss=0.2138, ctc_loss=0.1399, cr_loss=0.3695, over 17006.00 frames. ], tot_loss[loss=0.2007, ctc_loss=0.1312, cr_loss=0.3479, over 3342456.51 frames. ], batch size: 44, lr: 3.99e-03, grad_scale: 16.0 2024-09-24 15:39:12,168 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=533376.6666666666, ans=0.125 2024-09-24 15:39:59,759 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.10 vs. limit=10.0 2024-09-24 15:40:05,655 INFO [train.py:1198] (3/4) Epoch 30, batch 1350, loss[loss=0.2302, ctc_loss=0.1549, cr_loss=0.3763, over 16710.00 frames. ], tot_loss[loss=0.2019, ctc_loss=0.132, cr_loss=0.3491, over 3336417.56 frames. ], batch size: 61, lr: 3.99e-03, grad_scale: 16.0 2024-09-24 15:40:10,677 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=533563.3333333334, ans=0.0 2024-09-24 15:40:13,761 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=533563.3333333334, ans=0.1 2024-09-24 15:40:24,714 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=533610.0, ans=0.0 2024-09-24 15:40:38,859 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.086e+02 1.284e+02 1.374e+02 1.506e+02 2.096e+02, threshold=2.749e+02, percent-clipped=0.0 2024-09-24 15:40:39,095 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=533656.6666666666, ans=0.125 2024-09-24 15:40:45,759 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.max_abs, batch_count=533656.6666666666, ans=10.0 2024-09-24 15:41:02,260 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=7.98 vs. limit=15.0 2024-09-24 15:41:15,020 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-24 15:41:21,643 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=533750.0, ans=0.2 2024-09-24 15:41:26,036 INFO [train.py:1198] (3/4) Epoch 30, batch 1400, loss[loss=0.1987, ctc_loss=0.1281, cr_loss=0.353, over 17215.00 frames. ], tot_loss[loss=0.203, ctc_loss=0.1329, cr_loss=0.3506, over 3327811.25 frames. 
], batch size: 55, lr: 3.98e-03, grad_scale: 16.0 2024-09-24 15:41:57,432 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=533843.3333333334, ans=0.125 2024-09-24 15:42:00,848 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=533843.3333333334, ans=0.2 2024-09-24 15:42:09,827 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=7.25 vs. limit=15.0 2024-09-24 15:42:20,320 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=533936.6666666666, ans=0.0 2024-09-24 15:42:23,930 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=8.69 vs. limit=15.0 2024-09-24 15:42:25,215 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=533936.6666666666, ans=0.0 2024-09-24 15:42:25,556 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.77 vs. limit=15.0 2024-09-24 15:42:27,469 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=7.99 vs. limit=15.0 2024-09-24 15:42:54,636 INFO [train.py:1198] (3/4) Epoch 30, batch 1450, loss[loss=0.2067, ctc_loss=0.1338, cr_loss=0.3644, over 16999.00 frames. ], tot_loss[loss=0.2013, ctc_loss=0.1315, cr_loss=0.3491, over 3345651.96 frames. ], batch size: 53, lr: 3.98e-03, grad_scale: 16.0 2024-09-24 15:43:01,291 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.const_attention_rate, batch_count=534030.0, ans=0.025 2024-09-24 15:43:13,968 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=534076.6666666666, ans=0.2 2024-09-24 15:43:28,224 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.119e+02 1.340e+02 1.457e+02 1.567e+02 2.615e+02, threshold=2.914e+02, percent-clipped=0.0 2024-09-24 15:44:05,411 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-24 15:44:07,028 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=534216.6666666666, ans=0.0 2024-09-24 15:44:14,827 INFO [train.py:1198] (3/4) Epoch 30, batch 1500, loss[loss=0.1961, ctc_loss=0.1254, cr_loss=0.3538, over 17299.00 frames. ], tot_loss[loss=0.2014, ctc_loss=0.1315, cr_loss=0.3495, over 3351116.37 frames. ], batch size: 46, lr: 3.98e-03, grad_scale: 16.0 2024-09-24 15:44:26,345 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=534263.3333333334, ans=0.1 2024-09-24 15:44:34,196 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=534310.0, ans=0.125 2024-09-24 15:44:34,232 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=534310.0, ans=0.125 2024-09-24 15:44:50,229 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=4.74 vs. 
limit=10.0 2024-09-24 15:45:21,487 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=534450.0, ans=0.125 2024-09-24 15:45:22,931 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=534450.0, ans=0.2 2024-09-24 15:45:27,895 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-09-24 15:45:37,370 INFO [train.py:1198] (3/4) Epoch 30, batch 1550, loss[loss=0.1961, ctc_loss=0.1281, cr_loss=0.3402, over 17017.00 frames. ], tot_loss[loss=0.2013, ctc_loss=0.1314, cr_loss=0.3496, over 3354211.98 frames. ], batch size: 52, lr: 3.98e-03, grad_scale: 16.0 2024-09-24 15:45:39,996 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=6.27 vs. limit=15.0 2024-09-24 15:45:50,667 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=534496.6666666666, ans=0.0 2024-09-24 15:45:55,231 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=534543.3333333334, ans=0.125 2024-09-24 15:46:09,772 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=534590.0, ans=0.1 2024-09-24 15:46:10,267 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=8.45 vs. limit=15.0 2024-09-24 15:46:11,088 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.077e+02 1.255e+02 1.331e+02 1.418e+02 1.761e+02, threshold=2.662e+02, percent-clipped=0.0 2024-09-24 15:46:29,532 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=4.67 vs. limit=10.0 2024-09-24 15:46:39,609 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=534636.6666666666, ans=0.04949747468305833 2024-09-24 15:46:54,982 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=534683.3333333334, ans=0.0 2024-09-24 15:47:02,579 INFO [train.py:1198] (3/4) Epoch 30, batch 1600, loss[loss=0.2191, ctc_loss=0.1483, cr_loss=0.3536, over 17296.00 frames. ], tot_loss[loss=0.201, ctc_loss=0.1312, cr_loss=0.3491, over 3360408.67 frames. ], batch size: 51, lr: 3.98e-03, grad_scale: 32.0 2024-09-24 15:47:06,409 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=11.90 vs. limit=15.0 2024-09-24 15:47:23,749 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=534776.6666666666, ans=0.0 2024-09-24 15:47:40,113 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=534823.3333333334, ans=0.125 2024-09-24 15:48:25,596 INFO [train.py:1198] (3/4) Epoch 30, batch 1650, loss[loss=0.1677, ctc_loss=0.1068, cr_loss=0.3048, over 17046.00 frames. ], tot_loss[loss=0.2004, ctc_loss=0.1307, cr_loss=0.3484, over 3362594.67 frames. 
], batch size: 39, lr: 3.98e-03, grad_scale: 32.0 2024-09-24 15:48:35,276 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=534963.3333333334, ans=0.125 2024-09-24 15:48:40,442 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.65 vs. limit=15.0 2024-09-24 15:48:59,210 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.124e+02 1.272e+02 1.342e+02 1.418e+02 1.836e+02, threshold=2.683e+02, percent-clipped=0.0 2024-09-24 15:49:14,273 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.84 vs. limit=10.0 2024-09-24 15:49:27,946 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=535150.0, ans=0.125 2024-09-24 15:49:45,318 INFO [train.py:1198] (3/4) Epoch 30, batch 1700, loss[loss=0.1944, ctc_loss=0.1292, cr_loss=0.326, over 17016.00 frames. ], tot_loss[loss=0.2005, ctc_loss=0.1308, cr_loss=0.3482, over 3361491.31 frames. ], batch size: 51, lr: 3.98e-03, grad_scale: 32.0 2024-09-24 15:50:28,490 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=535290.0, ans=0.07 2024-09-24 15:50:46,097 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=535336.6666666666, ans=0.0 2024-09-24 15:50:58,874 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=535383.3333333334, ans=0.0 2024-09-24 15:51:08,607 INFO [train.py:1198] (3/4) Epoch 30, batch 1750, loss[loss=0.2184, ctc_loss=0.1389, cr_loss=0.3975, over 17214.00 frames. ], tot_loss[loss=0.2004, ctc_loss=0.1306, cr_loss=0.3486, over 3359608.14 frames. ], batch size: 50, lr: 3.98e-03, grad_scale: 32.0 2024-09-24 15:51:47,339 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.161e+02 1.310e+02 1.375e+02 1.459e+02 2.196e+02, threshold=2.749e+02, percent-clipped=0.0 2024-09-24 15:51:49,217 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=535523.3333333334, ans=0.0 2024-09-24 15:52:25,750 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=535616.6666666666, ans=0.0 2024-09-24 15:52:33,549 INFO [train.py:1198] (3/4) Epoch 30, batch 1800, loss[loss=0.2078, ctc_loss=0.1327, cr_loss=0.3758, over 17293.00 frames. ], tot_loss[loss=0.1997, ctc_loss=0.1302, cr_loss=0.3476, over 3369748.32 frames. ], batch size: 49, lr: 3.98e-03, grad_scale: 32.0 2024-09-24 15:52:35,395 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=535663.3333333334, ans=0.1 2024-09-24 15:52:38,741 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=535663.3333333334, ans=0.1 2024-09-24 15:52:39,130 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=2.63 vs. 
limit=6.0 2024-09-24 15:52:49,325 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=535663.3333333334, ans=0.1 2024-09-24 15:53:02,671 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=535710.0, ans=0.2 2024-09-24 15:53:56,385 INFO [train.py:1198] (3/4) Epoch 30, batch 1850, loss[loss=0.1724, ctc_loss=0.1096, cr_loss=0.3141, over 17080.00 frames. ], tot_loss[loss=0.2005, ctc_loss=0.1307, cr_loss=0.3487, over 3361222.58 frames. ], batch size: 43, lr: 3.98e-03, grad_scale: 32.0 2024-09-24 15:54:17,457 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=535943.3333333334, ans=0.125 2024-09-24 15:54:19,103 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=535943.3333333334, ans=0.125 2024-09-24 15:54:30,248 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.140e+02 1.250e+02 1.313e+02 1.396e+02 1.985e+02, threshold=2.627e+02, percent-clipped=0.0 2024-09-24 15:54:47,349 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=536036.6666666666, ans=0.125 2024-09-24 15:54:52,249 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=536036.6666666666, ans=0.0 2024-09-24 15:54:55,958 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=16.34 vs. limit=22.5 2024-09-24 15:55:03,548 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-24 15:55:11,604 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=536083.3333333334, ans=0.125 2024-09-24 15:55:13,206 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=536083.3333333334, ans=0.125 2024-09-24 15:55:18,135 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=536130.0, ans=0.125 2024-09-24 15:55:19,300 INFO [train.py:1198] (3/4) Epoch 30, batch 1900, loss[loss=0.1617, ctc_loss=0.102, cr_loss=0.2988, over 17104.00 frames. ], tot_loss[loss=0.2001, ctc_loss=0.1306, cr_loss=0.3475, over 3353607.63 frames. ], batch size: 40, lr: 3.98e-03, grad_scale: 32.0 2024-09-24 15:55:29,115 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=536130.0, ans=0.1 2024-09-24 15:55:52,253 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=6.71 vs. 
limit=10.0 2024-09-24 15:55:58,227 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=536223.3333333334, ans=0.125 2024-09-24 15:56:14,114 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=536270.0, ans=0.0 2024-09-24 15:56:15,662 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=536270.0, ans=0.125 2024-09-24 15:56:43,021 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=536363.3333333334, ans=0.125 2024-09-24 15:56:44,475 INFO [train.py:1198] (3/4) Epoch 30, batch 1950, loss[loss=0.1939, ctc_loss=0.1255, cr_loss=0.3421, over 17307.00 frames. ], tot_loss[loss=0.1987, ctc_loss=0.1297, cr_loss=0.3451, over 3349436.64 frames. ], batch size: 51, lr: 3.97e-03, grad_scale: 32.0 2024-09-24 15:56:49,480 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=536363.3333333334, ans=0.04949747468305833 2024-09-24 15:56:51,142 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=536363.3333333334, ans=0.125 2024-09-24 15:56:57,570 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=536363.3333333334, ans=0.125 2024-09-24 15:57:05,484 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.const_attention_rate, batch_count=536410.0, ans=0.025 2024-09-24 15:57:19,472 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.108e+02 1.270e+02 1.356e+02 1.450e+02 2.095e+02, threshold=2.713e+02, percent-clipped=0.0 2024-09-24 15:57:29,216 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=536456.6666666666, ans=0.125 2024-09-24 15:57:51,185 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=536550.0, ans=0.125 2024-09-24 15:57:51,473 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=14.41 vs. limit=15.0 2024-09-24 15:58:02,271 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=536550.0, ans=0.05 2024-09-24 15:58:06,789 INFO [train.py:1198] (3/4) Epoch 30, batch 2000, loss[loss=0.2281, ctc_loss=0.1478, cr_loss=0.4019, over 16887.00 frames. ], tot_loss[loss=0.1995, ctc_loss=0.1302, cr_loss=0.3465, over 3344274.73 frames. ], batch size: 58, lr: 3.97e-03, grad_scale: 32.0 2024-09-24 15:58:47,417 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=6.99 vs. limit=15.0 2024-09-24 15:58:55,089 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.const_attention_rate, batch_count=536736.6666666666, ans=0.025 2024-09-24 15:59:26,922 INFO [train.py:1198] (3/4) Epoch 30, batch 2050, loss[loss=0.2184, ctc_loss=0.1456, cr_loss=0.364, over 17008.00 frames. ], tot_loss[loss=0.2005, ctc_loss=0.1309, cr_loss=0.3478, over 3347050.17 frames. 
], batch size: 44, lr: 3.97e-03, grad_scale: 32.0 2024-09-24 15:59:27,134 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=536830.0, ans=0.125 2024-09-24 15:59:29,069 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-24 15:59:35,960 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.03 vs. limit=10.0 2024-09-24 15:59:36,903 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=536830.0, ans=0.1 2024-09-24 15:59:44,364 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=536876.6666666666, ans=0.125 2024-09-24 15:59:50,714 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=536876.6666666666, ans=0.125 2024-09-24 15:59:53,832 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=536876.6666666666, ans=0.2 2024-09-24 16:00:04,534 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.111e+02 1.272e+02 1.366e+02 1.463e+02 2.391e+02, threshold=2.732e+02, percent-clipped=0.0 2024-09-24 16:00:05,083 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.70 vs. limit=15.0 2024-09-24 16:00:24,448 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=12.01 vs. limit=22.5 2024-09-24 16:00:37,656 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=3.32 vs. limit=12.0 2024-09-24 16:00:49,750 INFO [train.py:1198] (3/4) Epoch 30, batch 2100, loss[loss=0.2187, ctc_loss=0.144, cr_loss=0.3738, over 16912.00 frames. ], tot_loss[loss=0.2006, ctc_loss=0.1309, cr_loss=0.3488, over 3354961.64 frames. ], batch size: 58, lr: 3.97e-03, grad_scale: 32.0 2024-09-24 16:01:24,884 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=537156.6666666666, ans=0.04949747468305833 2024-09-24 16:01:30,540 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=537156.6666666666, ans=0.1 2024-09-24 16:02:04,019 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=537250.0, ans=0.1 2024-09-24 16:02:14,605 INFO [train.py:1198] (3/4) Epoch 30, batch 2150, loss[loss=0.1547, ctc_loss=0.09726, cr_loss=0.2872, over 16972.00 frames. ], tot_loss[loss=0.2001, ctc_loss=0.1306, cr_loss=0.3478, over 3353919.11 frames. 
], batch size: 42, lr: 3.97e-03, grad_scale: 32.0 2024-09-24 16:02:27,416 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=537296.6666666666, ans=0.125 2024-09-24 16:02:39,744 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=537343.3333333334, ans=0.125 2024-09-24 16:02:52,196 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.110e+02 1.245e+02 1.327e+02 1.449e+02 2.310e+02, threshold=2.654e+02, percent-clipped=0.0 2024-09-24 16:03:09,913 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=537436.6666666666, ans=0.035 2024-09-24 16:03:13,310 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=537436.6666666666, ans=0.04949747468305833 2024-09-24 16:03:36,775 INFO [train.py:1198] (3/4) Epoch 30, batch 2200, loss[loss=0.2013, ctc_loss=0.1325, cr_loss=0.3445, over 16782.00 frames. ], tot_loss[loss=0.2004, ctc_loss=0.1307, cr_loss=0.3483, over 3354768.37 frames. ], batch size: 61, lr: 3.97e-03, grad_scale: 32.0 2024-09-24 16:04:02,920 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=537576.6666666666, ans=0.0 2024-09-24 16:04:36,860 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=537670.0, ans=0.0 2024-09-24 16:04:49,217 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=537716.6666666666, ans=0.125 2024-09-24 16:05:00,240 INFO [train.py:1198] (3/4) Epoch 30, batch 2250, loss[loss=0.1841, ctc_loss=0.1177, cr_loss=0.3315, over 17192.00 frames. ], tot_loss[loss=0.1996, ctc_loss=0.13, cr_loss=0.3479, over 3364702.85 frames. ], batch size: 41, lr: 3.97e-03, grad_scale: 32.0 2024-09-24 16:05:05,667 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=9.62 vs. limit=22.5 2024-09-24 16:05:13,480 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=537763.3333333334, ans=0.2 2024-09-24 16:05:35,468 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.139e+02 1.295e+02 1.394e+02 1.566e+02 2.386e+02, threshold=2.787e+02, percent-clipped=0.0 2024-09-24 16:05:35,855 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=537856.6666666666, ans=0.0 2024-09-24 16:05:47,611 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.23 vs. limit=6.0 2024-09-24 16:05:50,185 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.max_abs, batch_count=537903.3333333334, ans=10.0 2024-09-24 16:05:50,365 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=537903.3333333334, ans=0.2 2024-09-24 16:05:51,918 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=537903.3333333334, ans=0.2 2024-09-24 16:06:20,403 INFO [train.py:1198] (3/4) Epoch 30, batch 2300, loss[loss=0.1937, ctc_loss=0.1263, cr_loss=0.3367, over 17164.00 frames. 
], tot_loss[loss=0.1989, ctc_loss=0.1296, cr_loss=0.3464, over 3360260.38 frames. ], batch size: 45, lr: 3.97e-03, grad_scale: 32.0 2024-09-24 16:06:43,254 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=538043.3333333334, ans=0.125 2024-09-24 16:06:52,869 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=538043.3333333334, ans=0.2 2024-09-24 16:07:44,834 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=538183.3333333334, ans=0.1 2024-09-24 16:07:47,756 INFO [train.py:1198] (3/4) Epoch 30, batch 2350, loss[loss=0.194, ctc_loss=0.1251, cr_loss=0.3443, over 17029.00 frames. ], tot_loss[loss=0.1995, ctc_loss=0.13, cr_loss=0.3473, over 3358675.49 frames. ], batch size: 52, lr: 3.97e-03, grad_scale: 32.0 2024-09-24 16:07:54,471 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=538230.0, ans=0.125 2024-09-24 16:08:15,365 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=538276.6666666666, ans=0.0 2024-09-24 16:08:15,401 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.const_attention_rate, batch_count=538276.6666666666, ans=0.025 2024-09-24 16:08:22,200 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=11.38 vs. limit=15.0 2024-09-24 16:08:23,164 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.120e+02 1.275e+02 1.344e+02 1.471e+02 2.396e+02, threshold=2.688e+02, percent-clipped=0.0 2024-09-24 16:08:28,444 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.86 vs. limit=10.0 2024-09-24 16:08:58,744 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=538416.6666666666, ans=0.125 2024-09-24 16:09:05,761 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=9.41 vs. limit=15.0 2024-09-24 16:09:06,068 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=5.48 vs. limit=15.0 2024-09-24 16:09:08,044 INFO [train.py:1198] (3/4) Epoch 30, batch 2400, loss[loss=0.2463, ctc_loss=0.1612, cr_loss=0.4252, over 17010.00 frames. ], tot_loss[loss=0.2004, ctc_loss=0.1306, cr_loss=0.3491, over 3358870.55 frames. ], batch size: 53, lr: 3.97e-03, grad_scale: 32.0 2024-09-24 16:09:09,057 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=6.66 vs. 
limit=15.0 2024-09-24 16:09:46,181 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=538556.6666666666, ans=0.0 2024-09-24 16:09:47,735 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=538556.6666666666, ans=0.2 2024-09-24 16:09:52,490 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=538556.6666666666, ans=0.125 2024-09-24 16:10:18,695 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.94 vs. limit=15.0 2024-09-24 16:10:30,238 INFO [train.py:1198] (3/4) Epoch 30, batch 2450, loss[loss=0.2034, ctc_loss=0.1355, cr_loss=0.3395, over 17069.00 frames. ], tot_loss[loss=0.2013, ctc_loss=0.1314, cr_loss=0.3496, over 3352463.72 frames. ], batch size: 46, lr: 3.97e-03, grad_scale: 32.0 2024-09-24 16:10:46,338 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.min_positive, batch_count=538743.3333333334, ans=0.025 2024-09-24 16:11:05,480 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.006e+02 1.261e+02 1.331e+02 1.438e+02 1.772e+02, threshold=2.662e+02, percent-clipped=0.0 2024-09-24 16:11:42,881 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=538883.3333333334, ans=0.125 2024-09-24 16:11:49,312 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=538883.3333333334, ans=0.2 2024-09-24 16:11:55,428 INFO [train.py:1198] (3/4) Epoch 30, batch 2500, loss[loss=0.1761, ctc_loss=0.113, cr_loss=0.3157, over 17303.00 frames. ], tot_loss[loss=0.1996, ctc_loss=0.1302, cr_loss=0.3469, over 3359445.75 frames. ], batch size: 49, lr: 3.97e-03, grad_scale: 32.0 2024-09-24 16:12:10,129 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=538976.6666666666, ans=0.95 2024-09-24 16:13:17,974 INFO [train.py:1198] (3/4) Epoch 30, batch 2550, loss[loss=0.1824, ctc_loss=0.119, cr_loss=0.3174, over 17211.00 frames. ], tot_loss[loss=0.2008, ctc_loss=0.1311, cr_loss=0.3485, over 3350197.90 frames. 
], batch size: 50, lr: 3.96e-03, grad_scale: 32.0 2024-09-24 16:13:18,239 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=539163.3333333334, ans=0.2 2024-09-24 16:13:19,877 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=539163.3333333334, ans=0.2 2024-09-24 16:13:23,030 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=539163.3333333334, ans=0.0 2024-09-24 16:13:23,128 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-24 16:13:36,003 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=539210.0, ans=0.0 2024-09-24 16:13:43,923 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.const_attention_rate, batch_count=539210.0, ans=0.025 2024-09-24 16:13:45,574 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.min_positive, batch_count=539210.0, ans=0.05 2024-09-24 16:13:53,195 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.137e+02 1.247e+02 1.346e+02 1.482e+02 2.162e+02, threshold=2.691e+02, percent-clipped=0.0 2024-09-24 16:14:28,276 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=539350.0, ans=0.125 2024-09-24 16:14:33,007 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=539350.0, ans=0.0 2024-09-24 16:14:40,189 INFO [train.py:1198] (3/4) Epoch 30, batch 2600, loss[loss=0.168, ctc_loss=0.1059, cr_loss=0.3108, over 17074.00 frames. ], tot_loss[loss=0.1998, ctc_loss=0.1304, cr_loss=0.3472, over 3353189.52 frames. ], batch size: 43, lr: 3.96e-03, grad_scale: 16.0 2024-09-24 16:14:42,060 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=539396.6666666666, ans=0.0 2024-09-24 16:14:55,355 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.17 vs. limit=10.0 2024-09-24 16:15:31,927 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=5.31 vs. limit=15.0 2024-09-24 16:15:34,705 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=539536.6666666666, ans=0.1 2024-09-24 16:15:41,201 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=539536.6666666666, ans=0.125 2024-09-24 16:15:46,096 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=539583.3333333334, ans=0.125 2024-09-24 16:15:54,311 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=539583.3333333334, ans=0.125 2024-09-24 16:15:55,810 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=539583.3333333334, ans=0.2 2024-09-24 16:16:00,291 INFO [train.py:1198] (3/4) Epoch 30, batch 2650, loss[loss=0.2014, ctc_loss=0.1317, cr_loss=0.3484, over 17221.00 frames. 
], tot_loss[loss=0.1998, ctc_loss=0.1304, cr_loss=0.347, over 3359493.69 frames. ], batch size: 47, lr: 3.96e-03, grad_scale: 16.0 2024-09-24 16:16:19,463 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=539676.6666666666, ans=0.0 2024-09-24 16:16:24,327 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=539676.6666666666, ans=0.1 2024-09-24 16:16:29,170 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=539676.6666666666, ans=0.0 2024-09-24 16:16:29,686 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=7.28 vs. limit=15.0 2024-09-24 16:16:40,452 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=3.41 vs. limit=12.0 2024-09-24 16:16:42,934 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.102e+02 1.284e+02 1.354e+02 1.501e+02 2.171e+02, threshold=2.708e+02, percent-clipped=0.0 2024-09-24 16:16:51,342 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=539723.3333333334, ans=0.1 2024-09-24 16:16:52,934 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=539770.0, ans=0.0 2024-09-24 16:17:09,540 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=9.69 vs. limit=22.5 2024-09-24 16:17:10,460 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=539816.6666666666, ans=0.125 2024-09-24 16:17:28,545 INFO [train.py:1198] (3/4) Epoch 30, batch 2700, loss[loss=0.2512, ctc_loss=0.1672, cr_loss=0.4198, over 16909.00 frames. ], tot_loss[loss=0.1996, ctc_loss=0.1301, cr_loss=0.3474, over 3368128.39 frames. ], batch size: 58, lr: 3.96e-03, grad_scale: 16.0 2024-09-24 16:17:52,438 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=539910.0, ans=0.125 2024-09-24 16:18:43,463 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=540050.0, ans=0.0 2024-09-24 16:18:47,559 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=12.49 vs. limit=15.0 2024-09-24 16:18:48,021 INFO [train.py:1198] (3/4) Epoch 30, batch 2750, loss[loss=0.1826, ctc_loss=0.1181, cr_loss=0.3223, over 17033.00 frames. ], tot_loss[loss=0.199, ctc_loss=0.1297, cr_loss=0.3465, over 3358968.85 frames. ], batch size: 39, lr: 3.96e-03, grad_scale: 16.0 2024-09-24 16:19:05,843 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=540143.3333333334, ans=0.1 2024-09-24 16:19:05,845 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=540143.3333333334, ans=0.125 2024-09-24 16:19:10,898 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=11.40 vs. 
limit=15.0 2024-09-24 16:19:20,483 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.61 vs. limit=15.0 2024-09-24 16:19:26,139 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.162e+02 1.254e+02 1.332e+02 1.454e+02 2.287e+02, threshold=2.664e+02, percent-clipped=0.0 2024-09-24 16:19:36,358 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=540236.6666666666, ans=0.0 2024-09-24 16:19:53,462 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=540283.3333333334, ans=0.2 2024-09-24 16:19:54,550 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=8.87 vs. limit=10.0 2024-09-24 16:19:56,680 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=540283.3333333334, ans=0.125 2024-09-24 16:20:10,882 INFO [train.py:1198] (3/4) Epoch 30, batch 2800, loss[loss=0.1613, ctc_loss=0.1046, cr_loss=0.2836, over 17083.00 frames. ], tot_loss[loss=0.1989, ctc_loss=0.1297, cr_loss=0.3462, over 3348448.01 frames. ], batch size: 43, lr: 3.96e-03, grad_scale: 16.0 2024-09-24 16:20:14,526 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=540330.0, ans=0.1 2024-09-24 16:20:16,663 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=15.33 vs. limit=22.5 2024-09-24 16:20:22,327 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=540330.0, ans=0.035 2024-09-24 16:20:22,529 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=540330.0, ans=0.2 2024-09-24 16:21:01,152 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=540470.0, ans=0.04949747468305833 2024-09-24 16:21:06,227 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=540470.0, ans=0.125 2024-09-24 16:21:21,693 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=540516.6666666666, ans=0.125 2024-09-24 16:21:25,197 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=13.16 vs. limit=22.5 2024-09-24 16:21:35,612 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-24 16:21:36,741 INFO [train.py:1198] (3/4) Epoch 30, batch 2850, loss[loss=0.2368, ctc_loss=0.1665, cr_loss=0.3512, over 11763.00 frames. ], tot_loss[loss=0.1999, ctc_loss=0.1305, cr_loss=0.3471, over 3337363.02 frames. 
], batch size: 123, lr: 3.96e-03, grad_scale: 16.0 2024-09-24 16:22:11,128 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=540656.6666666666, ans=0.07 2024-09-24 16:22:12,684 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=540656.6666666666, ans=0.125 2024-09-24 16:22:15,506 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.072e+02 1.298e+02 1.387e+02 1.478e+02 2.436e+02, threshold=2.774e+02, percent-clipped=0.0 2024-09-24 16:22:16,073 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.44 vs. limit=15.0 2024-09-24 16:22:24,666 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=540656.6666666666, ans=0.0 2024-09-24 16:22:24,675 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=540656.6666666666, ans=0.0 2024-09-24 16:22:29,763 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=540703.3333333334, ans=0.125 2024-09-24 16:22:54,832 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=5.53 vs. limit=15.0 2024-09-24 16:23:00,160 INFO [train.py:1198] (3/4) Epoch 30, batch 2900, loss[loss=0.1792, ctc_loss=0.1139, cr_loss=0.3263, over 17091.00 frames. ], tot_loss[loss=0.2003, ctc_loss=0.1308, cr_loss=0.3479, over 3336821.92 frames. ], batch size: 43, lr: 3.96e-03, grad_scale: 16.0 2024-09-24 16:23:03,638 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=540796.6666666666, ans=0.0 2024-09-24 16:23:05,985 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=3.75 vs. limit=15.0 2024-09-24 16:23:06,810 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=540796.6666666666, ans=0.125 2024-09-24 16:23:21,459 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=540843.3333333334, ans=0.04949747468305833 2024-09-24 16:23:50,405 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=540936.6666666666, ans=0.0 2024-09-24 16:24:15,947 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=540983.3333333334, ans=0.0 2024-09-24 16:24:20,585 INFO [train.py:1198] (3/4) Epoch 30, batch 2950, loss[loss=0.1698, ctc_loss=0.1071, cr_loss=0.3136, over 17255.00 frames. ], tot_loss[loss=0.2008, ctc_loss=0.131, cr_loss=0.3493, over 3340744.68 frames. ], batch size: 44, lr: 3.96e-03, grad_scale: 16.0 2024-09-24 16:25:01,235 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=6.30 vs. 
2024-09-24 16:25:01,512 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.034e+02 1.276e+02 1.368e+02 1.488e+02 1.810e+02, threshold=2.735e+02, percent-clipped=0.0 2024-09-24 16:25:08,067 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=541123.3333333334, ans=0.125 2024-09-24 16:25:31,776 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=541216.6666666666, ans=0.0 2024-09-24 16:25:41,636 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.27 vs. limit=15.0 2024-09-24 16:25:42,515 INFO [train.py:1198] (3/4) Epoch 30, batch 3000, loss[loss=0.1516, ctc_loss=0.09527, cr_loss=0.2817, over 17197.00 frames. ], tot_loss[loss=0.2003, ctc_loss=0.1306, cr_loss=0.3489, over 3349926.97 frames. ], batch size: 41, lr: 3.96e-03, grad_scale: 16.0 2024-09-24 16:25:42,515 INFO [train.py:1221] (3/4) Computing validation loss 2024-09-24 16:25:57,422 INFO [train.py:1230] (3/4) Epoch 30, validation: loss=0.03649, ctc_loss=0.03649, cr_loss=8.522e-15, over 944034.00 frames. 2024-09-24 16:25:57,422 INFO [train.py:1231] (3/4) Maximum memory allocated so far is 21238MB 2024-09-24 16:26:03,922 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=541263.3333333334, ans=0.2 2024-09-24 16:26:08,970 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=541263.3333333334, ans=0.1 2024-09-24 16:26:20,627 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=541310.0, ans=0.0 2024-09-24 16:26:20,643 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=541310.0, ans=0.0 2024-09-24 16:26:26,661 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=6.93 vs. limit=15.0 2024-09-24 16:26:51,480 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=541403.3333333334, ans=0.0 2024-09-24 16:26:52,900 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=541403.3333333334, ans=0.04949747468305833 2024-09-24 16:26:54,490 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=541403.3333333334, ans=0.0 2024-09-24 16:26:59,479 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=541403.3333333334, ans=0.2 2024-09-24 16:27:00,955 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=541403.3333333334, ans=10.0 2024-09-24 16:27:23,568 INFO [train.py:1198] (3/4) Epoch 30, batch 3050, loss[loss=0.236, ctc_loss=0.1556, cr_loss=0.4019, over 16932.00 frames. ], tot_loss[loss=0.1995, ctc_loss=0.1299, cr_loss=0.3478, over 3356154.24 frames.
], batch size: 58, lr: 3.96e-03, grad_scale: 16.0 2024-09-24 16:28:00,734 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.091e+02 1.275e+02 1.345e+02 1.423e+02 1.790e+02, threshold=2.690e+02, percent-clipped=0.0 2024-09-24 16:28:02,608 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=541590.0, ans=0.0 2024-09-24 16:28:25,826 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=541683.3333333334, ans=0.04949747468305833 2024-09-24 16:28:25,943 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.const_attention_rate, batch_count=541683.3333333334, ans=0.025 2024-09-24 16:28:41,095 INFO [train.py:1198] (3/4) Epoch 30, batch 3100, loss[loss=0.2041, ctc_loss=0.1359, cr_loss=0.3412, over 17304.00 frames. ], tot_loss[loss=0.2001, ctc_loss=0.1304, cr_loss=0.3483, over 3358857.56 frames. ], batch size: 46, lr: 3.96e-03, grad_scale: 16.0 2024-09-24 16:28:53,424 INFO [scaling.py:1024] (3/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=7.62 vs. limit=8.0 2024-09-24 16:29:39,391 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=6.66 vs. limit=15.0 2024-09-24 16:29:54,386 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=541916.6666666666, ans=0.2 2024-09-24 16:30:01,867 INFO [train.py:1198] (3/4) Epoch 30, batch 3150, loss[loss=0.1939, ctc_loss=0.127, cr_loss=0.3343, over 17303.00 frames. ], tot_loss[loss=0.1999, ctc_loss=0.1304, cr_loss=0.3476, over 3353283.78 frames. ], batch size: 46, lr: 3.95e-03, grad_scale: 16.0 2024-09-24 16:30:06,776 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=541963.3333333334, ans=0.0 2024-09-24 16:30:06,817 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=541963.3333333334, ans=0.1 2024-09-24 16:30:10,001 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.const_attention_rate, batch_count=541963.3333333334, ans=0.025 2024-09-24 16:30:14,349 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=541963.3333333334, ans=0.0 2024-09-24 16:30:22,147 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=542010.0, ans=0.1 2024-09-24 16:30:23,537 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=542010.0, ans=0.125 2024-09-24 16:30:39,238 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.101e+02 1.276e+02 1.352e+02 1.455e+02 1.845e+02, threshold=2.704e+02, percent-clipped=0.0 2024-09-24 16:30:44,334 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=542056.6666666666, ans=0.125 2024-09-24 16:31:05,018 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=542150.0, ans=0.125 2024-09-24 16:31:20,117 INFO [train.py:1198] (3/4) Epoch 30, batch 3200, loss[loss=0.1877, ctc_loss=0.1221, cr_loss=0.328, over 17014.00 frames. ], tot_loss[loss=0.1988, ctc_loss=0.1296, cr_loss=0.346, over 3355717.16 frames. 
], batch size: 44, lr: 3.95e-03, grad_scale: 32.0 2024-09-24 16:31:42,196 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=542243.3333333334, ans=0.125 2024-09-24 16:31:56,511 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=542290.0, ans=0.05 2024-09-24 16:32:04,042 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=542290.0, ans=0.125 2024-09-24 16:32:08,829 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=542336.6666666666, ans=0.0 2024-09-24 16:32:10,303 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=542336.6666666666, ans=0.125 2024-09-24 16:32:18,234 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=542336.6666666666, ans=0.0 2024-09-24 16:32:25,445 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=5.03 vs. limit=10.0 2024-09-24 16:32:38,560 INFO [train.py:1198] (3/4) Epoch 30, batch 3250, loss[loss=0.2102, ctc_loss=0.1384, cr_loss=0.3592, over 17020.00 frames. ], tot_loss[loss=0.1991, ctc_loss=0.1297, cr_loss=0.3468, over 3367076.37 frames. ], batch size: 51, lr: 3.95e-03, grad_scale: 32.0 2024-09-24 16:32:38,816 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=542430.0, ans=0.125 2024-09-24 16:32:58,247 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=8.52 vs. limit=15.0 2024-09-24 16:33:06,097 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=4.96 vs. limit=10.0 2024-09-24 16:33:16,568 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.131e+02 1.257e+02 1.346e+02 1.457e+02 3.617e+02, threshold=2.692e+02, percent-clipped=1.0 2024-09-24 16:33:22,864 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=542523.3333333334, ans=0.07 2024-09-24 16:33:58,113 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=542663.3333333334, ans=0.125 2024-09-24 16:33:59,507 INFO [train.py:1198] (3/4) Epoch 30, batch 3300, loss[loss=0.1625, ctc_loss=0.1045, cr_loss=0.2897, over 16966.00 frames. ], tot_loss[loss=0.1989, ctc_loss=0.1295, cr_loss=0.3469, over 3370296.86 frames. ], batch size: 42, lr: 3.95e-03, grad_scale: 32.0 2024-09-24 16:34:04,591 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=542663.3333333334, ans=0.125 2024-09-24 16:34:48,444 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=542803.3333333334, ans=0.125 2024-09-24 16:34:50,391 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=9.24 vs. 
limit=15.0 2024-09-24 16:35:14,834 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=542850.0, ans=0.1 2024-09-24 16:35:17,844 INFO [train.py:1198] (3/4) Epoch 30, batch 3350, loss[loss=0.1907, ctc_loss=0.1235, cr_loss=0.3362, over 16985.00 frames. ], tot_loss[loss=0.1988, ctc_loss=0.1296, cr_loss=0.3464, over 3365487.29 frames. ], batch size: 53, lr: 3.95e-03, grad_scale: 32.0 2024-09-24 16:35:32,066 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=542943.3333333334, ans=0.0 2024-09-24 16:35:36,650 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=542943.3333333334, ans=0.0 2024-09-24 16:35:54,100 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=542990.0, ans=0.125 2024-09-24 16:35:55,308 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.096e+02 1.300e+02 1.382e+02 1.466e+02 2.312e+02, threshold=2.764e+02, percent-clipped=0.0 2024-09-24 16:36:34,936 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=543130.0, ans=0.125 2024-09-24 16:36:36,356 INFO [train.py:1198] (3/4) Epoch 30, batch 3400, loss[loss=0.2117, ctc_loss=0.14, cr_loss=0.3581, over 17156.00 frames. ], tot_loss[loss=0.1984, ctc_loss=0.1292, cr_loss=0.346, over 3373339.95 frames. ], batch size: 45, lr: 3.95e-03, grad_scale: 32.0 2024-09-24 16:36:38,503 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=7.40 vs. limit=12.0 2024-09-24 16:36:43,097 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2.whitening_limit, batch_count=543130.0, ans=15.0 2024-09-24 16:36:52,265 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=543176.6666666666, ans=0.125 2024-09-24 16:36:53,824 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=543176.6666666666, ans=0.0 2024-09-24 16:37:25,980 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=543270.0, ans=0.125 2024-09-24 16:37:52,298 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.const_attention_rate, batch_count=543316.6666666666, ans=0.025 2024-09-24 16:37:56,613 INFO [train.py:1198] (3/4) Epoch 30, batch 3450, loss[loss=0.2038, ctc_loss=0.1328, cr_loss=0.355, over 17311.00 frames. ], tot_loss[loss=0.1995, ctc_loss=0.13, cr_loss=0.3471, over 3369725.45 frames. ], batch size: 51, lr: 3.95e-03, grad_scale: 32.0 2024-09-24 16:37:56,898 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer_na.min_abs, batch_count=543363.3333333334, ans=0.02 2024-09-24 16:38:22,763 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.96 vs. limit=6.0 2024-09-24 16:38:29,375 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.83 vs. 
limit=6.0 2024-09-24 16:38:35,841 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.132e+02 1.271e+02 1.366e+02 1.468e+02 2.089e+02, threshold=2.732e+02, percent-clipped=0.0 2024-09-24 16:38:51,821 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=543503.3333333334, ans=0.1 2024-09-24 16:39:09,399 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=543550.0, ans=0.125 2024-09-24 16:39:16,720 INFO [train.py:1198] (3/4) Epoch 30, batch 3500, loss[loss=0.191, ctc_loss=0.123, cr_loss=0.3398, over 17017.00 frames. ], tot_loss[loss=0.1998, ctc_loss=0.1302, cr_loss=0.3481, over 3370412.08 frames. ], batch size: 44, lr: 3.95e-03, grad_scale: 32.0 2024-09-24 16:40:07,505 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=543736.6666666666, ans=0.1 2024-09-24 16:40:16,829 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=543736.6666666666, ans=0.0 2024-09-24 16:40:30,963 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=543783.3333333334, ans=0.2 2024-09-24 16:40:36,962 INFO [train.py:1198] (3/4) Epoch 30, batch 3550, loss[loss=0.1528, ctc_loss=0.09597, cr_loss=0.2842, over 17179.00 frames. ], tot_loss[loss=0.2002, ctc_loss=0.1305, cr_loss=0.3484, over 3354682.65 frames. ], batch size: 41, lr: 3.95e-03, grad_scale: 32.0 2024-09-24 16:40:38,831 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=543830.0, ans=0.0 2024-09-24 16:40:41,963 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=543830.0, ans=0.0 2024-09-24 16:41:14,204 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.113e+02 1.268e+02 1.349e+02 1.434e+02 2.153e+02, threshold=2.697e+02, percent-clipped=0.0 2024-09-24 16:41:27,594 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=10.94 vs. limit=22.5 2024-09-24 16:41:55,341 INFO [train.py:1198] (3/4) Epoch 30, batch 3600, loss[loss=0.2435, ctc_loss=0.1609, cr_loss=0.4131, over 17197.00 frames. ], tot_loss[loss=0.2008, ctc_loss=0.1309, cr_loss=0.3491, over 3350487.46 frames. ], batch size: 55, lr: 3.95e-03, grad_scale: 32.0 2024-09-24 16:42:17,490 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=544110.0, ans=0.2 2024-09-24 16:42:19,135 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=544110.0, ans=0.125 2024-09-24 16:42:28,509 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=544156.6666666666, ans=0.125 2024-09-24 16:42:28,525 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=544156.6666666666, ans=0.0 2024-09-24 16:43:13,174 INFO [train.py:1198] (3/4) Epoch 30, batch 3650, loss[loss=0.2275, ctc_loss=0.1508, cr_loss=0.3834, over 17213.00 frames. ], tot_loss[loss=0.2008, ctc_loss=0.131, cr_loss=0.349, over 3362951.88 frames. 
], batch size: 55, lr: 3.95e-03, grad_scale: 16.0 2024-09-24 16:43:16,693 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=544296.6666666666, ans=0.125 2024-09-24 16:43:29,155 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.min_abs, batch_count=544343.3333333334, ans=0.5 2024-09-24 16:43:44,040 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.79 vs. limit=6.0 2024-09-24 16:43:54,825 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.034e+02 1.277e+02 1.338e+02 1.462e+02 2.145e+02, threshold=2.677e+02, percent-clipped=0.0 2024-09-24 16:44:15,658 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=544436.6666666666, ans=0.07 2024-09-24 16:44:25,296 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=544483.3333333334, ans=0.0 2024-09-24 16:44:34,287 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.58 vs. limit=6.0 2024-09-24 16:44:35,116 INFO [train.py:1198] (3/4) Epoch 30, batch 3700, loss[loss=0.2614, ctc_loss=0.1804, cr_loss=0.4046, over 11591.00 frames. ], tot_loss[loss=0.2, ctc_loss=0.1304, cr_loss=0.3482, over 3362891.46 frames. ], batch size: 123, lr: 3.95e-03, grad_scale: 8.0 2024-09-24 16:44:37,362 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=10.23 vs. limit=22.5 2024-09-24 16:44:43,419 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=544530.0, ans=0.0 2024-09-24 16:44:43,976 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.89 vs. limit=15.0 2024-09-24 16:45:00,888 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=544576.6666666666, ans=0.1 2024-09-24 16:45:53,703 INFO [train.py:1198] (3/4) Epoch 30, batch 3750, loss[loss=0.195, ctc_loss=0.1309, cr_loss=0.3206, over 16897.00 frames. ], tot_loss[loss=0.2004, ctc_loss=0.1308, cr_loss=0.348, over 3334196.59 frames. ], batch size: 58, lr: 3.94e-03, grad_scale: 8.0 2024-09-24 16:46:35,276 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.129e+02 1.265e+02 1.357e+02 1.446e+02 2.507e+02, threshold=2.713e+02, percent-clipped=0.0 2024-09-24 16:46:43,214 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=544903.3333333334, ans=0.0 2024-09-24 16:47:12,373 INFO [train.py:1198] (3/4) Epoch 30, batch 3800, loss[loss=0.1578, ctc_loss=0.1021, cr_loss=0.2783, over 17168.00 frames. ], tot_loss[loss=0.2002, ctc_loss=0.1307, cr_loss=0.3473, over 3336558.23 frames. ], batch size: 45, lr: 3.94e-03, grad_scale: 8.0 2024-09-24 16:47:16,420 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=10.48 vs. 
limit=12.0 2024-09-24 16:47:22,034 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=544996.6666666666, ans=0.0 2024-09-24 16:47:25,417 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=544996.6666666666, ans=0.125 2024-09-24 16:47:45,353 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=545090.0, ans=0.125 2024-09-24 16:47:53,237 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.21 vs. limit=6.0 2024-09-24 16:48:03,825 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=545136.6666666666, ans=0.0 2024-09-24 16:48:20,662 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=545183.3333333334, ans=0.0 2024-09-24 16:48:31,295 INFO [train.py:1198] (3/4) Epoch 30, batch 3850, loss[loss=0.2267, ctc_loss=0.1554, cr_loss=0.3568, over 15263.00 frames. ], tot_loss[loss=0.2042, ctc_loss=0.1339, cr_loss=0.3516, over 3289343.83 frames. ], batch size: 89, lr: 3.94e-03, grad_scale: 8.0 2024-09-24 16:48:39,348 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=545230.0, ans=0.125 2024-09-24 16:48:58,399 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=3.93 vs. limit=15.0 2024-09-24 16:49:01,037 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=545323.3333333334, ans=0.09899494936611666 2024-09-24 16:49:05,950 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.50 vs. limit=15.0 2024-09-24 16:49:11,169 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.137e+02 1.380e+02 1.551e+02 1.671e+02 2.623e+02, threshold=3.102e+02, percent-clipped=0.0 2024-09-24 16:49:15,952 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=545370.0, ans=0.0 2024-09-24 16:50:31,770 INFO [train.py:1198] (3/4) Epoch 31, batch 0, loss[loss=0.1966, ctc_loss=0.1279, cr_loss=0.3435, over 17294.00 frames. ], tot_loss[loss=0.1966, ctc_loss=0.1279, cr_loss=0.3435, over 17294.00 frames. ], batch size: 46, lr: 3.88e-03, grad_scale: 16.0 2024-09-24 16:50:31,770 INFO [train.py:1221] (3/4) Computing validation loss 2024-09-24 16:50:43,944 INFO [zipformer.py:1858] (3/4) name=encoder.encoders.0.layers.0.self_attn_weights, attn_weights_entropy = tensor([5.1148, 4.9235, 4.5439, 4.7164], device='cuda:3') 2024-09-24 16:50:45,094 INFO [zipformer.py:1858] (3/4) name=encoder.encoders.2.encoder.layers.2.self_attn_weights, attn_weights_entropy = tensor([4.7426, 3.3722, 4.4904, 4.3113], device='cuda:3') 2024-09-24 16:50:47,103 INFO [train.py:1230] (3/4) Epoch 31, validation: loss=0.03594, ctc_loss=0.03594, cr_loss=9.065e-15, over 944034.00 frames. 
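
At each "Computing validation loss" record the trainer runs the dev set once and logs a frame-weighted average; note that the consistency term all but vanishes at validation time (cr_loss=9.065e-15 just above), presumably because the two differently-masked copies it compares are a training-time construction, so the validation loss reduces to the CTC loss. The attention-weight entropy tensors from zipformer.py:1858 are extra diagnostics printed on the same pass. A rough sketch of the validation pattern, assuming it mirrors the training loss computation (model, valid_loader and compute_loss are stand-ins):

    import torch

    def compute_validation_loss(model, valid_loader, compute_loss):
        """Sketch: frame-weighted average loss over the whole dev set."""
        model.eval()
        tot_loss, tot_frames = 0.0, 0.0
        with torch.no_grad():
            for batch in valid_loader:
                loss, num_frames = compute_loss(model, batch)  # stand-ins
                tot_loss += loss.item() * num_frames
                tot_frames += num_frames
        model.train()
        # e.g. "validation: loss=0.03594 ... over 944034.00 frames."
        return tot_loss / tot_frames
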
2024-09-24 16:50:47,104 INFO [train.py:1231] (3/4) Maximum memory allocated so far is 21238MB 2024-09-24 16:50:50,655 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=545444.6666666666, ans=0.2 2024-09-24 16:51:33,067 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.02 vs. limit=15.0 2024-09-24 16:51:45,108 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=545584.6666666666, ans=0.04949747468305833 2024-09-24 16:51:53,315 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=545584.6666666666, ans=0.125 2024-09-24 16:51:57,918 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=545631.3333333334, ans=0.0 2024-09-24 16:52:01,099 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=545631.3333333334, ans=0.035 2024-09-24 16:52:12,150 INFO [train.py:1198] (3/4) Epoch 31, batch 50, loss[loss=0.2225, ctc_loss=0.1482, cr_loss=0.3719, over 17066.00 frames. ], tot_loss[loss=0.1955, ctc_loss=0.1274, cr_loss=0.3408, over 770054.28 frames. ], batch size: 46, lr: 3.88e-03, grad_scale: 16.0 2024-09-24 16:52:38,518 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=545724.6666666666, ans=0.125 2024-09-24 16:52:44,946 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=545771.3333333334, ans=0.0 2024-09-24 16:52:55,972 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=545771.3333333334, ans=0.0 2024-09-24 16:53:01,963 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.101e+02 1.275e+02 1.374e+02 1.534e+02 1.966e+02, threshold=2.749e+02, percent-clipped=0.0 2024-09-24 16:53:26,239 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=545864.6666666666, ans=0.0 2024-09-24 16:53:27,882 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=545864.6666666666, ans=0.1 2024-09-24 16:53:34,046 INFO [train.py:1198] (3/4) Epoch 31, batch 100, loss[loss=0.2157, ctc_loss=0.1423, cr_loss=0.3672, over 16034.00 frames. ], tot_loss[loss=0.1968, ctc_loss=0.1284, cr_loss=0.3418, over 1347672.33 frames. 
], batch size: 74, lr: 3.87e-03, grad_scale: 16.0 2024-09-24 16:53:34,353 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=545911.3333333334, ans=0.125 2024-09-24 16:54:01,217 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=545958.0, ans=0.0 2024-09-24 16:54:11,013 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=546004.6666666666, ans=0.2 2024-09-24 16:54:23,978 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=546051.3333333334, ans=0.125 2024-09-24 16:54:31,430 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=546051.3333333334, ans=0.04949747468305833 2024-09-24 16:54:55,472 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=546144.6666666666, ans=0.125 2024-09-24 16:54:56,708 INFO [train.py:1198] (3/4) Epoch 31, batch 150, loss[loss=0.2209, ctc_loss=0.1455, cr_loss=0.3769, over 15839.00 frames. ], tot_loss[loss=0.1994, ctc_loss=0.1302, cr_loss=0.3459, over 1791024.01 frames. ], batch size: 74, lr: 3.87e-03, grad_scale: 16.0 2024-09-24 16:55:10,710 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=4.63 vs. limit=10.0 2024-09-24 16:55:11,026 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=5.49 vs. limit=15.0 2024-09-24 16:55:29,773 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=6.47 vs. limit=15.0 2024-09-24 16:55:33,922 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=546238.0, ans=0.125 2024-09-24 16:55:44,680 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.122e+02 1.256e+02 1.336e+02 1.452e+02 1.976e+02, threshold=2.673e+02, percent-clipped=0.0 2024-09-24 16:56:16,761 INFO [train.py:1198] (3/4) Epoch 31, batch 200, loss[loss=0.2083, ctc_loss=0.1456, cr_loss=0.3135, over 11633.00 frames. ], tot_loss[loss=0.1993, ctc_loss=0.13, cr_loss=0.3463, over 2134774.36 frames. ], batch size: 123, lr: 3.87e-03, grad_scale: 16.0 2024-09-24 16:56:47,221 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=546424.6666666666, ans=0.125 2024-09-24 16:56:56,922 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=546471.3333333334, ans=0.0 2024-09-24 16:57:05,012 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=546471.3333333334, ans=0.125 2024-09-24 16:57:22,501 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=546518.0, ans=0.125 2024-09-24 16:57:44,193 INFO [train.py:1198] (3/4) Epoch 31, batch 250, loss[loss=0.2071, ctc_loss=0.1331, cr_loss=0.3701, over 16847.00 frames. ], tot_loss[loss=0.1976, ctc_loss=0.1289, cr_loss=0.3439, over 2410612.21 frames. 
], batch size: 61, lr: 3.87e-03, grad_scale: 16.0 2024-09-24 16:57:50,761 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.const_attention_rate, batch_count=546611.3333333334, ans=0.025 2024-09-24 16:58:28,729 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=546704.6666666666, ans=0.0 2024-09-24 16:58:31,700 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.093e+02 1.278e+02 1.364e+02 1.509e+02 2.036e+02, threshold=2.729e+02, percent-clipped=0.0 2024-09-24 16:58:49,446 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=546798.0, ans=0.0 2024-09-24 16:59:02,146 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=546844.6666666666, ans=0.0 2024-09-24 16:59:03,344 INFO [train.py:1198] (3/4) Epoch 31, batch 300, loss[loss=0.2156, ctc_loss=0.1398, cr_loss=0.379, over 16696.00 frames. ], tot_loss[loss=0.1982, ctc_loss=0.1291, cr_loss=0.3454, over 2624260.01 frames. ], batch size: 66, lr: 3.87e-03, grad_scale: 16.0 2024-09-24 16:59:33,563 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=546891.3333333334, ans=0.125 2024-09-24 16:59:43,483 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=546938.0, ans=0.04949747468305833 2024-09-24 17:00:01,238 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.87 vs. limit=6.0 2024-09-24 17:00:09,555 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=6.35 vs. limit=15.0 2024-09-24 17:00:26,014 INFO [train.py:1198] (3/4) Epoch 31, batch 350, loss[loss=0.1969, ctc_loss=0.1285, cr_loss=0.3422, over 17212.00 frames. ], tot_loss[loss=0.1993, ctc_loss=0.1299, cr_loss=0.3469, over 2785121.47 frames. ], batch size: 55, lr: 3.87e-03, grad_scale: 16.0 2024-09-24 17:00:29,507 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=547078.0, ans=0.125 2024-09-24 17:00:46,003 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=9.90 vs. limit=15.0 2024-09-24 17:01:08,159 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer_ff3.min_abs, batch_count=547171.3333333334, ans=0.2 2024-09-24 17:01:14,097 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.126e+02 1.224e+02 1.318e+02 1.423e+02 1.795e+02, threshold=2.635e+02, percent-clipped=0.0 2024-09-24 17:01:25,563 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=3.36 vs. limit=15.0 2024-09-24 17:01:38,068 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=4.60 vs. limit=10.0 2024-09-24 17:01:51,999 INFO [train.py:1198] (3/4) Epoch 31, batch 400, loss[loss=0.199, ctc_loss=0.1298, cr_loss=0.3461, over 17073.00 frames. ], tot_loss[loss=0.1987, ctc_loss=0.1295, cr_loss=0.3462, over 2920579.25 frames. 
], batch size: 46, lr: 3.87e-03, grad_scale: 32.0 2024-09-24 17:02:00,245 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=547311.3333333334, ans=0.2 2024-09-24 17:02:04,896 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.const_attention_rate, batch_count=547311.3333333334, ans=0.025 2024-09-24 17:02:09,694 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=547358.0, ans=0.0 2024-09-24 17:02:16,112 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=547358.0, ans=0.0 2024-09-24 17:03:14,669 INFO [train.py:1198] (3/4) Epoch 31, batch 450, loss[loss=0.194, ctc_loss=0.1256, cr_loss=0.3419, over 16247.00 frames. ], tot_loss[loss=0.1982, ctc_loss=0.1292, cr_loss=0.3449, over 3017948.14 frames. ], batch size: 36, lr: 3.87e-03, grad_scale: 32.0 2024-09-24 17:03:21,284 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=547544.6666666666, ans=0.125 2024-09-24 17:03:35,143 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.15 vs. limit=15.0 2024-09-24 17:03:41,050 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=547591.3333333334, ans=0.125 2024-09-24 17:04:02,913 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.095e+02 1.314e+02 1.420e+02 1.522e+02 1.945e+02, threshold=2.840e+02, percent-clipped=0.0 2024-09-24 17:04:22,191 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=547731.3333333334, ans=0.0 2024-09-24 17:04:26,686 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=547731.3333333334, ans=0.125 2024-09-24 17:04:31,393 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=547731.3333333334, ans=0.125 2024-09-24 17:04:37,645 INFO [train.py:1198] (3/4) Epoch 31, batch 500, loss[loss=0.1828, ctc_loss=0.119, cr_loss=0.3192, over 17078.00 frames. ], tot_loss[loss=0.199, ctc_loss=0.1298, cr_loss=0.3461, over 3095403.24 frames. ], batch size: 46, lr: 3.87e-03, grad_scale: 32.0 2024-09-24 17:04:42,637 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=547778.0, ans=0.125 2024-09-24 17:04:47,948 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.36 vs. 
limit=15.0 2024-09-24 17:05:15,486 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=547871.3333333334, ans=0.125 2024-09-24 17:05:23,741 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=547871.3333333334, ans=0.025 2024-09-24 17:05:26,796 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=547918.0, ans=0.125 2024-09-24 17:05:33,094 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=547918.0, ans=0.125 2024-09-24 17:05:34,919 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=547918.0, ans=0.1 2024-09-24 17:05:46,233 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=547964.6666666666, ans=0.125 2024-09-24 17:05:57,827 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=3.87 vs. limit=10.0 2024-09-24 17:05:58,746 INFO [train.py:1198] (3/4) Epoch 31, batch 550, loss[loss=0.1648, ctc_loss=0.1071, cr_loss=0.2887, over 17137.00 frames. ], tot_loss[loss=0.1986, ctc_loss=0.1294, cr_loss=0.3462, over 3164234.54 frames. ], batch size: 40, lr: 3.87e-03, grad_scale: 32.0 2024-09-24 17:06:07,021 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=548011.3333333334, ans=0.125 2024-09-24 17:06:21,284 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=548058.0, ans=0.035 2024-09-24 17:06:49,696 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=548104.6666666666, ans=0.125 2024-09-24 17:06:52,449 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.108e+02 1.269e+02 1.346e+02 1.471e+02 2.079e+02, threshold=2.692e+02, percent-clipped=0.0 2024-09-24 17:06:54,306 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=548151.3333333334, ans=0.0 2024-09-24 17:06:56,058 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=548151.3333333334, ans=0.0 2024-09-24 17:07:10,048 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=548198.0, ans=0.2 2024-09-24 17:07:21,107 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=548198.0, ans=0.125 2024-09-24 17:07:23,921 INFO [train.py:1198] (3/4) Epoch 31, batch 600, loss[loss=0.1689, ctc_loss=0.1081, cr_loss=0.304, over 17195.00 frames. ], tot_loss[loss=0.1998, ctc_loss=0.1303, cr_loss=0.3476, over 3204106.51 frames. ], batch size: 41, lr: 3.87e-03, grad_scale: 32.0
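
The scaling.py:214 lines that dominate this log are ScheduledFloat values: module hyper-parameters such as dropout_p, skip_rate and balancer prob that are functions of batch_count rather than constants, re-logged whenever they are evaluated. By batch_count ~548k the printed ans values are flat, which is what a piecewise-linear schedule that has passed its last breakpoint would produce. A minimal sketch of such a schedule (the breakpoints below are invented for illustration; only the logged names are real):

    class ScheduledFloat:
        """Sketch: a float interpolated between (batch_count, value) points."""
        def __init__(self, *points):
            self.points = sorted(points)

        def value(self, batch_count: float) -> float:
            pts = self.points
            if batch_count <= pts[0][0]:
                return pts[0][1]
            if batch_count >= pts[-1][0]:
                return pts[-1][1]
            for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
                if batch_count <= x1:
                    return y0 + (y1 - y0) * (batch_count - x0) / (x1 - x0)

    # Hypothetical schedule: starts at 0.3, anneals to 0.1 by 20k batches,
    # then stays there -- hence the constant ans values late in training.
    conv_skip_rate = ScheduledFloat((0.0, 0.3), (20000.0, 0.1))
    print(conv_skip_rate.value(548151.0))  # -> 0.1
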
2024-09-24 17:07:43,047 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=548291.3333333334, ans=0.1 2024-09-24 17:08:08,251 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=548338.0, ans=0.125 2024-09-24 17:08:17,698 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=548384.6666666666, ans=0.125 2024-09-24 17:08:27,556 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.48 vs. limit=15.0 2024-09-24 17:08:41,736 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=548431.3333333334, ans=0.0 2024-09-24 17:08:44,910 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=548478.0, ans=0.125 2024-09-24 17:08:46,120 INFO [train.py:1198] (3/4) Epoch 31, batch 650, loss[loss=0.1521, ctc_loss=0.09781, cr_loss=0.2716, over 17078.00 frames. ], tot_loss[loss=0.1994, ctc_loss=0.13, cr_loss=0.3469, over 3231610.43 frames. ], batch size: 43, lr: 3.87e-03, grad_scale: 32.0 2024-09-24 17:08:51,287 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=548478.0, ans=0.1 2024-09-24 17:09:29,629 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=8.92 vs. limit=15.0 2024-09-24 17:09:35,388 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=548618.0, ans=0.125 2024-09-24 17:09:36,766 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.059e+02 1.261e+02 1.352e+02 1.465e+02 2.749e+02, threshold=2.705e+02, percent-clipped=1.0 2024-09-24 17:09:57,971 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=548664.6666666666, ans=0.0 2024-09-24 17:09:59,497 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=548664.6666666666, ans=0.125 2024-09-24 17:10:08,670 INFO [train.py:1198] (3/4) Epoch 31, batch 700, loss[loss=0.225, ctc_loss=0.1451, cr_loss=0.3992, over 17012.00 frames. ], tot_loss[loss=0.1999, ctc_loss=0.1304, cr_loss=0.3475, over 3258417.21 frames. ], batch size: 51, lr: 3.86e-03, grad_scale: 32.0 2024-09-24 17:10:27,018 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.57 vs. limit=15.0 2024-09-24 17:11:31,809 INFO [train.py:1198] (3/4) Epoch 31, batch 750, loss[loss=0.2172, ctc_loss=0.1412, cr_loss=0.3802, over 16885.00 frames. ], tot_loss[loss=0.199, ctc_loss=0.1297, cr_loss=0.3466, over 3280758.91 frames.
], batch size: 58, lr: 3.86e-03, grad_scale: 32.0 2024-09-24 17:11:33,579 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=548944.6666666666, ans=0.125 2024-09-24 17:12:08,635 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-24 17:12:13,359 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=549038.0, ans=0.125 2024-09-24 17:12:25,358 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.062e+02 1.270e+02 1.339e+02 1.427e+02 2.075e+02, threshold=2.677e+02, percent-clipped=0.0 2024-09-24 17:12:40,167 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=549131.3333333334, ans=0.0 2024-09-24 17:12:48,465 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=10.50 vs. limit=15.0 2024-09-24 17:12:57,348 INFO [train.py:1198] (3/4) Epoch 31, batch 800, loss[loss=0.233, ctc_loss=0.1543, cr_loss=0.3931, over 15065.00 frames. ], tot_loss[loss=0.198, ctc_loss=0.129, cr_loss=0.345, over 3298605.01 frames. ], batch size: 89, lr: 3.86e-03, grad_scale: 32.0 2024-09-24 17:13:26,919 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=5.56 vs. limit=12.0 2024-09-24 17:13:56,816 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=549318.0, ans=0.125 2024-09-24 17:13:58,277 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=549318.0, ans=0.125 2024-09-24 17:14:08,078 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=549364.6666666666, ans=0.09899494936611666 2024-09-24 17:14:17,351 INFO [train.py:1198] (3/4) Epoch 31, batch 850, loss[loss=0.1994, ctc_loss=0.1302, cr_loss=0.346, over 17091.00 frames. ], tot_loss[loss=0.1972, ctc_loss=0.1284, cr_loss=0.3442, over 3316297.31 frames. ], batch size: 49, lr: 3.86e-03, grad_scale: 32.0 2024-09-24 17:14:55,889 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=4.46 vs. limit=10.0 2024-09-24 17:15:01,807 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=549504.6666666666, ans=0.125 2024-09-24 17:15:05,014 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=549504.6666666666, ans=0.0 2024-09-24 17:15:09,432 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.115e+02 1.257e+02 1.348e+02 1.447e+02 2.367e+02, threshold=2.696e+02, percent-clipped=0.0 2024-09-24 17:15:21,251 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=6.64 vs. 
limit=15.0 2024-09-24 17:15:23,833 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-24 17:15:26,978 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=549598.0, ans=0.2 2024-09-24 17:15:31,822 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=549598.0, ans=0.0 2024-09-24 17:15:36,697 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=549598.0, ans=0.0 2024-09-24 17:15:39,451 INFO [train.py:1198] (3/4) Epoch 31, batch 900, loss[loss=0.2003, ctc_loss=0.1311, cr_loss=0.3464, over 17201.00 frames. ], tot_loss[loss=0.1983, ctc_loss=0.1292, cr_loss=0.3455, over 3317453.02 frames. ], batch size: 50, lr: 3.86e-03, grad_scale: 16.0 2024-09-24 17:15:42,971 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=549644.6666666666, ans=0.125 2024-09-24 17:15:56,018 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=549691.3333333334, ans=0.0 2024-09-24 17:17:05,205 INFO [train.py:1198] (3/4) Epoch 31, batch 950, loss[loss=0.1989, ctc_loss=0.1315, cr_loss=0.3372, over 17094.00 frames. ], tot_loss[loss=0.1975, ctc_loss=0.1287, cr_loss=0.3444, over 3326540.91 frames. ], batch size: 49, lr: 3.86e-03, grad_scale: 16.0 2024-09-24 17:17:16,651 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=549878.0, ans=0.125 2024-09-24 17:17:57,028 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.099e+02 1.281e+02 1.359e+02 1.441e+02 1.895e+02, threshold=2.717e+02, percent-clipped=0.0 2024-09-24 17:18:10,894 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=8.35 vs. limit=15.0 2024-09-24 17:18:27,498 INFO [train.py:1198] (3/4) Epoch 31, batch 1000, loss[loss=0.2695, ctc_loss=0.1855, cr_loss=0.4201, over 12040.00 frames. ], tot_loss[loss=0.1975, ctc_loss=0.1287, cr_loss=0.344, over 3327818.24 frames. ], batch size: 123, lr: 3.86e-03, grad_scale: 16.0 2024-09-24 17:18:46,263 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=5.47 vs. limit=15.0 2024-09-24 17:19:01,970 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.40 vs. limit=15.0 2024-09-24 17:19:50,157 INFO [train.py:1198] (3/4) Epoch 31, batch 1050, loss[loss=0.1901, ctc_loss=0.1232, cr_loss=0.3345, over 17158.00 frames. ], tot_loss[loss=0.1983, ctc_loss=0.1293, cr_loss=0.3451, over 3335251.50 frames. ], batch size: 45, lr: 3.86e-03, grad_scale: 16.0 2024-09-24 17:19:53,946 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=6.82 vs. limit=15.0
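
The scaling.py:1024 "Whitening" records compare a per-module metric against a limit; the metric gauges how far the channel covariance of an activation is from a multiple of the identity, and the whitening machinery only intervenes when the limit is exceeded, which is why most records sit comfortably below it. One plausible way to compute such a metric without an eigendecomposition is the eigenvalue-spread ratio mean(lambda^2) / mean(lambda)^2, which is 1.0 for a perfectly white covariance and num_channels in the rank-1 extreme; this formula is an assumption for illustration, not necessarily the exact one behind the log:

    import torch

    def whitening_metric(x: torch.Tensor) -> torch.Tensor:
        """Sketch: mean(eig^2) / mean(eig)^2 of x's channel covariance.

        x: (num_frames, num_channels).
        """
        x = x - x.mean(dim=0)
        cov = (x.t() @ x) / x.shape[0]          # (C, C) covariance
        n = cov.shape[0]
        mean_eig_sq = (cov * cov).sum() / n     # trace(cov @ cov) / C
        mean_eig = torch.diagonal(cov).mean()   # trace(cov) / C
        return mean_eig_sq / (mean_eig ** 2 + 1e-20)

    m = whitening_metric(torch.randn(1000, 256))
    print(f"metric={m.item():.2f} vs. limit=15.0")  # ~1.3 for random features
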
2024-09-24 17:19:58,370 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=550344.6666666666, ans=0.0 2024-09-24 17:20:00,044 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=550344.6666666666, ans=0.125 2024-09-24 17:20:39,886 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.104e+02 1.258e+02 1.325e+02 1.421e+02 1.846e+02, threshold=2.651e+02, percent-clipped=0.0 2024-09-24 17:20:44,959 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=550484.6666666666, ans=0.125 2024-09-24 17:20:54,479 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=550531.3333333334, ans=0.1 2024-09-24 17:20:55,990 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=550531.3333333334, ans=0.0 2024-09-24 17:21:08,016 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.88 vs. limit=22.5 2024-09-24 17:21:10,281 INFO [train.py:1198] (3/4) Epoch 31, batch 1100, loss[loss=0.1783, ctc_loss=0.115, cr_loss=0.3162, over 17275.00 frames. ], tot_loss[loss=0.1979, ctc_loss=0.1289, cr_loss=0.3446, over 3352790.09 frames. ], batch size: 42, lr: 3.86e-03, grad_scale: 16.0 2024-09-24 17:21:13,713 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=550578.0, ans=0.125 2024-09-24 17:21:26,038 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=550578.0, ans=0.2 2024-09-24 17:21:37,366 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=550624.6666666666, ans=0.0 2024-09-24 17:21:47,051 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=550671.3333333334, ans=0.0 2024-09-24 17:21:51,872 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=550671.3333333334, ans=0.0 2024-09-24 17:21:53,421 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=550671.3333333334, ans=0.1 2024-09-24 17:22:29,844 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=550764.6666666666, ans=0.0 2024-09-24 17:22:34,995 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten.whitening_limit, batch_count=550764.6666666666, ans=15.0 2024-09-24 17:22:39,187 INFO [train.py:1198] (3/4) Epoch 31, batch 1150, loss[loss=0.2304, ctc_loss=0.1502, cr_loss=0.4013, over 17013.00 frames. ], tot_loss[loss=0.1989, ctc_loss=0.1297, cr_loss=0.3458, over 3355201.16 frames. ], batch size: 51, lr: 3.86e-03, grad_scale: 16.0
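
The grad_scale field in the train.py:1198 records is the dynamic loss scale used for half-precision training: in the surrounding records it doubles from 16.0 (batch 1150) to 32.0 (batch 1200) and is back at 16.0 by batch 1250, consistent with a scaler that grows the scale after a run of finite-gradient steps and halves it when an overflow is detected. The stock PyTorch pattern below reproduces that behaviour (a generic sketch; model, optimizer and compute_loss are stand-ins, and the real training loop does its own bookkeeping around the scale):

    import torch

    scaler = torch.cuda.amp.GradScaler(init_scale=16.0, growth_interval=2000)

    def train_step(model, optimizer, batch, compute_loss):
        optimizer.zero_grad()
        with torch.cuda.amp.autocast():          # float16 forward pass
            loss = compute_loss(model, batch)    # stand-in loss helper
        scaler.scale(loss).backward()
        scaler.step(optimizer)     # skipped if inf/nan gradients were found
        scaler.update()            # doubles the scale periodically, halves on overflow
        return scaler.get_scale()  # the "grad_scale" value that gets logged
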
2024-09-24 17:23:03,363 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=550858.0, ans=0.0 2024-09-24 17:23:04,803 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=550858.0, ans=0.0 2024-09-24 17:23:09,787 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=550904.6666666666, ans=0.1 2024-09-24 17:23:13,449 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=15.82 vs. limit=22.5 2024-09-24 17:23:28,709 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.121e+02 1.270e+02 1.354e+02 1.493e+02 2.055e+02, threshold=2.708e+02, percent-clipped=0.0 2024-09-24 17:23:30,612 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=550951.3333333334, ans=0.125 2024-09-24 17:23:39,113 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=4.44 vs. limit=15.0 2024-09-24 17:23:51,636 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=550998.0, ans=0.0 2024-09-24 17:23:59,056 INFO [train.py:1198] (3/4) Epoch 31, batch 1200, loss[loss=0.1718, ctc_loss=0.1088, cr_loss=0.3146, over 16229.00 frames. ], tot_loss[loss=0.199, ctc_loss=0.1297, cr_loss=0.3466, over 3358625.16 frames. ], batch size: 36, lr: 3.86e-03, grad_scale: 32.0 2024-09-24 17:24:20,309 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=551091.3333333334, ans=0.1 2024-09-24 17:24:40,806 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.max_positive, batch_count=551138.0, ans=0.95 2024-09-24 17:24:47,138 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=551138.0, ans=0.125 2024-09-24 17:24:59,949 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.const_attention_rate, batch_count=551184.6666666666, ans=0.025 2024-09-24 17:25:21,965 INFO [train.py:1198] (3/4) Epoch 31, batch 1250, loss[loss=0.2097, ctc_loss=0.1448, cr_loss=0.3245, over 11931.00 frames. ], tot_loss[loss=0.1992, ctc_loss=0.1297, cr_loss=0.3472, over 3344301.29 frames. ], batch size: 124, lr: 3.86e-03, grad_scale: 16.0 2024-09-24 17:26:08,974 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=551418.0, ans=0.125 2024-09-24 17:26:13,266 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.139e+02 1.296e+02 1.398e+02 1.491e+02 2.236e+02, threshold=2.796e+02, percent-clipped=0.0 2024-09-24 17:26:17,347 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=5.02 vs. limit=15.0 2024-09-24 17:26:32,372 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.01 vs.
limit=6.0 2024-09-24 17:26:39,297 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=551464.6666666666, ans=0.0 2024-09-24 17:26:42,553 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=551464.6666666666, ans=0.0 2024-09-24 17:26:46,963 INFO [train.py:1198] (3/4) Epoch 31, batch 1300, loss[loss=0.2577, ctc_loss=0.1744, cr_loss=0.4165, over 14905.00 frames. ], tot_loss[loss=0.2001, ctc_loss=0.1304, cr_loss=0.3485, over 3348766.58 frames. ], batch size: 89, lr: 3.86e-03, grad_scale: 16.0 2024-09-24 17:26:50,465 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=551511.3333333334, ans=0.2 2024-09-24 17:26:51,990 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=551511.3333333334, ans=0.0 2024-09-24 17:26:55,460 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=551511.3333333334, ans=0.0 2024-09-24 17:26:58,781 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=551511.3333333334, ans=0.125 2024-09-24 17:27:43,199 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=551651.3333333334, ans=0.125 2024-09-24 17:28:09,626 INFO [train.py:1198] (3/4) Epoch 31, batch 1350, loss[loss=0.235, ctc_loss=0.156, cr_loss=0.3952, over 15038.00 frames. ], tot_loss[loss=0.2003, ctc_loss=0.1305, cr_loss=0.3489, over 3353357.58 frames. ], batch size: 89, lr: 3.85e-03, grad_scale: 16.0 2024-09-24 17:28:32,196 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=551791.3333333334, ans=0.0 2024-09-24 17:28:38,991 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=8.57 vs. limit=15.0 2024-09-24 17:28:43,634 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=551838.0, ans=0.125 2024-09-24 17:28:45,140 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=551838.0, ans=0.125 2024-09-24 17:28:53,083 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=551838.0, ans=0.125 2024-09-24 17:28:58,033 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-24 17:29:00,716 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.123e+02 1.265e+02 1.360e+02 1.447e+02 1.964e+02, threshold=2.719e+02, percent-clipped=0.0 2024-09-24 17:29:32,593 INFO [train.py:1198] (3/4) Epoch 31, batch 1400, loss[loss=0.1857, ctc_loss=0.1206, cr_loss=0.3253, over 17222.00 frames. ], tot_loss[loss=0.2005, ctc_loss=0.1308, cr_loss=0.3485, over 3341301.79 frames. 
], batch size: 50, lr: 3.85e-03, grad_scale: 16.0 2024-09-24 17:30:48,509 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=552164.6666666666, ans=0.125 2024-09-24 17:30:52,826 INFO [train.py:1198] (3/4) Epoch 31, batch 1450, loss[loss=0.1494, ctc_loss=0.09562, cr_loss=0.2688, over 17109.00 frames. ], tot_loss[loss=0.2004, ctc_loss=0.1307, cr_loss=0.3482, over 3333835.69 frames. ], batch size: 43, lr: 3.85e-03, grad_scale: 16.0 2024-09-24 17:30:58,138 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=552211.3333333334, ans=0.125 2024-09-24 17:31:01,329 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=552211.3333333334, ans=0.125 2024-09-24 17:31:07,726 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=552258.0, ans=0.125 2024-09-24 17:31:09,224 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=552258.0, ans=0.1 2024-09-24 17:31:18,336 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=552258.0, ans=0.125 2024-09-24 17:31:39,634 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=552304.6666666666, ans=0.0 2024-09-24 17:31:48,823 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.089e+02 1.229e+02 1.325e+02 1.469e+02 3.189e+02, threshold=2.649e+02, percent-clipped=1.0 2024-09-24 17:32:03,657 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=552398.0, ans=0.0 2024-09-24 17:32:19,916 INFO [train.py:1198] (3/4) Epoch 31, batch 1500, loss[loss=0.2176, ctc_loss=0.1449, cr_loss=0.3632, over 14968.00 frames. ], tot_loss[loss=0.2004, ctc_loss=0.1307, cr_loss=0.3483, over 3343493.55 frames. ], batch size: 89, lr: 3.85e-03, grad_scale: 16.0 2024-09-24 17:32:23,650 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten.whitening_limit, batch_count=552444.6666666666, ans=15.0 2024-09-24 17:32:24,873 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=552444.6666666666, ans=0.2 2024-09-24 17:32:43,950 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=552491.3333333334, ans=0.2 2024-09-24 17:32:47,658 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.84 vs. limit=15.0 2024-09-24 17:33:09,519 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=552584.6666666666, ans=0.1 2024-09-24 17:33:39,659 INFO [train.py:1198] (3/4) Epoch 31, batch 1550, loss[loss=0.202, ctc_loss=0.1279, cr_loss=0.3705, over 17022.00 frames. ], tot_loss[loss=0.2006, ctc_loss=0.1309, cr_loss=0.3483, over 3343160.33 frames. 
], batch size: 53, lr: 3.85e-03, grad_scale: 16.0 2024-09-24 17:33:44,951 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=552678.0, ans=0.2 2024-09-24 17:33:48,069 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=552678.0, ans=0.0 2024-09-24 17:33:53,017 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=552678.0, ans=0.125 2024-09-24 17:34:01,586 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.58 vs. limit=22.5 2024-09-24 17:34:33,295 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.097e+02 1.274e+02 1.344e+02 1.453e+02 2.165e+02, threshold=2.688e+02, percent-clipped=0.0 2024-09-24 17:34:39,268 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=8.74 vs. limit=22.5 2024-09-24 17:34:56,217 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=552864.6666666666, ans=0.2 2024-09-24 17:34:57,785 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=552864.6666666666, ans=0.125 2024-09-24 17:35:02,240 INFO [train.py:1198] (3/4) Epoch 31, batch 1600, loss[loss=0.193, ctc_loss=0.1265, cr_loss=0.3323, over 17248.00 frames. ], tot_loss[loss=0.2004, ctc_loss=0.1309, cr_loss=0.3478, over 3349533.58 frames. ], batch size: 44, lr: 3.85e-03, grad_scale: 32.0 2024-09-24 17:35:28,962 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=4.38 vs. limit=15.0 2024-09-24 17:35:39,854 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=6.86 vs. limit=15.0 2024-09-24 17:36:25,121 INFO [train.py:1198] (3/4) Epoch 31, batch 1650, loss[loss=0.1991, ctc_loss=0.1285, cr_loss=0.3527, over 17231.00 frames. ], tot_loss[loss=0.2006, ctc_loss=0.1309, cr_loss=0.3484, over 3356735.58 frames. ], batch size: 50, lr: 3.85e-03, grad_scale: 32.0 2024-09-24 17:36:31,812 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-24 17:36:31,885 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=553144.6666666666, ans=0.125 2024-09-24 17:36:33,432 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=553144.6666666666, ans=0.125 2024-09-24 17:36:38,431 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=3.89 vs. limit=15.0 2024-09-24 17:36:47,442 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=8.61 vs. 
limit=15.0 2024-09-24 17:37:18,712 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=553284.6666666666, ans=0.1 2024-09-24 17:37:21,454 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.162e+02 1.255e+02 1.333e+02 1.430e+02 2.111e+02, threshold=2.665e+02, percent-clipped=0.0 2024-09-24 17:37:33,067 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=553331.3333333334, ans=0.95 2024-09-24 17:37:50,372 INFO [train.py:1198] (3/4) Epoch 31, batch 1700, loss[loss=0.1779, ctc_loss=0.1139, cr_loss=0.3197, over 17224.00 frames. ], tot_loss[loss=0.199, ctc_loss=0.1296, cr_loss=0.3468, over 3366155.29 frames. ], batch size: 47, lr: 3.85e-03, grad_scale: 32.0 2024-09-24 17:38:09,104 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=8.54 vs. limit=15.0 2024-09-24 17:38:23,419 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=553471.3333333334, ans=0.0 2024-09-24 17:38:26,549 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=553471.3333333334, ans=0.125 2024-09-24 17:38:49,504 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=10.28 vs. limit=15.0 2024-09-24 17:39:06,848 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=553564.6666666666, ans=0.125 2024-09-24 17:39:13,901 INFO [train.py:1198] (3/4) Epoch 31, batch 1750, loss[loss=0.238, ctc_loss=0.1575, cr_loss=0.4022, over 15163.00 frames. ], tot_loss[loss=0.1991, ctc_loss=0.1297, cr_loss=0.3472, over 3366597.36 frames. ], batch size: 89, lr: 3.85e-03, grad_scale: 32.0 2024-09-24 17:39:18,985 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=553611.3333333334, ans=0.125 2024-09-24 17:39:41,088 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=553658.0, ans=0.125 2024-09-24 17:40:02,022 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=553751.3333333334, ans=0.0 2024-09-24 17:40:04,937 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.152e+02 1.305e+02 1.399e+02 1.556e+02 3.053e+02, threshold=2.798e+02, percent-clipped=1.0 2024-09-24 17:40:19,628 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=553798.0, ans=0.125 2024-09-24 17:40:20,385 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=5.12 vs. limit=15.0 2024-09-24 17:40:33,712 INFO [train.py:1198] (3/4) Epoch 31, batch 1800, loss[loss=0.2011, ctc_loss=0.1294, cr_loss=0.3585, over 17289.00 frames. ], tot_loss[loss=0.2004, ctc_loss=0.1306, cr_loss=0.3488, over 3338376.90 frames. ], batch size: 49, lr: 3.85e-03, grad_scale: 16.0 2024-09-24 17:40:34,454 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=4.52 vs. 
limit=10.0 2024-09-24 17:40:38,829 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=553844.6666666666, ans=0.125 2024-09-24 17:40:56,273 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=553891.3333333334, ans=0.125 2024-09-24 17:41:09,580 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=553938.0, ans=0.0 2024-09-24 17:41:11,329 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=553938.0, ans=0.1 2024-09-24 17:41:49,745 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.79 vs. limit=6.0 2024-09-24 17:41:50,719 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=554031.3333333334, ans=0.2 2024-09-24 17:41:50,828 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=554031.3333333334, ans=0.125 2024-09-24 17:42:01,016 INFO [train.py:1198] (3/4) Epoch 31, batch 1850, loss[loss=0.231, ctc_loss=0.1505, cr_loss=0.4027, over 16991.00 frames. ], tot_loss[loss=0.2, ctc_loss=0.1303, cr_loss=0.3484, over 3342117.33 frames. ], batch size: 53, lr: 3.85e-03, grad_scale: 16.0 2024-09-24 17:42:20,368 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=554124.6666666666, ans=0.1 2024-09-24 17:42:39,793 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten.whitening_limit, batch_count=554171.3333333334, ans=15.0 2024-09-24 17:42:53,413 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.087e+02 1.284e+02 1.348e+02 1.466e+02 2.241e+02, threshold=2.697e+02, percent-clipped=0.0 2024-09-24 17:43:03,238 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=554264.6666666666, ans=0.0 2024-09-24 17:43:20,565 INFO [train.py:1198] (3/4) Epoch 31, batch 1900, loss[loss=0.2042, ctc_loss=0.1321, cr_loss=0.3608, over 17045.00 frames. ], tot_loss[loss=0.1997, ctc_loss=0.1302, cr_loss=0.3476, over 3340134.57 frames. ], batch size: 52, lr: 3.85e-03, grad_scale: 16.0 2024-09-24 17:43:24,113 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=554311.3333333334, ans=0.125 2024-09-24 17:43:49,557 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=554358.0, ans=0.125 2024-09-24 17:44:36,783 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=554498.0, ans=0.125 2024-09-24 17:44:42,953 INFO [train.py:1198] (3/4) Epoch 31, batch 1950, loss[loss=0.174, ctc_loss=0.1111, cr_loss=0.3149, over 17089.00 frames. ], tot_loss[loss=0.1994, ctc_loss=0.13, cr_loss=0.347, over 3344553.55 frames. 
], batch size: 43, lr: 3.84e-03, grad_scale: 16.0 2024-09-24 17:45:01,056 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=554591.3333333334, ans=0.125 2024-09-24 17:45:09,028 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=554591.3333333334, ans=0.125 2024-09-24 17:45:31,293 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=554684.6666666666, ans=0.1 2024-09-24 17:45:35,728 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.109e+02 1.290e+02 1.379e+02 1.473e+02 2.110e+02, threshold=2.758e+02, percent-clipped=0.0 2024-09-24 17:45:35,885 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=554684.6666666666, ans=0.125 2024-09-24 17:46:05,600 INFO [train.py:1198] (3/4) Epoch 31, batch 2000, loss[loss=0.1702, ctc_loss=0.1064, cr_loss=0.3191, over 17105.00 frames. ], tot_loss[loss=0.1997, ctc_loss=0.1302, cr_loss=0.3476, over 3345006.69 frames. ], batch size: 40, lr: 3.84e-03, grad_scale: 32.0 2024-09-24 17:46:27,744 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=554824.6666666666, ans=0.125 2024-09-24 17:46:40,921 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=554871.3333333334, ans=0.125 2024-09-24 17:46:47,267 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=554871.3333333334, ans=0.0 2024-09-24 17:47:08,910 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=554918.0, ans=0.1 2024-09-24 17:47:10,712 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=554918.0, ans=0.125 2024-09-24 17:47:25,145 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.max_abs, batch_count=554964.6666666666, ans=10.0 2024-09-24 17:47:31,372 INFO [train.py:1198] (3/4) Epoch 31, batch 2050, loss[loss=0.1817, ctc_loss=0.1183, cr_loss=0.3169, over 17067.00 frames. ], tot_loss[loss=0.199, ctc_loss=0.1297, cr_loss=0.3464, over 3346288.16 frames. ], batch size: 46, lr: 3.84e-03, grad_scale: 32.0 2024-09-24 17:47:57,307 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=555058.0, ans=0.2 2024-09-24 17:48:06,807 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=555104.6666666666, ans=0.035 2024-09-24 17:48:16,543 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=555104.6666666666, ans=0.125 2024-09-24 17:48:24,148 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.069e+02 1.242e+02 1.319e+02 1.408e+02 1.614e+02, threshold=2.639e+02, percent-clipped=0.0 2024-09-24 17:48:50,204 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=555244.6666666666, ans=0.125 2024-09-24 17:48:51,350 INFO [train.py:1198] (3/4) Epoch 31, batch 2100, loss[loss=0.1926, ctc_loss=0.122, cr_loss=0.3529, over 17159.00 frames. 
], tot_loss[loss=0.1981, ctc_loss=0.1291, cr_loss=0.3452, over 3342500.88 frames. ], batch size: 45, lr: 3.84e-03, grad_scale: 32.0 2024-09-24 17:49:02,177 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.const_attention_rate, batch_count=555244.6666666666, ans=0.025 2024-09-24 17:49:02,207 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=555244.6666666666, ans=0.125 2024-09-24 17:49:05,618 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=555244.6666666666, ans=0.125 2024-09-24 17:49:08,689 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=555291.3333333334, ans=0.09899494936611666 2024-09-24 17:49:13,402 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=555291.3333333334, ans=0.125 2024-09-24 17:49:14,909 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=555291.3333333334, ans=0.125 2024-09-24 17:49:30,533 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=4.62 vs. limit=10.0 2024-09-24 17:50:14,775 INFO [train.py:1198] (3/4) Epoch 31, batch 2150, loss[loss=0.2002, ctc_loss=0.1314, cr_loss=0.3442, over 16996.00 frames. ], tot_loss[loss=0.1987, ctc_loss=0.1294, cr_loss=0.3464, over 3357683.67 frames. ], batch size: 53, lr: 3.84e-03, grad_scale: 32.0 2024-09-24 17:50:36,694 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=9.06 vs. limit=22.5 2024-09-24 17:50:46,570 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=555524.6666666666, ans=0.125 2024-09-24 17:50:59,242 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=555571.3333333334, ans=0.125 2024-09-24 17:51:10,130 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.047e+02 1.261e+02 1.344e+02 1.456e+02 2.277e+02, threshold=2.688e+02, percent-clipped=0.0 2024-09-24 17:51:39,716 INFO [train.py:1198] (3/4) Epoch 31, batch 2200, loss[loss=0.1788, ctc_loss=0.1157, cr_loss=0.3154, over 17221.00 frames. ], tot_loss[loss=0.1976, ctc_loss=0.1287, cr_loss=0.3449, over 3357184.06 frames. ], batch size: 55, lr: 3.84e-03, grad_scale: 32.0 2024-09-24 17:52:35,013 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=555851.3333333334, ans=0.1 2024-09-24 17:52:43,297 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=555851.3333333334, ans=0.125 2024-09-24 17:53:02,099 INFO [train.py:1198] (3/4) Epoch 31, batch 2250, loss[loss=0.2064, ctc_loss=0.1341, cr_loss=0.3615, over 16037.00 frames. ], tot_loss[loss=0.1976, ctc_loss=0.1286, cr_loss=0.3449, over 3359032.85 frames. ], batch size: 74, lr: 3.84e-03, grad_scale: 32.0 2024-09-24 17:53:15,937 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=9.54 vs. 
limit=15.0 2024-09-24 17:53:47,010 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=556038.0, ans=0.0 2024-09-24 17:53:57,558 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.055e+02 1.242e+02 1.333e+02 1.428e+02 2.130e+02, threshold=2.666e+02, percent-clipped=0.0 2024-09-24 17:54:23,440 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=556178.0, ans=0.125 2024-09-24 17:54:24,674 INFO [train.py:1198] (3/4) Epoch 31, batch 2300, loss[loss=0.1784, ctc_loss=0.1145, cr_loss=0.3197, over 17018.00 frames. ], tot_loss[loss=0.1981, ctc_loss=0.1289, cr_loss=0.3461, over 3366384.52 frames. ], batch size: 44, lr: 3.84e-03, grad_scale: 32.0 2024-09-24 17:54:48,974 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=556224.6666666666, ans=0.2 2024-09-24 17:55:04,962 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-24 17:55:04,968 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=556271.3333333334, ans=0.0 2024-09-24 17:55:22,745 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=556318.0, ans=0.2 2024-09-24 17:55:44,367 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=556364.6666666666, ans=0.125 2024-09-24 17:55:47,360 INFO [train.py:1198] (3/4) Epoch 31, batch 2350, loss[loss=0.1669, ctc_loss=0.1062, cr_loss=0.3033, over 17277.00 frames. ], tot_loss[loss=0.1982, ctc_loss=0.129, cr_loss=0.3458, over 3356707.71 frames. ], batch size: 42, lr: 3.84e-03, grad_scale: 32.0 2024-09-24 17:56:43,030 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.129e+02 1.265e+02 1.349e+02 1.461e+02 1.759e+02, threshold=2.698e+02, percent-clipped=0.0 2024-09-24 17:57:11,532 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=556644.6666666666, ans=0.1 2024-09-24 17:57:12,807 INFO [train.py:1198] (3/4) Epoch 31, batch 2400, loss[loss=0.2153, ctc_loss=0.143, cr_loss=0.3618, over 17311.00 frames. ], tot_loss[loss=0.1988, ctc_loss=0.1296, cr_loss=0.3463, over 3349236.19 frames. ], batch size: 49, lr: 3.84e-03, grad_scale: 32.0 2024-09-24 17:57:35,460 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=556691.3333333334, ans=0.0 2024-09-24 17:58:04,364 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=556784.6666666666, ans=0.0 2024-09-24 17:58:32,916 INFO [train.py:1198] (3/4) Epoch 31, batch 2450, loss[loss=0.1915, ctc_loss=0.1262, cr_loss=0.3265, over 17141.00 frames. ], tot_loss[loss=0.1986, ctc_loss=0.1294, cr_loss=0.3462, over 3353901.14 frames. ], batch size: 48, lr: 3.84e-03, grad_scale: 32.0 2024-09-24 17:58:35,470 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=8.83 vs. 
limit=15.0 2024-09-24 17:59:28,402 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.122e+02 1.235e+02 1.328e+02 1.443e+02 2.383e+02, threshold=2.655e+02, percent-clipped=0.0 2024-09-24 17:59:44,406 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=557064.6666666666, ans=0.125 2024-09-24 17:59:55,324 INFO [train.py:1198] (3/4) Epoch 31, batch 2500, loss[loss=0.1709, ctc_loss=0.1104, cr_loss=0.3023, over 17072.00 frames. ], tot_loss[loss=0.1994, ctc_loss=0.1299, cr_loss=0.3474, over 3354008.66 frames. ], batch size: 43, lr: 3.84e-03, grad_scale: 32.0 2024-09-24 17:59:57,266 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=557111.3333333334, ans=0.125 2024-09-24 17:59:58,669 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=557111.3333333334, ans=0.125 2024-09-24 18:00:09,382 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=7.66 vs. limit=15.0 2024-09-24 18:00:12,125 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=12.35 vs. limit=22.5 2024-09-24 18:00:43,557 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=557204.6666666666, ans=0.2 2024-09-24 18:01:05,946 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=557298.0, ans=0.0 2024-09-24 18:01:18,464 INFO [train.py:1198] (3/4) Epoch 31, batch 2550, loss[loss=0.2409, ctc_loss=0.1606, cr_loss=0.4014, over 15057.00 frames. ], tot_loss[loss=0.2004, ctc_loss=0.1307, cr_loss=0.3485, over 3346039.71 frames. ], batch size: 89, lr: 3.83e-03, grad_scale: 32.0 2024-09-24 18:01:34,458 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=557344.6666666666, ans=0.2 2024-09-24 18:01:56,008 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=557438.0, ans=0.125 2024-09-24 18:01:56,009 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=557438.0, ans=0.125 2024-09-24 18:02:16,588 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.114e+02 1.233e+02 1.314e+02 1.407e+02 1.652e+02, threshold=2.628e+02, percent-clipped=0.0 2024-09-24 18:02:23,214 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=557484.6666666666, ans=0.125 2024-09-24 18:02:42,620 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=557578.0, ans=0.2 2024-09-24 18:02:43,701 INFO [train.py:1198] (3/4) Epoch 31, batch 2600, loss[loss=0.2207, ctc_loss=0.1488, cr_loss=0.3594, over 12402.00 frames. ], tot_loss[loss=0.1994, ctc_loss=0.13, cr_loss=0.347, over 3347737.90 frames. 
], batch size: 124, lr: 3.83e-03, grad_scale: 32.0 2024-09-24 18:03:08,047 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=557624.6666666666, ans=0.125 2024-09-24 18:03:27,551 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=557671.3333333334, ans=0.0 2024-09-24 18:03:41,899 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.min_positive, batch_count=557718.0, ans=0.05 2024-09-24 18:03:45,128 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=557718.0, ans=0.2 2024-09-24 18:03:54,313 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-24 18:03:59,231 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.max_abs, batch_count=557764.6666666666, ans=10.0 2024-09-24 18:04:02,283 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=557764.6666666666, ans=0.125 2024-09-24 18:04:06,717 INFO [train.py:1198] (3/4) Epoch 31, batch 2650, loss[loss=0.1581, ctc_loss=0.1004, cr_loss=0.2883, over 17267.00 frames. ], tot_loss[loss=0.1995, ctc_loss=0.1301, cr_loss=0.3472, over 3354555.15 frames. ], batch size: 42, lr: 3.83e-03, grad_scale: 32.0 2024-09-24 18:04:08,966 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten.whitening_limit, batch_count=557811.3333333334, ans=15.0 2024-09-24 18:04:29,328 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=557858.0, ans=0.2 2024-09-24 18:04:59,304 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.137e+02 1.276e+02 1.391e+02 1.491e+02 3.639e+02, threshold=2.781e+02, percent-clipped=1.0 2024-09-24 18:05:00,259 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=4.67 vs. limit=10.0 2024-09-24 18:05:01,153 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=557951.3333333334, ans=0.0 2024-09-24 18:05:04,347 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=557951.3333333334, ans=0.1 2024-09-24 18:05:09,670 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.16 vs. limit=15.0 2024-09-24 18:05:10,774 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=557998.0, ans=0.125 2024-09-24 18:05:26,484 INFO [train.py:1198] (3/4) Epoch 31, batch 2700, loss[loss=0.2594, ctc_loss=0.1781, cr_loss=0.4066, over 14982.00 frames. ], tot_loss[loss=0.1994, ctc_loss=0.13, cr_loss=0.3471, over 3344418.57 frames. 
], batch size: 89, lr: 3.83e-03, grad_scale: 32.0 2024-09-24 18:05:31,592 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=558044.6666666666, ans=0.125 2024-09-24 18:05:48,597 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=558091.3333333334, ans=0.125 2024-09-24 18:05:50,188 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=558091.3333333334, ans=0.0 2024-09-24 18:06:05,230 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=11.38 vs. limit=22.5 2024-09-24 18:06:14,010 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=558138.0, ans=0.125 2024-09-24 18:06:36,249 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=558231.3333333334, ans=0.04949747468305833 2024-09-24 18:06:40,946 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=558231.3333333334, ans=0.1 2024-09-24 18:06:51,416 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=558231.3333333334, ans=0.0 2024-09-24 18:06:54,370 INFO [train.py:1198] (3/4) Epoch 31, batch 2750, loss[loss=0.1878, ctc_loss=0.1199, cr_loss=0.3394, over 17151.00 frames. ], tot_loss[loss=0.1987, ctc_loss=0.1295, cr_loss=0.3462, over 3349145.67 frames. ], batch size: 48, lr: 3.83e-03, grad_scale: 16.0 2024-09-24 18:06:59,506 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=558278.0, ans=0.1 2024-09-24 18:07:17,057 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=558324.6666666666, ans=0.2 2024-09-24 18:07:18,447 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=558324.6666666666, ans=0.0 2024-09-24 18:07:32,927 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=558371.3333333334, ans=0.1 2024-09-24 18:07:48,591 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.111e+02 1.267e+02 1.363e+02 1.488e+02 1.840e+02, threshold=2.726e+02, percent-clipped=0.0 2024-09-24 18:08:03,454 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=558464.6666666666, ans=0.125 2024-09-24 18:08:06,610 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=558464.6666666666, ans=0.0 2024-09-24 18:08:06,624 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=558464.6666666666, ans=0.125 2024-09-24 18:08:14,313 INFO [train.py:1198] (3/4) Epoch 31, batch 2800, loss[loss=0.1893, ctc_loss=0.1229, cr_loss=0.3321, over 17301.00 frames. ], tot_loss[loss=0.1979, ctc_loss=0.1289, cr_loss=0.3447, over 3335104.84 frames. 
], batch size: 46, lr: 3.83e-03, grad_scale: 32.0 2024-09-24 18:08:17,830 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=558511.3333333334, ans=0.125 2024-09-24 18:08:38,523 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=558558.0, ans=0.125 2024-09-24 18:09:25,701 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=558698.0, ans=0.0 2024-09-24 18:09:27,208 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=558698.0, ans=0.0 2024-09-24 18:09:36,691 INFO [train.py:1198] (3/4) Epoch 31, batch 2850, loss[loss=0.1952, ctc_loss=0.128, cr_loss=0.336, over 17000.00 frames. ], tot_loss[loss=0.199, ctc_loss=0.1298, cr_loss=0.3462, over 3325638.24 frames. ], batch size: 44, lr: 3.83e-03, grad_scale: 32.0 2024-09-24 18:09:46,614 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=558744.6666666666, ans=0.2 2024-09-24 18:10:10,547 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=558838.0, ans=0.125 2024-09-24 18:10:14,329 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.50 vs. limit=22.5 2024-09-24 18:10:20,066 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=558838.0, ans=0.0 2024-09-24 18:10:35,375 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.168e+02 1.299e+02 1.361e+02 1.459e+02 2.038e+02, threshold=2.722e+02, percent-clipped=0.0 2024-09-24 18:10:55,114 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-24 18:10:59,640 INFO [train.py:1198] (3/4) Epoch 31, batch 2900, loss[loss=0.1561, ctc_loss=0.09822, cr_loss=0.2892, over 17000.00 frames. ], tot_loss[loss=0.1996, ctc_loss=0.1301, cr_loss=0.3472, over 3340291.26 frames. ], batch size: 39, lr: 3.83e-03, grad_scale: 16.0 2024-09-24 18:11:52,137 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.const_attention_rate, batch_count=559118.0, ans=0.025 2024-09-24 18:11:54,273 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=5.77 vs. limit=12.0 2024-09-24 18:12:08,093 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=559164.6666666666, ans=0.125 2024-09-24 18:12:14,484 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=559164.6666666666, ans=0.2 2024-09-24 18:12:25,454 INFO [train.py:1198] (3/4) Epoch 31, batch 2950, loss[loss=0.176, ctc_loss=0.1133, cr_loss=0.3136, over 17087.00 frames. ], tot_loss[loss=0.2006, ctc_loss=0.1308, cr_loss=0.3487, over 3337882.86 frames. 
], batch size: 43, lr: 3.83e-03, grad_scale: 16.0 2024-09-24 18:12:34,044 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=559211.3333333334, ans=0.1 2024-09-24 18:12:45,343 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=559258.0, ans=0.1 2024-09-24 18:13:08,123 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.15 vs. limit=15.0 2024-09-24 18:13:12,721 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer_ff2.min_abs, batch_count=559351.3333333334, ans=0.1 2024-09-24 18:13:19,275 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=11.16 vs. limit=15.0 2024-09-24 18:13:21,743 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.101e+02 1.279e+02 1.362e+02 1.485e+02 2.605e+02, threshold=2.724e+02, percent-clipped=0.0 2024-09-24 18:13:33,104 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=559398.0, ans=0.125 2024-09-24 18:13:39,543 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=559398.0, ans=0.1 2024-09-24 18:13:45,389 INFO [train.py:1198] (3/4) Epoch 31, batch 3000, loss[loss=0.2014, ctc_loss=0.1294, cr_loss=0.3601, over 17030.00 frames. ], tot_loss[loss=0.2008, ctc_loss=0.131, cr_loss=0.3491, over 3347569.78 frames. ], batch size: 44, lr: 3.83e-03, grad_scale: 16.0 2024-09-24 18:13:45,390 INFO [train.py:1221] (3/4) Computing validation loss 2024-09-24 18:14:00,844 INFO [train.py:1230] (3/4) Epoch 31, validation: loss=0.03667, ctc_loss=0.03667, cr_loss=9.013e-15, over 944034.00 frames. 2024-09-24 18:14:00,845 INFO [train.py:1231] (3/4) Maximum memory allocated so far is 21238MB 2024-09-24 18:14:07,515 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.const_attention_rate, batch_count=559444.6666666666, ans=0.025 2024-09-24 18:14:10,473 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=559444.6666666666, ans=0.05 2024-09-24 18:14:18,913 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=6.45 vs. limit=10.0 2024-09-24 18:14:48,205 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=559584.6666666666, ans=0.125 2024-09-24 18:15:07,386 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=5.57 vs. limit=12.0 2024-09-24 18:15:19,083 INFO [train.py:1198] (3/4) Epoch 31, batch 3050, loss[loss=0.2183, ctc_loss=0.1414, cr_loss=0.3848, over 17011.00 frames. ], tot_loss[loss=0.1992, ctc_loss=0.1298, cr_loss=0.3471, over 3354289.56 frames. 
], batch size: 53, lr: 3.83e-03, grad_scale: 16.0 2024-09-24 18:15:44,282 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=559724.6666666666, ans=0.125 2024-09-24 18:16:09,120 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=559818.0, ans=0.1 2024-09-24 18:16:13,651 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.155e+02 1.275e+02 1.357e+02 1.505e+02 2.128e+02, threshold=2.715e+02, percent-clipped=0.0 2024-09-24 18:16:37,065 INFO [train.py:1198] (3/4) Epoch 31, batch 3100, loss[loss=0.2171, ctc_loss=0.1427, cr_loss=0.3719, over 16881.00 frames. ], tot_loss[loss=0.1998, ctc_loss=0.1303, cr_loss=0.3478, over 3360661.46 frames. ], batch size: 58, lr: 3.83e-03, grad_scale: 16.0 2024-09-24 18:16:40,302 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=559911.3333333334, ans=0.0 2024-09-24 18:16:41,915 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=559911.3333333334, ans=0.0 2024-09-24 18:16:46,618 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=559911.3333333334, ans=0.125 2024-09-24 18:16:55,915 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=559958.0, ans=0.0 2024-09-24 18:17:23,570 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=560004.6666666666, ans=0.125 2024-09-24 18:17:29,742 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=560051.3333333334, ans=0.1 2024-09-24 18:17:39,208 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=560051.3333333334, ans=0.0 2024-09-24 18:17:40,842 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=560098.0, ans=0.125 2024-09-24 18:17:43,286 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=8.73 vs. limit=10.0 2024-09-24 18:17:54,030 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=560098.0, ans=0.125 2024-09-24 18:17:55,577 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.const_attention_rate, batch_count=560098.0, ans=0.025 2024-09-24 18:18:00,012 INFO [train.py:1198] (3/4) Epoch 31, batch 3150, loss[loss=0.2164, ctc_loss=0.1433, cr_loss=0.3656, over 14720.00 frames. ], tot_loss[loss=0.1999, ctc_loss=0.1304, cr_loss=0.3477, over 3364710.20 frames. 
], batch size: 88, lr: 3.83e-03, grad_scale: 16.0 2024-09-24 18:18:22,611 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer_ff3.min_abs, batch_count=560191.3333333334, ans=0.2 2024-09-24 18:18:36,539 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=560238.0, ans=0.125 2024-09-24 18:18:39,671 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=560238.0, ans=0.125 2024-09-24 18:18:41,308 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.const_attention_rate, batch_count=560238.0, ans=0.025 2024-09-24 18:18:49,254 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=560284.6666666666, ans=0.0 2024-09-24 18:18:55,090 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.106e+02 1.279e+02 1.355e+02 1.423e+02 2.365e+02, threshold=2.709e+02, percent-clipped=0.0 2024-09-24 18:19:06,436 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=560331.3333333334, ans=0.125 2024-09-24 18:19:06,437 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=560331.3333333334, ans=0.1 2024-09-24 18:19:11,039 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=560331.3333333334, ans=0.04949747468305833 2024-09-24 18:19:18,882 INFO [train.py:1198] (3/4) Epoch 31, batch 3200, loss[loss=0.1719, ctc_loss=0.1105, cr_loss=0.3072, over 17185.00 frames. ], tot_loss[loss=0.1989, ctc_loss=0.1296, cr_loss=0.3465, over 3369940.11 frames. ], batch size: 41, lr: 3.82e-03, grad_scale: 32.0 2024-09-24 18:19:27,022 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=560378.0, ans=0.125 2024-09-24 18:19:35,705 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=9.66 vs. limit=15.0 2024-09-24 18:19:37,921 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=560424.6666666666, ans=0.125 2024-09-24 18:19:53,851 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=560471.3333333334, ans=0.125 2024-09-24 18:20:06,350 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-24 18:20:19,475 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=560518.0, ans=0.0 2024-09-24 18:20:26,405 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.23 vs. limit=15.0 2024-09-24 18:20:30,581 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=560564.6666666666, ans=0.07 2024-09-24 18:20:41,361 INFO [train.py:1198] (3/4) Epoch 31, batch 3250, loss[loss=0.189, ctc_loss=0.1215, cr_loss=0.3374, over 17296.00 frames. ], tot_loss[loss=0.198, ctc_loss=0.129, cr_loss=0.3454, over 3370586.86 frames. 
], batch size: 49, lr: 3.82e-03, grad_scale: 16.0 2024-09-24 18:20:43,684 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=15.04 vs. limit=22.5 2024-09-24 18:20:46,313 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-24 18:21:05,023 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=560658.0, ans=0.0 2024-09-24 18:21:17,577 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=560704.6666666666, ans=0.125 2024-09-24 18:21:37,469 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.092e+02 1.237e+02 1.355e+02 1.495e+02 1.938e+02, threshold=2.711e+02, percent-clipped=0.0 2024-09-24 18:21:56,583 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=560798.0, ans=0.125 2024-09-24 18:21:59,318 INFO [train.py:1198] (3/4) Epoch 31, batch 3300, loss[loss=0.1911, ctc_loss=0.1216, cr_loss=0.3477, over 17034.00 frames. ], tot_loss[loss=0.1971, ctc_loss=0.1282, cr_loss=0.3443, over 3374549.81 frames. ], batch size: 52, lr: 3.82e-03, grad_scale: 16.0 2024-09-24 18:22:04,328 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-24 18:22:45,242 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.const_attention_rate, batch_count=560984.6666666666, ans=0.025 2024-09-24 18:23:10,412 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=561031.3333333334, ans=0.125 2024-09-24 18:23:18,167 INFO [train.py:1198] (3/4) Epoch 31, batch 3350, loss[loss=0.165, ctc_loss=0.1044, cr_loss=0.3031, over 17259.00 frames. ], tot_loss[loss=0.1972, ctc_loss=0.1284, cr_loss=0.3441, over 3365310.59 frames. ], batch size: 42, lr: 3.82e-03, grad_scale: 16.0 2024-09-24 18:23:21,662 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=561078.0, ans=0.0 2024-09-24 18:23:23,225 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=561078.0, ans=10.0 2024-09-24 18:23:41,896 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.const_attention_rate, batch_count=561124.6666666666, ans=0.025 2024-09-24 18:24:14,289 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.114e+02 1.283e+02 1.367e+02 1.477e+02 4.748e+02, threshold=2.734e+02, percent-clipped=1.0 2024-09-24 18:24:24,162 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=561264.6666666666, ans=0.125 2024-09-24 18:24:28,914 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=561264.6666666666, ans=0.125 2024-09-24 18:24:36,294 INFO [train.py:1198] (3/4) Epoch 31, batch 3400, loss[loss=0.1685, ctc_loss=0.1059, cr_loss=0.313, over 16317.00 frames. ], tot_loss[loss=0.1979, ctc_loss=0.1288, cr_loss=0.3457, over 3360530.27 frames. 
], batch size: 36, lr: 3.82e-03, grad_scale: 16.0 2024-09-24 18:24:44,410 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=561311.3333333334, ans=0.125 2024-09-24 18:24:52,779 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=561358.0, ans=0.0 2024-09-24 18:24:55,919 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=561358.0, ans=0.1 2024-09-24 18:25:09,948 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=561404.6666666666, ans=0.125 2024-09-24 18:25:28,835 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-24 18:25:47,189 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=561498.0, ans=0.0 2024-09-24 18:25:48,557 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=561498.0, ans=0.0 2024-09-24 18:25:56,290 INFO [train.py:1198] (3/4) Epoch 31, batch 3450, loss[loss=0.2066, ctc_loss=0.1366, cr_loss=0.3502, over 17293.00 frames. ], tot_loss[loss=0.1986, ctc_loss=0.1293, cr_loss=0.3464, over 3364040.49 frames. ], batch size: 49, lr: 3.82e-03, grad_scale: 16.0 2024-09-24 18:26:16,823 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=561591.3333333334, ans=0.125 2024-09-24 18:26:24,816 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=561591.3333333334, ans=0.0 2024-09-24 18:26:37,175 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=561638.0, ans=0.1 2024-09-24 18:26:44,886 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=561684.6666666666, ans=10.0 2024-09-24 18:26:45,213 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=9.75 vs. limit=22.5 2024-09-24 18:26:47,208 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=5.60 vs. limit=15.0 2024-09-24 18:26:52,252 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.105e+02 1.286e+02 1.363e+02 1.449e+02 2.372e+02, threshold=2.726e+02, percent-clipped=0.0 2024-09-24 18:26:56,008 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten.whitening_limit, batch_count=561684.6666666666, ans=15.0 2024-09-24 18:27:07,464 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=5.53 vs. limit=15.0 2024-09-24 18:27:08,149 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=561731.3333333334, ans=0.1 2024-09-24 18:27:14,200 INFO [train.py:1198] (3/4) Epoch 31, batch 3500, loss[loss=0.167, ctc_loss=0.1085, cr_loss=0.2926, over 17266.00 frames. ], tot_loss[loss=0.1975, ctc_loss=0.1285, cr_loss=0.3447, over 3372847.56 frames. 
], batch size: 42, lr: 3.82e-03, grad_scale: 16.0 2024-09-24 18:27:14,411 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=561778.0, ans=0.125 2024-09-24 18:27:35,002 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=561824.6666666666, ans=0.2 2024-09-24 18:28:34,276 INFO [train.py:1198] (3/4) Epoch 31, batch 3550, loss[loss=0.2094, ctc_loss=0.1353, cr_loss=0.3702, over 17021.00 frames. ], tot_loss[loss=0.1983, ctc_loss=0.1291, cr_loss=0.3459, over 3365582.52 frames. ], batch size: 44, lr: 3.82e-03, grad_scale: 16.0 2024-09-24 18:29:22,593 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=4.57 vs. limit=10.0 2024-09-24 18:29:30,655 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=8.84 vs. limit=12.0 2024-09-24 18:29:34,935 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.099e+02 1.302e+02 1.378e+02 1.478e+02 2.528e+02, threshold=2.755e+02, percent-clipped=0.0 2024-09-24 18:29:44,584 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=562198.0, ans=0.5 2024-09-24 18:29:50,987 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=562198.0, ans=0.125 2024-09-24 18:29:56,792 INFO [train.py:1198] (3/4) Epoch 31, batch 3600, loss[loss=0.1755, ctc_loss=0.1138, cr_loss=0.3083, over 17073.00 frames. ], tot_loss[loss=0.1982, ctc_loss=0.129, cr_loss=0.3457, over 3369582.56 frames. ], batch size: 46, lr: 3.82e-03, grad_scale: 32.0 2024-09-24 18:30:03,124 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.min_positive, batch_count=562244.6666666666, ans=0.025 2024-09-24 18:30:12,577 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=562291.3333333334, ans=0.1 2024-09-24 18:30:35,790 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=562338.0, ans=0.0 2024-09-24 18:30:46,505 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=562384.6666666666, ans=0.125 2024-09-24 18:30:51,486 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=9.21 vs. limit=12.0 2024-09-24 18:31:03,752 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=562431.3333333334, ans=0.1 2024-09-24 18:31:14,557 INFO [train.py:1198] (3/4) Epoch 31, batch 3650, loss[loss=0.1383, ctc_loss=0.08455, cr_loss=0.2686, over 17102.00 frames. ], tot_loss[loss=0.1991, ctc_loss=0.1298, cr_loss=0.3466, over 3351813.19 frames. 
], batch size: 40, lr: 3.82e-03, grad_scale: 32.0 2024-09-24 18:32:10,520 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=562618.0, ans=0.125 2024-09-24 18:32:11,872 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.088e+02 1.260e+02 1.374e+02 1.513e+02 2.191e+02, threshold=2.748e+02, percent-clipped=0.0 2024-09-24 18:32:15,406 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=562618.0, ans=0.04949747468305833 2024-09-24 18:32:26,490 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=562664.6666666666, ans=0.0 2024-09-24 18:32:33,923 INFO [train.py:1198] (3/4) Epoch 31, batch 3700, loss[loss=0.1664, ctc_loss=0.1048, cr_loss=0.308, over 17047.00 frames. ], tot_loss[loss=0.1996, ctc_loss=0.1302, cr_loss=0.3472, over 3346960.22 frames. ], batch size: 39, lr: 3.82e-03, grad_scale: 32.0 2024-09-24 18:32:37,242 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=562711.3333333334, ans=0.125 2024-09-24 18:33:15,270 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=562804.6666666666, ans=0.125 2024-09-24 18:33:35,291 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=562898.0, ans=0.5 2024-09-24 18:33:36,776 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=562898.0, ans=0.125 2024-09-24 18:33:52,140 INFO [train.py:1198] (3/4) Epoch 31, batch 3750, loss[loss=0.205, ctc_loss=0.136, cr_loss=0.3452, over 17024.00 frames. ], tot_loss[loss=0.1989, ctc_loss=0.1297, cr_loss=0.3458, over 3326938.16 frames. ], batch size: 44, lr: 3.82e-03, grad_scale: 32.0 2024-09-24 18:34:06,940 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=8.02 vs. limit=15.0 2024-09-24 18:34:20,983 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=562991.3333333334, ans=0.2 2024-09-24 18:34:47,691 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=3.55 vs. limit=12.0 2024-09-24 18:34:48,370 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.108e+02 1.261e+02 1.326e+02 1.431e+02 2.182e+02, threshold=2.652e+02, percent-clipped=0.0 2024-09-24 18:34:51,832 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=563084.6666666666, ans=0.125 2024-09-24 18:34:59,550 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-24 18:34:59,555 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=563131.3333333334, ans=0.0 2024-09-24 18:35:05,481 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=563131.3333333334, ans=0.125 2024-09-24 18:35:09,881 INFO [train.py:1198] (3/4) Epoch 31, batch 3800, loss[loss=0.1529, ctc_loss=0.09796, cr_loss=0.2747, over 16783.00 frames. 
], tot_loss[loss=0.1989, ctc_loss=0.1298, cr_loss=0.3457, over 3315518.65 frames. ], batch size: 37, lr: 3.82e-03, grad_scale: 16.0 2024-09-24 18:35:24,275 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=3.72 vs. limit=15.0 2024-09-24 18:35:37,874 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=563224.6666666666, ans=0.1 2024-09-24 18:35:39,341 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=563271.3333333334, ans=0.09899494936611666 2024-09-24 18:36:14,855 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=563364.6666666666, ans=0.09899494936611666 2024-09-24 18:36:27,172 INFO [train.py:1198] (3/4) Epoch 31, batch 3850, loss[loss=0.2006, ctc_loss=0.1288, cr_loss=0.3592, over 17229.00 frames. ], tot_loss[loss=0.1998, ctc_loss=0.1305, cr_loss=0.3466, over 3284162.27 frames. ], batch size: 47, lr: 3.81e-03, grad_scale: 16.0 2024-09-24 18:36:35,314 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=563411.3333333334, ans=0.0 2024-09-24 18:36:44,840 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=563458.0, ans=0.1 2024-09-24 18:36:52,429 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=10.46 vs. limit=15.0 2024-09-24 18:36:54,585 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=563458.0, ans=0.2 2024-09-24 18:37:11,719 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=15.20 vs. limit=15.0 2024-09-24 18:37:20,092 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=563551.3333333334, ans=0.125 2024-09-24 18:37:23,081 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=563551.3333333334, ans=0.2 2024-09-24 18:37:24,225 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.110e+02 1.313e+02 1.427e+02 1.596e+02 2.274e+02, threshold=2.854e+02, percent-clipped=0.0 2024-09-24 18:37:26,082 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=563551.3333333334, ans=0.1 2024-09-24 18:38:29,753 INFO [train.py:1198] (3/4) Epoch 32, batch 0, loss[loss=0.1874, ctc_loss=0.1197, cr_loss=0.3385, over 16973.00 frames. ], tot_loss[loss=0.1874, ctc_loss=0.1197, cr_loss=0.3385, over 16973.00 frames. ], batch size: 42, lr: 3.75e-03, grad_scale: 32.0 2024-09-24 18:38:29,754 INFO [train.py:1221] (3/4) Computing validation loss 2024-09-24 18:38:45,165 INFO [train.py:1230] (3/4) Epoch 32, validation: loss=0.03599, ctc_loss=0.03599, cr_loss=9.022e-15, over 944034.00 frames. 
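[Editor's note] In the epoch-32 validation record just above, cr_loss=9.022e-15 while ctc_loss equals the total. Consistency regularization compares the model's outputs on two differently time-masked copies of each utterance, and with masking disabled at validation the two copies coincide, so the divergence vanishes up to floating-point noise. A hedged sketch of the shape of such a loss (a symmetric KL between the two views' CTC posteriors; the real CR-CTC loss lives in the icefall recipe and may differ in detail):

```python
import torch
import torch.nn.functional as F

def cr_loss(log_probs_a, log_probs_b):
    """Illustrative consistency loss: symmetric KL between the frame-level
    CTC posteriors of two augmented views, each of shape (N, T, V)."""
    kl_ab = F.kl_div(log_probs_a, log_probs_b, reduction="batchmean", log_target=True)
    kl_ba = F.kl_div(log_probs_b, log_probs_a, reduction="batchmean", log_target=True)
    return 0.5 * (kl_ab + kl_ba)

# At validation both "views" are the same unmasked features, so the two
# log-prob tensors agree and the loss is ~0 up to rounding, matching the
# cr_loss=9.022e-15 in the record above.
x = torch.log_softmax(torch.randn(4, 100, 500), dim=-1)
print(cr_loss(x, x))  # tensor(0.)
```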
2024-09-24 18:38:45,166 INFO [train.py:1231] (3/4) Maximum memory allocated so far is 21238MB 2024-09-24 18:38:57,959 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=563626.0, ans=0.125 2024-09-24 18:39:17,211 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=563719.3333333334, ans=0.0 2024-09-24 18:39:17,348 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=563719.3333333334, ans=0.125 2024-09-24 18:39:18,797 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=563719.3333333334, ans=0.0 2024-09-24 18:39:20,406 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.const_attention_rate, batch_count=563719.3333333334, ans=0.025 2024-09-24 18:39:23,923 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=563719.3333333334, ans=0.2 2024-09-24 18:39:31,366 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=4.39 vs. limit=15.0 2024-09-24 18:39:33,677 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=563766.0, ans=0.015 2024-09-24 18:39:42,058 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=10.01 vs. limit=15.0 2024-09-24 18:39:47,844 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=563766.0, ans=0.0 2024-09-24 18:40:14,198 INFO [train.py:1198] (3/4) Epoch 32, batch 50, loss[loss=0.2137, ctc_loss=0.1433, cr_loss=0.3518, over 17044.00 frames. ], tot_loss[loss=0.2027, ctc_loss=0.1326, cr_loss=0.3502, over 751931.14 frames. ], batch size: 56, lr: 3.75e-03, grad_scale: 32.0 2024-09-24 18:40:36,891 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=563906.0, ans=0.125 2024-09-24 18:40:42,175 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=8.61 vs. limit=15.0 2024-09-24 18:41:19,567 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.079e+02 1.253e+02 1.338e+02 1.477e+02 2.326e+02, threshold=2.677e+02, percent-clipped=0.0 2024-09-24 18:41:33,987 INFO [train.py:1198] (3/4) Epoch 32, batch 100, loss[loss=0.2009, ctc_loss=0.1302, cr_loss=0.3538, over 17054.00 frames. ], tot_loss[loss=0.1978, ctc_loss=0.129, cr_loss=0.3442, over 1336340.24 frames. ], batch size: 52, lr: 3.75e-03, grad_scale: 32.0 2024-09-24 18:41:35,918 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=564092.6666666666, ans=0.125 2024-09-24 18:41:40,719 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=564092.6666666666, ans=0.125 2024-09-24 18:41:47,445 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=5.28 vs. 
limit=15.0 2024-09-24 18:41:48,705 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=564139.3333333334, ans=0.125 2024-09-24 18:42:04,992 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=564186.0, ans=0.125 2024-09-24 18:42:12,910 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=564186.0, ans=0.125 2024-09-24 18:42:18,288 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=12.61 vs. limit=22.5 2024-09-24 18:42:32,629 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=10.92 vs. limit=15.0 2024-09-24 18:42:34,084 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=564232.6666666666, ans=0.0 2024-09-24 18:42:49,965 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=564279.3333333334, ans=0.1 2024-09-24 18:42:56,207 INFO [train.py:1198] (3/4) Epoch 32, batch 150, loss[loss=0.1543, ctc_loss=0.09644, cr_loss=0.2893, over 17089.00 frames. ], tot_loss[loss=0.1941, ctc_loss=0.1262, cr_loss=0.3395, over 1790768.73 frames. ], batch size: 43, lr: 3.75e-03, grad_scale: 32.0 2024-09-24 18:43:14,143 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=564372.6666666666, ans=0.0 2024-09-24 18:43:40,548 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=5.41 vs. limit=15.0 2024-09-24 18:43:47,570 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=564466.0, ans=0.1 2024-09-24 18:44:01,705 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.094e+02 1.312e+02 1.410e+02 1.549e+02 1.890e+02, threshold=2.819e+02, percent-clipped=0.0 2024-09-24 18:44:10,177 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=564512.6666666666, ans=0.125 2024-09-24 18:44:11,555 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=564512.6666666666, ans=0.2 2024-09-24 18:44:11,586 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.min_positive, batch_count=564512.6666666666, ans=0.05 2024-09-24 18:44:11,672 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=564512.6666666666, ans=0.2 2024-09-24 18:44:16,129 INFO [train.py:1198] (3/4) Epoch 32, batch 200, loss[loss=0.2258, ctc_loss=0.1511, cr_loss=0.3736, over 16213.00 frames. ], tot_loss[loss=0.1958, ctc_loss=0.1274, cr_loss=0.342, over 2138528.30 frames. 
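[Editor's note] Each optim.py warning reports the min/25%/median/75%/max of recently observed gradient norms plus a clipping threshold, and in every record the threshold is ≈ Clipping_scale × the median (e.g. 2.0 × 1.410e+02 ≈ 2.819e+02 in the warning above); percent-clipped tracks how often a batch exceeded it. A sketch of that bookkeeping, assuming a sliding window of per-batch norms (names are illustrative; the real logic is in icefall's optim.py):

```python
import torch

def clipping_report(recent_norms, clipping_scale=2.0):
    """Report grad-norm quartiles and a median-based clipping threshold,
    in the spirit of the optim.py warnings in this log."""
    norms = torch.tensor(recent_norms)
    q = torch.quantile(norms, torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0]))
    threshold = clipping_scale * q[2].item()  # e.g. 2.0 * 1.410e+02 = 2.82e+02
    percent_clipped = 100.0 * (norms > threshold).float().mean().item()
    return q.tolist(), threshold, percent_clipped

# A later step would rescale any batch whose total gradient norm exceeds
# `threshold`, e.g. torch.nn.utils.clip_grad_norm_(model.parameters(), threshold).
```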
], batch size: 74, lr: 3.75e-03, grad_scale: 32.0 2024-09-24 18:44:31,073 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=564606.0, ans=0.0 2024-09-24 18:45:32,164 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-24 18:45:40,079 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=564746.0, ans=0.05 2024-09-24 18:45:46,134 INFO [train.py:1198] (3/4) Epoch 32, batch 250, loss[loss=0.1948, ctc_loss=0.123, cr_loss=0.3588, over 17267.00 frames. ], tot_loss[loss=0.1968, ctc_loss=0.128, cr_loss=0.3441, over 2415928.44 frames. ], batch size: 44, lr: 3.75e-03, grad_scale: 32.0 2024-09-24 18:45:48,009 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-24 18:46:27,796 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=564886.0, ans=0.0 2024-09-24 18:46:34,079 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=564932.6666666666, ans=0.2 2024-09-24 18:46:43,644 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=564932.6666666666, ans=0.0 2024-09-24 18:46:52,819 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.110e+02 1.269e+02 1.327e+02 1.431e+02 2.007e+02, threshold=2.654e+02, percent-clipped=0.0 2024-09-24 18:47:04,475 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.88 vs. limit=15.0 2024-09-24 18:47:05,423 INFO [train.py:1198] (3/4) Epoch 32, batch 300, loss[loss=0.1595, ctc_loss=0.1022, cr_loss=0.2865, over 17248.00 frames. ], tot_loss[loss=0.1949, ctc_loss=0.1267, cr_loss=0.3414, over 2628577.22 frames. ], batch size: 44, lr: 3.75e-03, grad_scale: 16.0 2024-09-24 18:47:28,578 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=565072.6666666666, ans=0.125 2024-09-24 18:48:05,563 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=11.51 vs. limit=12.0 2024-09-24 18:48:13,365 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=565212.6666666666, ans=0.2 2024-09-24 18:48:29,165 INFO [train.py:1198] (3/4) Epoch 32, batch 350, loss[loss=0.1556, ctc_loss=0.09921, cr_loss=0.2819, over 16269.00 frames. ], tot_loss[loss=0.1951, ctc_loss=0.1267, cr_loss=0.342, over 2787135.99 frames. ], batch size: 36, lr: 3.75e-03, grad_scale: 16.0 2024-09-24 18:48:29,818 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=10.33 vs. limit=15.0 2024-09-24 18:48:38,431 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=10.17 vs. 
limit=15.0 2024-09-24 18:48:44,181 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.const_attention_rate, batch_count=565306.0, ans=0.025 2024-09-24 18:49:01,777 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=565352.6666666666, ans=0.1 2024-09-24 18:49:19,235 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.const_attention_rate, batch_count=565399.3333333334, ans=0.025 2024-09-24 18:49:22,512 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=565399.3333333334, ans=0.1 2024-09-24 18:49:36,803 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.109e+02 1.263e+02 1.319e+02 1.416e+02 2.178e+02, threshold=2.638e+02, percent-clipped=0.0 2024-09-24 18:49:52,385 INFO [train.py:1198] (3/4) Epoch 32, batch 400, loss[loss=0.162, ctc_loss=0.1032, cr_loss=0.2943, over 17201.00 frames. ], tot_loss[loss=0.1955, ctc_loss=0.1269, cr_loss=0.3427, over 2914986.21 frames. ], batch size: 41, lr: 3.75e-03, grad_scale: 32.0 2024-09-24 18:50:27,095 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=565539.3333333334, ans=0.1 2024-09-24 18:50:27,278 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=565539.3333333334, ans=0.125 2024-09-24 18:51:18,649 INFO [train.py:1198] (3/4) Epoch 32, batch 450, loss[loss=0.1816, ctc_loss=0.1143, cr_loss=0.3363, over 17255.00 frames. ], tot_loss[loss=0.197, ctc_loss=0.1281, cr_loss=0.3448, over 2994699.26 frames. ], batch size: 44, lr: 3.75e-03, grad_scale: 32.0 2024-09-24 18:51:52,697 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=565819.3333333334, ans=0.1 2024-09-24 18:52:00,574 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=565819.3333333334, ans=0.1 2024-09-24 18:52:05,498 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=565866.0, ans=0.125 2024-09-24 18:52:11,718 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=565866.0, ans=0.125 2024-09-24 18:52:25,964 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.138e+02 1.290e+02 1.389e+02 1.522e+02 2.571e+02, threshold=2.778e+02, percent-clipped=0.0 2024-09-24 18:52:31,001 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=565912.6666666666, ans=0.125 2024-09-24 18:52:31,026 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=565912.6666666666, ans=0.1 2024-09-24 18:52:35,865 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=565912.6666666666, ans=0.1 2024-09-24 18:52:38,790 INFO [train.py:1198] (3/4) Epoch 32, batch 500, loss[loss=0.1774, ctc_loss=0.1144, cr_loss=0.3152, over 17092.00 frames. ], tot_loss[loss=0.1966, ctc_loss=0.1278, cr_loss=0.3439, over 3078175.46 frames. 
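[Editor's note] The ScheduledFloat lines trace hyperparameters (dropout rates, balancer probabilities, skip rates) that are scheduled as piecewise-linear functions of batch_count rather than fixed; by this point in training (batch_count ≈ 5.65e5) nearly all of them have flattened out at their final values, which is why the same ans repeats. A toy sketch of such a schedule (icefall's real ScheduledFloat in zipformer/scaling.py carries more machinery):

```python
class PiecewiseLinear:
    """Toy piecewise-linear schedule over batch_count, holding the endpoint
    values flat outside the given breakpoints."""

    def __init__(self, *points):  # points: sorted (batch_count, value) pairs
        self.points = list(points)

    def __call__(self, batch_count):
        pts = self.points
        if batch_count <= pts[0][0]:
            return pts[0][1]
        for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
            if batch_count <= x1:
                return y0 + (batch_count - x0) * (y1 - y0) / (x1 - x0)
        return pts[-1][1]

# e.g. a skip-rate annealed 0.5 -> 0.0 over the first 20k batches:
conv_skip_rate = PiecewiseLinear((0.0, 0.5), (20000.0, 0.0))
print(conv_skip_rate(565306.0))  # 0.0 -- flat at the final value, as in the log
```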
], batch size: 43, lr: 3.74e-03, grad_scale: 32.0 2024-09-24 18:53:00,004 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=566006.0, ans=0.0 2024-09-24 18:53:17,391 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=566052.6666666666, ans=0.2 2024-09-24 18:53:25,366 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=566052.6666666666, ans=0.125 2024-09-24 18:53:33,232 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=566099.3333333334, ans=0.125 2024-09-24 18:54:01,198 INFO [train.py:1198] (3/4) Epoch 32, batch 550, loss[loss=0.1818, ctc_loss=0.1182, cr_loss=0.3182, over 17259.00 frames. ], tot_loss[loss=0.1966, ctc_loss=0.1277, cr_loss=0.3445, over 3146603.76 frames. ], batch size: 42, lr: 3.74e-03, grad_scale: 32.0 2024-09-24 18:54:18,024 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.03 vs. limit=15.0 2024-09-24 18:54:32,051 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=566286.0, ans=0.1 2024-09-24 18:54:39,340 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=566286.0, ans=0.125 2024-09-24 18:54:45,821 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=566286.0, ans=0.125 2024-09-24 18:55:16,236 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.095e+02 1.263e+02 1.353e+02 1.425e+02 2.013e+02, threshold=2.706e+02, percent-clipped=0.0 2024-09-24 18:55:19,941 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=566379.3333333334, ans=0.2 2024-09-24 18:55:23,129 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=566379.3333333334, ans=0.0 2024-09-24 18:55:29,024 INFO [train.py:1198] (3/4) Epoch 32, batch 600, loss[loss=0.1787, ctc_loss=0.1129, cr_loss=0.3287, over 17085.00 frames. ], tot_loss[loss=0.1971, ctc_loss=0.128, cr_loss=0.3453, over 3198139.92 frames. ], batch size: 40, lr: 3.74e-03, grad_scale: 32.0 2024-09-24 18:55:34,054 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=566426.0, ans=0.1 2024-09-24 18:55:50,427 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=566472.6666666666, ans=0.1 2024-09-24 18:56:01,273 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=566519.3333333334, ans=0.125 2024-09-24 18:56:49,245 INFO [train.py:1198] (3/4) Epoch 32, batch 650, loss[loss=0.2146, ctc_loss=0.1392, cr_loss=0.3769, over 17227.00 frames. ], tot_loss[loss=0.1976, ctc_loss=0.1285, cr_loss=0.3457, over 3226795.01 frames. 
], batch size: 50, lr: 3.74e-03, grad_scale: 32.0 2024-09-24 18:56:52,620 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=566659.3333333334, ans=0.2 2024-09-24 18:56:54,478 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-09-24 18:57:02,307 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=566659.3333333334, ans=0.125 2024-09-24 18:57:12,099 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=566706.0, ans=0.125 2024-09-24 18:57:12,269 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-09-24 18:57:20,295 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=566752.6666666666, ans=0.1 2024-09-24 18:58:01,566 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.137e+02 1.265e+02 1.321e+02 1.437e+02 2.053e+02, threshold=2.643e+02, percent-clipped=0.0 2024-09-24 18:58:05,138 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=566846.0, ans=0.125 2024-09-24 18:58:10,072 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=566846.0, ans=0.125 2024-09-24 18:58:13,018 INFO [train.py:1198] (3/4) Epoch 32, batch 700, loss[loss=0.166, ctc_loss=0.1063, cr_loss=0.2984, over 17190.00 frames. ], tot_loss[loss=0.1984, ctc_loss=0.1292, cr_loss=0.3462, over 3257995.01 frames. ], batch size: 41, lr: 3.74e-03, grad_scale: 16.0 2024-09-24 18:58:19,700 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=566892.6666666666, ans=0.125 2024-09-24 18:58:32,460 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=566939.3333333334, ans=0.125 2024-09-24 18:58:40,490 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=566939.3333333334, ans=0.1 2024-09-24 18:58:56,761 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.max_abs, batch_count=566986.0, ans=10.0 2024-09-24 18:59:23,962 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=567079.3333333334, ans=0.125 2024-09-24 18:59:33,160 INFO [train.py:1198] (3/4) Epoch 32, batch 750, loss[loss=0.2028, ctc_loss=0.1313, cr_loss=0.3576, over 16984.00 frames. ], tot_loss[loss=0.1992, ctc_loss=0.1297, cr_loss=0.3472, over 3272933.21 frames. 
], batch size: 53, lr: 3.74e-03, grad_scale: 16.0 2024-09-24 18:59:37,667 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=567126.0, ans=0.0 2024-09-24 18:59:40,806 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=567126.0, ans=0.0 2024-09-24 18:59:56,067 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=567172.6666666666, ans=0.125 2024-09-24 19:00:05,816 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=567172.6666666666, ans=0.0 2024-09-24 19:00:17,949 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=567219.3333333334, ans=0.0 2024-09-24 19:00:22,757 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=567219.3333333334, ans=0.125 2024-09-24 19:00:29,135 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-09-24 19:00:43,316 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=567312.6666666666, ans=0.2 2024-09-24 19:00:49,277 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.130e+02 1.286e+02 1.352e+02 1.497e+02 2.163e+02, threshold=2.705e+02, percent-clipped=0.0 2024-09-24 19:01:00,450 INFO [train.py:1198] (3/4) Epoch 32, batch 800, loss[loss=0.2333, ctc_loss=0.153, cr_loss=0.4011, over 17060.00 frames. ], tot_loss[loss=0.1993, ctc_loss=0.1299, cr_loss=0.3473, over 3295056.11 frames. ], batch size: 46, lr: 3.74e-03, grad_scale: 32.0 2024-09-24 19:01:00,739 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=567359.3333333334, ans=0.125 2024-09-24 19:01:58,285 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=567499.3333333334, ans=0.09899494936611666 2024-09-24 19:01:59,110 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=6.13 vs. limit=15.0 2024-09-24 19:02:01,444 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=567499.3333333334, ans=0.125 2024-09-24 19:02:20,179 INFO [train.py:1198] (3/4) Epoch 32, batch 850, loss[loss=0.193, ctc_loss=0.1251, cr_loss=0.3396, over 17038.00 frames. ], tot_loss[loss=0.1997, ctc_loss=0.1301, cr_loss=0.348, over 3313016.73 frames. 
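[Editor's note] The Whitening lines monitor how far a layer's feature covariance is from a multiple of the identity; when the logged metric exceeds its limit, a penalty nudges the activations back toward whiteness. One simple metric with the right behavior, shown below, is num_channels · tr(C²)/tr(C)², which equals 1 for perfectly white features and grows as the eigenvalue spectrum becomes lopsided. This is an illustration consistent with the logged values (e.g. metric=6.13 vs. limit=15.0 above), not necessarily icefall's exact formula.

```python
import torch

def whitening_metric(x, num_channels):
    """x: (num_frames, num_channels) activations. Returns a scalar >= 1 that
    equals 1.0 iff the feature covariance is a multiple of the identity."""
    x = x - x.mean(dim=0)            # center the features
    cov = (x.t() @ x) / x.shape[0]   # (C, C) covariance estimate
    tr = torch.diagonal(cov).sum()   # tr(C)
    tr_sq = (cov * cov).sum()        # tr(C @ C), since C is symmetric
    return num_channels * tr_sq / (tr * tr)

x = torch.randn(10000, 192)          # near-white features
print(whitening_metric(x, 192))      # ~1.0; values like 6.13 indicate coloring
```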
], batch size: 56, lr: 3.74e-03, grad_scale: 32.0 2024-09-24 19:02:21,998 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=567592.6666666666, ans=0.125 2024-09-24 19:02:36,559 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=567639.3333333334, ans=0.2 2024-09-24 19:02:52,291 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=567639.3333333334, ans=0.125 2024-09-24 19:03:32,003 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.097e+02 1.300e+02 1.382e+02 1.535e+02 3.730e+02, threshold=2.764e+02, percent-clipped=1.0 2024-09-24 19:03:34,634 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.81 vs. limit=15.0 2024-09-24 19:03:36,443 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=11.36 vs. limit=15.0 2024-09-24 19:03:43,427 INFO [train.py:1198] (3/4) Epoch 32, batch 900, loss[loss=0.1552, ctc_loss=0.09879, cr_loss=0.2818, over 17027.00 frames. ], tot_loss[loss=0.1998, ctc_loss=0.1302, cr_loss=0.3478, over 3328064.85 frames. ], batch size: 39, lr: 3.74e-03, grad_scale: 32.0 2024-09-24 19:03:46,912 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=567826.0, ans=0.125 2024-09-24 19:03:55,045 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=567826.0, ans=0.0 2024-09-24 19:03:59,997 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=567872.6666666666, ans=0.1 2024-09-24 19:04:42,881 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=567966.0, ans=0.125 2024-09-24 19:05:01,961 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=10.37 vs. limit=15.0 2024-09-24 19:05:04,627 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=568012.6666666666, ans=0.125 2024-09-24 19:05:09,167 INFO [train.py:1198] (3/4) Epoch 32, batch 950, loss[loss=0.1656, ctc_loss=0.1058, cr_loss=0.2989, over 17105.00 frames. ], tot_loss[loss=0.1986, ctc_loss=0.1294, cr_loss=0.3459, over 3336506.88 frames. ], batch size: 43, lr: 3.74e-03, grad_scale: 32.0 2024-09-24 19:05:26,919 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=4.55 vs. 
limit=12.0 2024-09-24 19:05:42,769 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=568152.6666666666, ans=0.2 2024-09-24 19:06:03,268 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=568199.3333333334, ans=0.0 2024-09-24 19:06:20,504 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.100e+02 1.287e+02 1.382e+02 1.480e+02 1.903e+02, threshold=2.765e+02, percent-clipped=0.0 2024-09-24 19:06:20,806 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.const_attention_rate, batch_count=568246.0, ans=0.025 2024-09-24 19:06:31,839 INFO [train.py:1198] (3/4) Epoch 32, batch 1000, loss[loss=0.1767, ctc_loss=0.1111, cr_loss=0.3283, over 17168.00 frames. ], tot_loss[loss=0.1978, ctc_loss=0.1287, cr_loss=0.3453, over 3342642.85 frames. ], batch size: 41, lr: 3.74e-03, grad_scale: 32.0 2024-09-24 19:06:38,684 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=568292.6666666666, ans=0.1 2024-09-24 19:06:38,687 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=568292.6666666666, ans=0.125 2024-09-24 19:06:41,016 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=20.36 vs. limit=22.5 2024-09-24 19:07:54,847 INFO [train.py:1198] (3/4) Epoch 32, batch 1050, loss[loss=0.2177, ctc_loss=0.1434, cr_loss=0.3716, over 17007.00 frames. ], tot_loss[loss=0.198, ctc_loss=0.1289, cr_loss=0.3456, over 3345406.13 frames. ], batch size: 51, lr: 3.74e-03, grad_scale: 32.0 2024-09-24 19:07:57,152 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=11.84 vs. limit=22.5 2024-09-24 19:07:58,276 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=568526.0, ans=0.125 2024-09-24 19:08:41,136 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=568666.0, ans=0.125 2024-09-24 19:08:51,033 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=568666.0, ans=0.1 2024-09-24 19:08:59,004 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.const_attention_rate, batch_count=568712.6666666666, ans=0.025 2024-09-24 19:09:03,547 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.077e+02 1.269e+02 1.394e+02 1.503e+02 2.012e+02, threshold=2.787e+02, percent-clipped=0.0 2024-09-24 19:09:11,906 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=568712.6666666666, ans=0.125 2024-09-24 19:09:15,036 INFO [train.py:1198] (3/4) Epoch 32, batch 1100, loss[loss=0.1813, ctc_loss=0.1139, cr_loss=0.3368, over 17119.00 frames. ], tot_loss[loss=0.1979, ctc_loss=0.1288, cr_loss=0.3457, over 3347275.11 frames. 
], batch size: 40, lr: 3.74e-03, grad_scale: 32.0 2024-09-24 19:09:21,571 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=568759.3333333334, ans=0.1 2024-09-24 19:10:04,630 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=568852.6666666666, ans=0.125 2024-09-24 19:10:22,950 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=568899.3333333334, ans=0.125 2024-09-24 19:10:41,725 INFO [train.py:1198] (3/4) Epoch 32, batch 1150, loss[loss=0.2164, ctc_loss=0.1383, cr_loss=0.3903, over 17040.00 frames. ], tot_loss[loss=0.1971, ctc_loss=0.1282, cr_loss=0.3444, over 3352976.83 frames. ], batch size: 52, lr: 3.73e-03, grad_scale: 32.0 2024-09-24 19:10:43,723 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.const_attention_rate, batch_count=568992.6666666666, ans=0.025 2024-09-24 19:10:50,058 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=568992.6666666666, ans=0.125 2024-09-24 19:11:01,894 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.93 vs. limit=6.0 2024-09-24 19:11:03,325 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.89 vs. limit=6.0 2024-09-24 19:11:25,150 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=569086.0, ans=0.125 2024-09-24 19:11:33,131 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=569132.6666666666, ans=0.125 2024-09-24 19:11:39,599 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=569132.6666666666, ans=0.125 2024-09-24 19:11:42,987 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=569132.6666666666, ans=0.1 2024-09-24 19:11:50,711 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.032e+02 1.271e+02 1.362e+02 1.461e+02 3.149e+02, threshold=2.725e+02, percent-clipped=1.0 2024-09-24 19:11:51,149 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.const_attention_rate, batch_count=569179.3333333334, ans=0.025 2024-09-24 19:11:57,567 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=569179.3333333334, ans=0.0 2024-09-24 19:12:01,964 INFO [train.py:1198] (3/4) Epoch 32, batch 1200, loss[loss=0.2154, ctc_loss=0.1423, cr_loss=0.3657, over 14823.00 frames. ], tot_loss[loss=0.1964, ctc_loss=0.1278, cr_loss=0.3432, over 3353297.33 frames. 
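[Editor's note] The grad_scale values in these records alternate between 16.0 and 32.0. With fp16 AMP enabled (Use AMP=True at startup), that is the dynamic loss scaler at work: gradients are computed on a scaled loss, the step is skipped when inf/nan gradients appear and the scale is halved, otherwise the scale is periodically doubled. A hedged sketch of that loop using PyTorch's GradScaler; `model`, `optimizer`, `batch`, `compute_loss`, and the init_scale are placeholders, not icefall's actual API.

```python
import torch
from torch.cuda.amp import GradScaler, autocast

scaler = GradScaler(init_scale=16.0)  # illustrative starting scale

def train_step(model, optimizer, batch, compute_loss):
    optimizer.zero_grad()
    with autocast():                  # fp16 forward pass on CUDA
        loss = compute_loss(model, batch)
    scaler.scale(loss).backward()     # backward on the scaled loss
    scaler.step(optimizer)            # unscales; skips the step on overflow
    scaler.update()                   # halves/doubles the scale as needed
    return scaler.get_scale()         # the "grad_scale" printed in the log
```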
], batch size: 89, lr: 3.73e-03, grad_scale: 32.0 2024-09-24 19:12:42,229 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=569319.3333333334, ans=0.1 2024-09-24 19:12:57,484 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=569366.0, ans=0.125 2024-09-24 19:13:24,568 INFO [train.py:1198] (3/4) Epoch 32, batch 1250, loss[loss=0.1815, ctc_loss=0.1188, cr_loss=0.3136, over 17046.00 frames. ], tot_loss[loss=0.1966, ctc_loss=0.1279, cr_loss=0.3432, over 3355440.10 frames. ], batch size: 39, lr: 3.73e-03, grad_scale: 32.0 2024-09-24 19:13:34,377 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=569459.3333333334, ans=0.125 2024-09-24 19:13:36,575 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=4.23 vs. limit=15.0 2024-09-24 19:13:42,571 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=569506.0, ans=0.1 2024-09-24 19:13:55,853 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=4.35 vs. limit=15.0 2024-09-24 19:14:34,074 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.076e+02 1.275e+02 1.363e+02 1.497e+02 1.949e+02, threshold=2.726e+02, percent-clipped=0.0 2024-09-24 19:14:50,539 INFO [train.py:1198] (3/4) Epoch 32, batch 1300, loss[loss=0.195, ctc_loss=0.128, cr_loss=0.3352, over 17206.00 frames. ], tot_loss[loss=0.1962, ctc_loss=0.1277, cr_loss=0.3429, over 3363990.86 frames. ], batch size: 47, lr: 3.73e-03, grad_scale: 32.0 2024-09-24 19:14:57,300 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=569692.6666666666, ans=0.125 2024-09-24 19:14:58,982 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=569692.6666666666, ans=0.07 2024-09-24 19:15:03,832 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=569692.6666666666, ans=0.125 2024-09-24 19:15:11,659 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=569739.3333333334, ans=0.95 2024-09-24 19:15:29,961 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=569786.0, ans=0.125 2024-09-24 19:16:13,219 INFO [train.py:1198] (3/4) Epoch 32, batch 1350, loss[loss=0.2028, ctc_loss=0.1308, cr_loss=0.3601, over 17144.00 frames. ], tot_loss[loss=0.1963, ctc_loss=0.1277, cr_loss=0.3431, over 3360865.16 frames. 
], batch size: 48, lr: 3.73e-03, grad_scale: 16.0 2024-09-24 19:16:32,831 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=569972.6666666666, ans=0.125 2024-09-24 19:16:32,903 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=569972.6666666666, ans=0.0 2024-09-24 19:16:39,104 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=569972.6666666666, ans=0.0 2024-09-24 19:16:50,319 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=570019.3333333334, ans=0.125 2024-09-24 19:16:50,347 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=570019.3333333334, ans=0.125 2024-09-24 19:17:23,674 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.072e+02 1.251e+02 1.359e+02 1.461e+02 2.063e+02, threshold=2.718e+02, percent-clipped=0.0 2024-09-24 19:17:33,286 INFO [train.py:1198] (3/4) Epoch 32, batch 1400, loss[loss=0.1748, ctc_loss=0.1148, cr_loss=0.3001, over 16280.00 frames. ], tot_loss[loss=0.1968, ctc_loss=0.1281, cr_loss=0.3434, over 3358024.62 frames. ], batch size: 36, lr: 3.73e-03, grad_scale: 16.0 2024-09-24 19:18:03,723 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=570206.0, ans=0.0 2024-09-24 19:18:11,917 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=11.65 vs. limit=22.5 2024-09-24 19:18:56,780 INFO [train.py:1198] (3/4) Epoch 32, batch 1450, loss[loss=0.1748, ctc_loss=0.1112, cr_loss=0.3181, over 16634.00 frames. ], tot_loss[loss=0.1969, ctc_loss=0.1283, cr_loss=0.3433, over 3351897.70 frames. ], batch size: 37, lr: 3.73e-03, grad_scale: 16.0 2024-09-24 19:20:09,957 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=570579.3333333334, ans=0.125 2024-09-24 19:20:12,697 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.051e+02 1.251e+02 1.341e+02 1.423e+02 1.681e+02, threshold=2.682e+02, percent-clipped=0.0 2024-09-24 19:20:24,805 INFO [train.py:1198] (3/4) Epoch 32, batch 1500, loss[loss=0.2031, ctc_loss=0.1329, cr_loss=0.351, over 17295.00 frames. ], tot_loss[loss=0.1966, ctc_loss=0.1279, cr_loss=0.3435, over 3362740.42 frames. 
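[Editor's note] In each train record the bracketed loss[...] values are for the current batch, while tot_loss[...] is an average weighted by the frames each batch contributed. The "over N frames" counts in tot_loss hover near 3.4e6 instead of growing without bound, which is consistent with an exponential decay of roughly 1 - 1/reset_interval per batch (reset_interval=200 at startup). A small sketch of that bookkeeping; illustrative, not icefall's exact MetricsTracker code.

```python
class RunningLoss:
    """Decayed, frame-weighted loss average, matching the tot_loss[... over
    N frames] bookkeeping in the records above."""

    def __init__(self, decay=1.0 - 1.0 / 200):  # assumes reset_interval=200
        self.decay = decay
        self.loss_sum = 0.0
        self.frames = 0.0

    def update(self, batch_loss, batch_frames):
        self.loss_sum = self.decay * self.loss_sum + batch_loss * batch_frames
        self.frames = self.decay * self.frames + batch_frames

    @property
    def avg(self):
        return self.loss_sum / max(self.frames, 1.0)

tracker = RunningLoss()
tracker.update(0.2031, 17295.0)  # the Epoch 32, batch 1500 record above
print(tracker.avg, "over", tracker.frames, "frames")
```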
], batch size: 46, lr: 3.73e-03, grad_scale: 16.0 2024-09-24 19:20:26,652 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=570626.0, ans=0.125 2024-09-24 19:20:34,751 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=570626.0, ans=0.0 2024-09-24 19:20:37,835 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=570626.0, ans=0.125 2024-09-24 19:20:37,916 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=570626.0, ans=0.0 2024-09-24 19:20:50,637 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=570672.6666666666, ans=0.05 2024-09-24 19:21:01,808 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=570719.3333333334, ans=0.1 2024-09-24 19:21:22,320 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=570766.0, ans=0.125 2024-09-24 19:21:33,574 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-09-24 19:21:44,250 INFO [train.py:1198] (3/4) Epoch 32, batch 1550, loss[loss=0.1932, ctc_loss=0.122, cr_loss=0.3559, over 17091.00 frames. ], tot_loss[loss=0.1966, ctc_loss=0.128, cr_loss=0.3434, over 3365707.49 frames. ], batch size: 40, lr: 3.73e-03, grad_scale: 16.0 2024-09-24 19:21:55,920 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-24 19:22:04,136 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=570906.0, ans=0.125 2024-09-24 19:22:57,882 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.103e+02 1.269e+02 1.362e+02 1.460e+02 2.798e+02, threshold=2.724e+02, percent-clipped=1.0 2024-09-24 19:23:07,638 INFO [train.py:1198] (3/4) Epoch 32, batch 1600, loss[loss=0.184, ctc_loss=0.1182, cr_loss=0.3289, over 17007.00 frames. ], tot_loss[loss=0.1965, ctc_loss=0.1279, cr_loss=0.3432, over 3369199.44 frames. ], batch size: 44, lr: 3.73e-03, grad_scale: 32.0 2024-09-24 19:23:14,500 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-24 19:23:20,599 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=571092.6666666666, ans=0.0 2024-09-24 19:24:13,751 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=571279.3333333334, ans=0.2 2024-09-24 19:24:13,881 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=571279.3333333334, ans=0.1 2024-09-24 19:24:20,178 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=571279.3333333334, ans=0.0 2024-09-24 19:24:27,904 INFO [train.py:1198] (3/4) Epoch 32, batch 1650, loss[loss=0.2006, ctc_loss=0.1314, cr_loss=0.3462, over 17258.00 frames. ], tot_loss[loss=0.196, ctc_loss=0.1276, cr_loss=0.3423, over 3361365.33 frames. 
], batch size: 44, lr: 3.73e-03, grad_scale: 32.0 2024-09-24 19:25:13,986 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten.whitening_limit, batch_count=571419.3333333334, ans=15.0 2024-09-24 19:25:24,172 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=571466.0, ans=0.125 2024-09-24 19:25:25,824 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=571466.0, ans=0.125 2024-09-24 19:25:46,146 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.081e+02 1.279e+02 1.364e+02 1.457e+02 2.649e+02, threshold=2.728e+02, percent-clipped=0.0 2024-09-24 19:25:51,229 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=571512.6666666666, ans=0.1 2024-09-24 19:25:55,640 INFO [train.py:1198] (3/4) Epoch 32, batch 1700, loss[loss=0.1664, ctc_loss=0.1046, cr_loss=0.309, over 17107.00 frames. ], tot_loss[loss=0.1964, ctc_loss=0.1278, cr_loss=0.3432, over 3363573.21 frames. ], batch size: 43, lr: 3.73e-03, grad_scale: 32.0 2024-09-24 19:26:02,498 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=571559.3333333334, ans=0.125 2024-09-24 19:26:31,053 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=571652.6666666666, ans=0.125 2024-09-24 19:26:58,769 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.67 vs. limit=15.0 2024-09-24 19:27:15,588 INFO [train.py:1198] (3/4) Epoch 32, batch 1750, loss[loss=0.2279, ctc_loss=0.1537, cr_loss=0.3708, over 15945.00 frames. ], tot_loss[loss=0.1962, ctc_loss=0.1275, cr_loss=0.3431, over 3372789.01 frames. ], batch size: 74, lr: 3.73e-03, grad_scale: 32.0 2024-09-24 19:27:19,172 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=571792.6666666666, ans=0.125 2024-09-24 19:27:50,481 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=571886.0, ans=0.125 2024-09-24 19:27:58,389 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=571886.0, ans=0.125 2024-09-24 19:28:20,720 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=571979.3333333334, ans=0.1 2024-09-24 19:28:27,368 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=10.27 vs. limit=22.5 2024-09-24 19:28:28,286 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.082e+02 1.257e+02 1.347e+02 1.444e+02 1.830e+02, threshold=2.693e+02, percent-clipped=0.0 2024-09-24 19:28:30,269 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.const_attention_rate, batch_count=571979.3333333334, ans=0.025 2024-09-24 19:28:37,949 INFO [train.py:1198] (3/4) Epoch 32, batch 1800, loss[loss=0.1943, ctc_loss=0.126, cr_loss=0.3417, over 17320.00 frames. ], tot_loss[loss=0.1973, ctc_loss=0.1284, cr_loss=0.3447, over 3372993.05 frames. 
], batch size: 49, lr: 3.72e-03, grad_scale: 32.0 2024-09-24 19:29:23,425 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=572119.3333333334, ans=0.0 2024-09-24 19:29:48,999 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=572212.6666666666, ans=0.125 2024-09-24 19:29:50,716 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.const_attention_rate, batch_count=572212.6666666666, ans=0.025 2024-09-24 19:30:02,992 INFO [train.py:1198] (3/4) Epoch 32, batch 1850, loss[loss=0.1778, ctc_loss=0.1134, cr_loss=0.3222, over 17130.00 frames. ], tot_loss[loss=0.1974, ctc_loss=0.1283, cr_loss=0.3451, over 3377436.39 frames. ], batch size: 48, lr: 3.72e-03, grad_scale: 32.0 2024-09-24 19:30:03,241 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.min_positive, batch_count=572259.3333333334, ans=0.05 2024-09-24 19:30:11,269 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=572259.3333333334, ans=0.0 2024-09-24 19:30:23,801 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=572306.0, ans=0.2 2024-09-24 19:30:33,366 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-24 19:30:47,252 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=572352.6666666666, ans=0.125 2024-09-24 19:30:57,514 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=5.23 vs. limit=15.0 2024-09-24 19:31:00,146 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=572399.3333333334, ans=0.125 2024-09-24 19:31:07,976 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=572446.0, ans=0.2 2024-09-24 19:31:08,131 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=572446.0, ans=0.0 2024-09-24 19:31:17,413 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.124e+02 1.271e+02 1.345e+02 1.449e+02 2.207e+02, threshold=2.691e+02, percent-clipped=0.0 2024-09-24 19:31:17,992 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.94 vs. limit=10.0 2024-09-24 19:31:25,382 INFO [train.py:1198] (3/4) Epoch 32, batch 1900, loss[loss=0.1758, ctc_loss=0.1148, cr_loss=0.305, over 17075.00 frames. ], tot_loss[loss=0.1968, ctc_loss=0.128, cr_loss=0.3442, over 3367950.85 frames. 
], batch size: 39, lr: 3.72e-03, grad_scale: 16.0 2024-09-24 19:31:49,743 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=572539.3333333334, ans=0.125 2024-09-24 19:32:05,643 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=572586.0, ans=0.0 2024-09-24 19:32:18,430 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=572632.6666666666, ans=0.125 2024-09-24 19:32:27,348 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=9.99 vs. limit=10.0 2024-09-24 19:32:28,065 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=572679.3333333334, ans=0.125 2024-09-24 19:32:32,197 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=6.68 vs. limit=15.0 2024-09-24 19:32:39,239 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=572679.3333333334, ans=0.0 2024-09-24 19:32:47,863 INFO [train.py:1198] (3/4) Epoch 32, batch 1950, loss[loss=0.2359, ctc_loss=0.1531, cr_loss=0.4141, over 17192.00 frames. ], tot_loss[loss=0.1974, ctc_loss=0.1285, cr_loss=0.3449, over 3365999.66 frames. ], batch size: 55, lr: 3.72e-03, grad_scale: 16.0 2024-09-24 19:33:01,015 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=572726.0, ans=0.125 2024-09-24 19:33:13,692 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=572772.6666666666, ans=0.125 2024-09-24 19:33:29,586 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=572819.3333333334, ans=0.1 2024-09-24 19:33:52,579 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=572912.6666666666, ans=0.1 2024-09-24 19:34:00,276 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.147e+02 1.299e+02 1.379e+02 1.509e+02 1.987e+02, threshold=2.758e+02, percent-clipped=0.0 2024-09-24 19:34:08,194 INFO [train.py:1198] (3/4) Epoch 32, batch 2000, loss[loss=0.2051, ctc_loss=0.1373, cr_loss=0.3388, over 15943.00 frames. ], tot_loss[loss=0.1987, ctc_loss=0.1294, cr_loss=0.3467, over 3364838.02 frames. ], batch size: 74, lr: 3.72e-03, grad_scale: 32.0 2024-09-24 19:34:14,871 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=572959.3333333334, ans=0.125 2024-09-24 19:34:46,262 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=573052.6666666666, ans=0.125 2024-09-24 19:35:10,596 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.const_attention_rate, batch_count=573099.3333333334, ans=0.025 2024-09-24 19:35:25,631 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=573146.0, ans=0.0 2024-09-24 19:35:33,466 INFO [train.py:1198] (3/4) Epoch 32, batch 2050, loss[loss=0.184, ctc_loss=0.1182, cr_loss=0.3291, over 17075.00 frames. 
], tot_loss[loss=0.1992, ctc_loss=0.1299, cr_loss=0.3468, over 3358051.30 frames. ], batch size: 43, lr: 3.72e-03, grad_scale: 32.0 2024-09-24 19:35:41,981 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=573192.6666666666, ans=0.2 2024-09-24 19:36:05,386 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=9.94 vs. limit=10.0 2024-09-24 19:36:34,821 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=573332.6666666666, ans=0.125 2024-09-24 19:36:45,681 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.089e+02 1.270e+02 1.344e+02 1.458e+02 2.417e+02, threshold=2.689e+02, percent-clipped=0.0 2024-09-24 19:36:47,527 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=573379.3333333334, ans=0.125 2024-09-24 19:36:50,813 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=573379.3333333334, ans=0.09899494936611666 2024-09-24 19:36:53,624 INFO [train.py:1198] (3/4) Epoch 32, batch 2100, loss[loss=0.2213, ctc_loss=0.1453, cr_loss=0.3802, over 16767.00 frames. ], tot_loss[loss=0.1988, ctc_loss=0.1296, cr_loss=0.3459, over 3351815.19 frames. ], batch size: 61, lr: 3.72e-03, grad_scale: 32.0 2024-09-24 19:36:58,612 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=573426.0, ans=0.125 2024-09-24 19:37:17,916 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=573472.6666666666, ans=0.125 2024-09-24 19:37:26,418 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.04 vs. limit=22.5 2024-09-24 19:38:15,963 INFO [train.py:1198] (3/4) Epoch 32, batch 2150, loss[loss=0.1786, ctc_loss=0.1137, cr_loss=0.3244, over 17271.00 frames. ], tot_loss[loss=0.1973, ctc_loss=0.1284, cr_loss=0.3441, over 3360184.72 frames. ], batch size: 44, lr: 3.72e-03, grad_scale: 32.0 2024-09-24 19:38:25,780 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=573659.3333333334, ans=0.125 2024-09-24 19:38:35,331 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=573706.0, ans=0.0 2024-09-24 19:38:56,174 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=8.81 vs. limit=15.0 2024-09-24 19:38:58,266 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.86 vs. 
limit=6.0 2024-09-24 19:39:13,063 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=573799.3333333334, ans=0.2 2024-09-24 19:39:31,150 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=573846.0, ans=0.0 2024-09-24 19:39:32,520 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=573846.0, ans=0.125 2024-09-24 19:39:33,826 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.018e+02 1.275e+02 1.342e+02 1.446e+02 2.934e+02, threshold=2.683e+02, percent-clipped=1.0 2024-09-24 19:39:40,382 INFO [train.py:1198] (3/4) Epoch 32, batch 2200, loss[loss=0.198, ctc_loss=0.126, cr_loss=0.3598, over 17268.00 frames. ], tot_loss[loss=0.1966, ctc_loss=0.1278, cr_loss=0.3437, over 3366919.25 frames. ], batch size: 42, lr: 3.72e-03, grad_scale: 16.0 2024-09-24 19:39:41,220 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=14.71 vs. limit=22.5 2024-09-24 19:40:02,989 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=573939.3333333334, ans=0.2 2024-09-24 19:40:06,181 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-24 19:40:09,401 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=573939.3333333334, ans=0.1 2024-09-24 19:40:09,935 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.40 vs. limit=15.0 2024-09-24 19:40:46,843 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=5.50 vs. limit=12.0 2024-09-24 19:41:03,503 INFO [train.py:1198] (3/4) Epoch 32, batch 2250, loss[loss=0.222, ctc_loss=0.1479, cr_loss=0.3704, over 17018.00 frames. ], tot_loss[loss=0.198, ctc_loss=0.129, cr_loss=0.3452, over 3352120.85 frames. ], batch size: 56, lr: 3.72e-03, grad_scale: 16.0 2024-09-24 19:41:08,537 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=574126.0, ans=0.125 2024-09-24 19:41:13,486 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=574126.0, ans=0.125 2024-09-24 19:41:18,294 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=574172.6666666666, ans=0.125 2024-09-24 19:41:26,927 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=11.68 vs. 
limit=22.5 2024-09-24 19:41:29,524 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=574172.6666666666, ans=0.0 2024-09-24 19:41:40,364 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.min_positive, batch_count=574219.3333333334, ans=0.05 2024-09-24 19:41:59,631 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.min_positive, batch_count=574266.0, ans=0.05 2024-09-24 19:42:00,997 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=574266.0, ans=0.125 2024-09-24 19:42:16,565 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.114e+02 1.263e+02 1.326e+02 1.428e+02 1.960e+02, threshold=2.652e+02, percent-clipped=0.0 2024-09-24 19:42:16,973 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=574312.6666666666, ans=0.1 2024-09-24 19:42:20,609 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.28 vs. limit=22.5 2024-09-24 19:42:22,903 INFO [train.py:1198] (3/4) Epoch 32, batch 2300, loss[loss=0.1612, ctc_loss=0.1037, cr_loss=0.287, over 16940.00 frames. ], tot_loss[loss=0.1967, ctc_loss=0.1279, cr_loss=0.3436, over 3363673.73 frames. ], batch size: 42, lr: 3.72e-03, grad_scale: 16.0 2024-09-24 19:43:31,714 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=574546.0, ans=0.125 2024-09-24 19:43:35,535 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.54 vs. limit=6.0 2024-09-24 19:43:36,359 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=574546.0, ans=0.2 2024-09-24 19:43:36,880 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=8.50 vs. limit=15.0 2024-09-24 19:43:37,926 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=574546.0, ans=0.1 2024-09-24 19:43:45,788 INFO [train.py:1198] (3/4) Epoch 32, batch 2350, loss[loss=0.2164, ctc_loss=0.1401, cr_loss=0.3815, over 16829.00 frames. ], tot_loss[loss=0.1972, ctc_loss=0.1282, cr_loss=0.3453, over 3363347.93 frames. ], batch size: 58, lr: 3.72e-03, grad_scale: 16.0 2024-09-24 19:43:46,062 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=574592.6666666666, ans=0.125 2024-09-24 19:44:05,407 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-24 19:44:23,815 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=574686.0, ans=0.04949747468305833 2024-09-24 19:44:34,639 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=574686.0, ans=0.125 2024-09-24 19:44:36,627 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=12.72 vs. 
limit=22.5 2024-09-24 19:44:49,404 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.36 vs. limit=15.0 2024-09-24 19:44:57,002 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=574779.3333333334, ans=0.125 2024-09-24 19:45:04,495 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.074e+02 1.259e+02 1.339e+02 1.484e+02 2.121e+02, threshold=2.678e+02, percent-clipped=0.0 2024-09-24 19:45:13,560 INFO [train.py:1198] (3/4) Epoch 32, batch 2400, loss[loss=0.2038, ctc_loss=0.1359, cr_loss=0.3393, over 15783.00 frames. ], tot_loss[loss=0.1966, ctc_loss=0.1277, cr_loss=0.3445, over 3363631.96 frames. ], batch size: 74, lr: 3.72e-03, grad_scale: 32.0 2024-09-24 19:45:28,599 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=11.73 vs. limit=15.0 2024-09-24 19:46:01,600 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=574966.0, ans=0.1 2024-09-24 19:46:33,604 INFO [train.py:1198] (3/4) Epoch 32, batch 2450, loss[loss=0.1672, ctc_loss=0.1069, cr_loss=0.3015, over 17262.00 frames. ], tot_loss[loss=0.1965, ctc_loss=0.1276, cr_loss=0.3446, over 3368206.49 frames. ], batch size: 42, lr: 3.72e-03, grad_scale: 32.0 2024-09-24 19:46:38,746 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=575059.3333333334, ans=0.0 2024-09-24 19:46:57,800 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=575106.0, ans=0.0 2024-09-24 19:47:12,314 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=575152.6666666666, ans=0.125 2024-09-24 19:47:38,709 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=575246.0, ans=0.125 2024-09-24 19:47:39,463 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.70 vs. limit=12.0 2024-09-24 19:47:49,729 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.103e+02 1.250e+02 1.318e+02 1.420e+02 2.123e+02, threshold=2.637e+02, percent-clipped=0.0 2024-09-24 19:47:55,026 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=575292.6666666666, ans=0.0 2024-09-24 19:47:56,340 INFO [train.py:1198] (3/4) Epoch 32, batch 2500, loss[loss=0.2562, ctc_loss=0.181, cr_loss=0.376, over 11561.00 frames. ], tot_loss[loss=0.1969, ctc_loss=0.128, cr_loss=0.3446, over 3351490.78 frames. ], batch size: 124, lr: 3.71e-03, grad_scale: 32.0 2024-09-24 19:48:20,683 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.const_attention_rate, batch_count=575339.3333333334, ans=0.025 2024-09-24 19:48:49,416 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=575432.6666666666, ans=0.125 2024-09-24 19:49:18,665 INFO [train.py:1198] (3/4) Epoch 32, batch 2550, loss[loss=0.201, ctc_loss=0.1314, cr_loss=0.3481, over 17122.00 frames. ], tot_loss[loss=0.1959, ctc_loss=0.1273, cr_loss=0.3433, over 3349955.44 frames. 
], batch size: 48, lr: 3.71e-03, grad_scale: 32.0 2024-09-24 19:49:21,765 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=575526.0, ans=0.0 2024-09-24 19:49:29,624 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=575526.0, ans=0.125 2024-09-24 19:49:53,574 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=575619.3333333334, ans=0.125 2024-09-24 19:50:13,872 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=575666.0, ans=0.1 2024-09-24 19:50:20,346 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=575666.0, ans=0.0 2024-09-24 19:50:26,566 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=575712.6666666666, ans=0.2 2024-09-24 19:50:36,132 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=575712.6666666666, ans=0.0 2024-09-24 19:50:39,008 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.044e+02 1.280e+02 1.336e+02 1.446e+02 2.286e+02, threshold=2.672e+02, percent-clipped=0.0 2024-09-24 19:50:43,840 INFO [train.py:1198] (3/4) Epoch 32, batch 2600, loss[loss=0.2135, ctc_loss=0.1412, cr_loss=0.3616, over 17054.00 frames. ], tot_loss[loss=0.1968, ctc_loss=0.128, cr_loss=0.3444, over 3342926.64 frames. ], batch size: 56, lr: 3.71e-03, grad_scale: 16.0 2024-09-24 19:51:01,700 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=575806.0, ans=0.0 2024-09-24 19:51:46,290 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=575946.0, ans=0.125 2024-09-24 19:51:57,734 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=575946.0, ans=0.0 2024-09-24 19:52:03,696 INFO [train.py:1198] (3/4) Epoch 32, batch 2650, loss[loss=0.2055, ctc_loss=0.1364, cr_loss=0.3457, over 17139.00 frames. ], tot_loss[loss=0.1975, ctc_loss=0.1285, cr_loss=0.3453, over 3348783.11 frames. ], batch size: 48, lr: 3.71e-03, grad_scale: 16.0 2024-09-24 19:52:36,516 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=15.24 vs. limit=22.5 2024-09-24 19:52:44,114 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-09-24 19:52:52,697 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.28 vs. limit=10.0 2024-09-24 19:52:57,485 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=8.79 vs. 
limit=15.0 2024-09-24 19:53:16,720 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=576179.3333333334, ans=0.0 2024-09-24 19:53:21,388 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=576179.3333333334, ans=0.125 2024-09-24 19:53:22,465 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.112e+02 1.282e+02 1.356e+02 1.452e+02 2.905e+02, threshold=2.711e+02, percent-clipped=3.0 2024-09-24 19:53:27,438 INFO [train.py:1198] (3/4) Epoch 32, batch 2700, loss[loss=0.1763, ctc_loss=0.1132, cr_loss=0.3152, over 17093.00 frames. ], tot_loss[loss=0.1979, ctc_loss=0.1287, cr_loss=0.346, over 3353813.06 frames. ], batch size: 43, lr: 3.71e-03, grad_scale: 16.0 2024-09-24 19:53:32,407 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=576226.0, ans=0.1 2024-09-24 19:53:32,995 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=5.54 vs. limit=12.0 2024-09-24 19:53:38,977 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=576226.0, ans=0.125 2024-09-24 19:53:51,771 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=576272.6666666666, ans=0.0 2024-09-24 19:53:53,405 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=576272.6666666666, ans=0.125 2024-09-24 19:54:20,215 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=576366.0, ans=0.0 2024-09-24 19:54:50,199 INFO [train.py:1198] (3/4) Epoch 32, batch 2750, loss[loss=0.1982, ctc_loss=0.1281, cr_loss=0.3506, over 17159.00 frames. ], tot_loss[loss=0.1988, ctc_loss=0.1295, cr_loss=0.3468, over 3356646.58 frames. ], batch size: 45, lr: 3.71e-03, grad_scale: 16.0 2024-09-24 19:55:05,179 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.min_positive, batch_count=576506.0, ans=0.05 2024-09-24 19:55:28,552 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=10.22 vs. limit=15.0 2024-09-24 19:55:31,293 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=576552.6666666666, ans=0.0 2024-09-24 19:55:31,425 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=576552.6666666666, ans=0.125 2024-09-24 19:55:51,455 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=16.73 vs. limit=22.5 2024-09-24 19:56:08,200 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.172e+02 1.294e+02 1.403e+02 1.500e+02 1.947e+02, threshold=2.805e+02, percent-clipped=0.0 2024-09-24 19:56:13,018 INFO [train.py:1198] (3/4) Epoch 32, batch 2800, loss[loss=0.1871, ctc_loss=0.1216, cr_loss=0.3274, over 17265.00 frames. ], tot_loss[loss=0.1989, ctc_loss=0.1295, cr_loss=0.347, over 3350945.75 frames. 
], batch size: 44, lr: 3.71e-03, grad_scale: 32.0 2024-09-24 19:56:34,342 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=10.99 vs. limit=22.5 2024-09-24 19:56:58,491 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=576786.0, ans=0.2 2024-09-24 19:57:16,228 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=576879.3333333334, ans=0.125 2024-09-24 19:57:16,565 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.whiten.whitening_limit, batch_count=576879.3333333334, ans=12.0 2024-09-24 19:57:36,058 INFO [train.py:1198] (3/4) Epoch 32, batch 2850, loss[loss=0.2134, ctc_loss=0.1404, cr_loss=0.3646, over 16868.00 frames. ], tot_loss[loss=0.1983, ctc_loss=0.1291, cr_loss=0.3463, over 3357535.24 frames. ], batch size: 58, lr: 3.71e-03, grad_scale: 32.0 2024-09-24 19:57:43,502 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten.whitening_limit, batch_count=576926.0, ans=15.0 2024-09-24 19:57:49,173 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=576926.0, ans=0.0 2024-09-24 19:57:57,671 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=13.56 vs. limit=22.5 2024-09-24 19:58:06,763 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=577019.3333333334, ans=0.1 2024-09-24 19:58:06,783 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=577019.3333333334, ans=0.125 2024-09-24 19:58:10,039 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=577019.3333333334, ans=0.0 2024-09-24 19:58:25,931 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=577066.0, ans=10.0 2024-09-24 19:58:32,382 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=577066.0, ans=0.1 2024-09-24 19:58:34,041 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=577066.0, ans=0.0 2024-09-24 19:58:41,801 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=577112.6666666666, ans=0.125 2024-09-24 19:58:51,150 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.067e+02 1.250e+02 1.340e+02 1.459e+02 3.976e+02, threshold=2.680e+02, percent-clipped=1.0 2024-09-24 19:58:55,901 INFO [train.py:1198] (3/4) Epoch 32, batch 2900, loss[loss=0.2015, ctc_loss=0.1322, cr_loss=0.3467, over 17025.00 frames. ], tot_loss[loss=0.1991, ctc_loss=0.1297, cr_loss=0.3472, over 3355826.66 frames. 
], batch size: 53, lr: 3.71e-03, grad_scale: 32.0 2024-09-24 19:58:56,244 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=577159.3333333334, ans=0.125 2024-09-24 19:59:05,765 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=577159.3333333334, ans=0.125 2024-09-24 19:59:13,314 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=577206.0, ans=0.125 2024-09-24 19:59:13,370 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=577206.0, ans=0.2 2024-09-24 19:59:38,489 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=577252.6666666666, ans=0.125 2024-09-24 19:59:44,742 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=577252.6666666666, ans=0.125 2024-09-24 19:59:52,810 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=577299.3333333334, ans=0.1 2024-09-24 19:59:57,534 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=577299.3333333334, ans=0.125 2024-09-24 20:00:14,042 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=577346.0, ans=0.125 2024-09-24 20:00:21,873 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=577392.6666666666, ans=0.125 2024-09-24 20:00:23,211 INFO [train.py:1198] (3/4) Epoch 32, batch 2950, loss[loss=0.2062, ctc_loss=0.1339, cr_loss=0.3613, over 17203.00 frames. ], tot_loss[loss=0.199, ctc_loss=0.1296, cr_loss=0.3469, over 3356290.92 frames. ], batch size: 47, lr: 3.71e-03, grad_scale: 32.0 2024-09-24 20:00:32,825 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=577392.6666666666, ans=0.0 2024-09-24 20:00:32,847 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=577392.6666666666, ans=0.125 2024-09-24 20:01:39,277 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.060e+02 1.244e+02 1.340e+02 1.434e+02 1.849e+02, threshold=2.679e+02, percent-clipped=0.0 2024-09-24 20:01:42,522 INFO [train.py:1198] (3/4) Epoch 32, batch 3000, loss[loss=0.1656, ctc_loss=0.1035, cr_loss=0.3109, over 17044.00 frames. ], tot_loss[loss=0.1988, ctc_loss=0.1295, cr_loss=0.3465, over 3359403.12 frames. ], batch size: 39, lr: 3.71e-03, grad_scale: 16.0 2024-09-24 20:01:42,522 INFO [train.py:1221] (3/4) Computing validation loss 2024-09-24 20:01:57,945 INFO [train.py:1230] (3/4) Epoch 32, validation: loss=0.03608, ctc_loss=0.03608, cr_loss=9.027e-15, over 944034.00 frames. 
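Three bookkeeping patterns recur in the entries above and are worth decoding. First, each tot_loss entry reports a combined objective consistent with loss = ctc_loss + 0.2 * cr_loss (for example, 0.1299 + 0.2 * 0.3468 ≈ 0.1992), and the validation entries report cr_loss on the order of 1e-14, effectively zero, which is consistent with the consistency-regularization term vanishing when no augmented pair of views is computed at evaluation time. Second, each WARNING from optim.py prints five quantiles of recent gradient norms, and the printed threshold equals Clipping_scale times the middle (median) value, e.g. 2.0 * 1.344e+02 ≈ 2.689e+02. Third, the grad_scale field stepping between 32.0 and 16.0 is consistent with AMP-style dynamic loss scaling, which halves the scale after an overflow and raises it again after a stable stretch. The sketch below illustrates the first two relationships against the logged numbers; it is an assumption-level illustration, not the icefall implementation, and the helper names combined_loss and clip_threshold are hypothetical.

    import torch

    # Weight implied by the logged totals, e.g. 0.1299 + 0.2 * 0.3468 ~= 0.1992.
    CR_LOSS_SCALE = 0.2

    def combined_loss(ctc_loss, cr_loss):
        # Reproduces the tot_loss field: loss = ctc_loss + CR_LOSS_SCALE * cr_loss.
        # (Hypothetical helper name; the constant is inferred from the log.)
        return ctc_loss + CR_LOSS_SCALE * cr_loss

    def clip_threshold(recent_grad_norms, clipping_scale=2.0):
        # The optim.py WARNING entries report quantiles of recent gradient norms;
        # the printed threshold equals clipping_scale times the median quantile,
        # e.g. 2.0 * 1.344e+02 ~= 2.689e+02. Hypothetical helper, not icefall's code.
        return clipping_scale * recent_grad_norms.median()

    # Checked against numbers appearing earlier in this log excerpt:
    assert abs(combined_loss(0.1299, 0.3468) - 0.1992) < 1e-3
    norms = torch.tensor([1.089e+02, 1.270e+02, 1.344e+02, 1.458e+02, 2.417e+02])
    assert abs(clip_threshold(norms).item() - 2.689e+02) < 0.5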
2024-09-24 20:01:57,945 INFO [train.py:1231] (3/4) Maximum memory allocated so far is 21238MB 2024-09-24 20:01:58,330 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=577626.0, ans=0.125 2024-09-24 20:02:03,133 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=577626.0, ans=0.0 2024-09-24 20:02:14,198 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=577672.6666666666, ans=0.1 2024-09-24 20:02:28,286 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=577719.3333333334, ans=0.1 2024-09-24 20:02:39,495 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=577719.3333333334, ans=0.035 2024-09-24 20:03:19,566 INFO [train.py:1198] (3/4) Epoch 32, batch 3050, loss[loss=0.2027, ctc_loss=0.1334, cr_loss=0.3464, over 16874.00 frames. ], tot_loss[loss=0.1989, ctc_loss=0.1296, cr_loss=0.3462, over 3362984.50 frames. ], batch size: 58, lr: 3.71e-03, grad_scale: 16.0 2024-09-24 20:03:55,798 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=577952.6666666666, ans=0.2 2024-09-24 20:04:34,696 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.093e+02 1.228e+02 1.314e+02 1.465e+02 2.452e+02, threshold=2.627e+02, percent-clipped=0.0 2024-09-24 20:04:37,881 INFO [train.py:1198] (3/4) Epoch 32, batch 3100, loss[loss=0.2116, ctc_loss=0.1379, cr_loss=0.3685, over 17238.00 frames. ], tot_loss[loss=0.1976, ctc_loss=0.1286, cr_loss=0.3448, over 3371815.99 frames. ], batch size: 50, lr: 3.71e-03, grad_scale: 16.0 2024-09-24 20:04:50,761 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=578092.6666666666, ans=0.125 2024-09-24 20:05:09,507 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=578186.0, ans=0.015 2024-09-24 20:05:14,467 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=578186.0, ans=0.2 2024-09-24 20:05:25,909 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=4.62 vs. limit=12.0 2024-09-24 20:05:28,625 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=578232.6666666666, ans=0.1 2024-09-24 20:05:28,626 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=578232.6666666666, ans=0.2 2024-09-24 20:05:51,626 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=578279.3333333334, ans=0.0 2024-09-24 20:05:55,254 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.84 vs. limit=15.0 2024-09-24 20:05:56,196 INFO [train.py:1198] (3/4) Epoch 32, batch 3150, loss[loss=0.2004, ctc_loss=0.1339, cr_loss=0.3325, over 17295.00 frames. ], tot_loss[loss=0.1982, ctc_loss=0.129, cr_loss=0.3456, over 3371020.55 frames. 
], batch size: 49, lr: 3.70e-03, grad_scale: 16.0 2024-09-24 20:06:12,275 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=578372.6666666666, ans=0.1 2024-09-24 20:06:38,527 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=578419.3333333334, ans=0.2 2024-09-24 20:06:39,964 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=578419.3333333334, ans=0.125 2024-09-24 20:06:54,785 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=578466.0, ans=0.2 2024-09-24 20:07:15,659 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.134e+02 1.274e+02 1.414e+02 1.514e+02 2.013e+02, threshold=2.828e+02, percent-clipped=0.0 2024-09-24 20:07:17,695 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=578559.3333333334, ans=0.1 2024-09-24 20:07:18,911 INFO [train.py:1198] (3/4) Epoch 32, batch 3200, loss[loss=0.223, ctc_loss=0.1528, cr_loss=0.3512, over 12535.00 frames. ], tot_loss[loss=0.1981, ctc_loss=0.129, cr_loss=0.3455, over 3367879.41 frames. ], batch size: 125, lr: 3.70e-03, grad_scale: 32.0 2024-09-24 20:07:20,761 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=578559.3333333334, ans=0.125 2024-09-24 20:07:22,284 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=578559.3333333334, ans=0.2 2024-09-24 20:07:26,158 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=5.70 vs. limit=15.0 2024-09-24 20:07:33,184 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=578606.0, ans=10.0 2024-09-24 20:07:58,879 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=578652.6666666666, ans=0.125 2024-09-24 20:08:12,259 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=13.85 vs. limit=15.0 2024-09-24 20:08:13,135 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=11.31 vs. limit=12.0 2024-09-24 20:08:22,292 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=578746.0, ans=0.125 2024-09-24 20:08:39,228 INFO [train.py:1198] (3/4) Epoch 32, batch 3250, loss[loss=0.1862, ctc_loss=0.1186, cr_loss=0.3382, over 17085.00 frames. ], tot_loss[loss=0.1986, ctc_loss=0.1294, cr_loss=0.3464, over 3353663.04 frames. ], batch size: 43, lr: 3.70e-03, grad_scale: 32.0 2024-09-24 20:09:02,708 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=8.82 vs. limit=15.0 2024-09-24 20:09:29,426 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=15.12 vs. 
limit=22.5 2024-09-24 20:09:56,536 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.124e+02 1.279e+02 1.356e+02 1.447e+02 1.834e+02, threshold=2.713e+02, percent-clipped=0.0 2024-09-24 20:09:59,776 INFO [train.py:1198] (3/4) Epoch 32, batch 3300, loss[loss=0.2103, ctc_loss=0.1374, cr_loss=0.3642, over 17217.00 frames. ], tot_loss[loss=0.1988, ctc_loss=0.1295, cr_loss=0.3464, over 3338157.68 frames. ], batch size: 50, lr: 3.70e-03, grad_scale: 32.0 2024-09-24 20:10:29,628 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=579119.3333333334, ans=0.1 2024-09-24 20:10:37,353 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=579119.3333333334, ans=0.125 2024-09-24 20:10:46,700 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=579166.0, ans=0.125 2024-09-24 20:11:01,571 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.70 vs. limit=6.0 2024-09-24 20:11:17,460 INFO [train.py:1198] (3/4) Epoch 32, batch 3350, loss[loss=0.193, ctc_loss=0.1266, cr_loss=0.3322, over 16911.00 frames. ], tot_loss[loss=0.1983, ctc_loss=0.1291, cr_loss=0.3464, over 3350019.48 frames. ], batch size: 58, lr: 3.70e-03, grad_scale: 32.0 2024-09-24 20:12:06,251 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=579399.3333333334, ans=0.125 2024-09-24 20:12:27,408 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1.whitening_limit, batch_count=579446.0, ans=10.0 2024-09-24 20:12:32,489 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.114e+02 1.280e+02 1.354e+02 1.467e+02 2.390e+02, threshold=2.709e+02, percent-clipped=0.0 2024-09-24 20:12:32,759 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=579446.0, ans=0.0 2024-09-24 20:12:35,626 INFO [train.py:1198] (3/4) Epoch 32, batch 3400, loss[loss=0.2058, ctc_loss=0.1335, cr_loss=0.3619, over 17013.00 frames. ], tot_loss[loss=0.1975, ctc_loss=0.1285, cr_loss=0.3452, over 3358832.77 frames. ], batch size: 56, lr: 3.70e-03, grad_scale: 32.0 2024-09-24 20:13:04,094 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.const_attention_rate, batch_count=579539.3333333334, ans=0.025 2024-09-24 20:13:08,513 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=579586.0, ans=0.125 2024-09-24 20:13:15,241 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=579586.0, ans=0.0 2024-09-24 20:13:16,769 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=579586.0, ans=0.125 2024-09-24 20:13:19,820 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=579586.0, ans=0.125 2024-09-24 20:13:55,925 INFO [train.py:1198] (3/4) Epoch 32, batch 3450, loss[loss=0.1833, ctc_loss=0.1175, cr_loss=0.3292, over 17065.00 frames. ], tot_loss[loss=0.1973, ctc_loss=0.1282, cr_loss=0.3453, over 3365765.18 frames. 
], batch size: 46, lr: 3.70e-03, grad_scale: 32.0 2024-09-24 20:14:05,563 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=579726.0, ans=0.1 2024-09-24 20:14:07,044 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=579726.0, ans=0.125 2024-09-24 20:14:23,375 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=6.79 vs. limit=15.0 2024-09-24 20:14:47,562 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=579866.0, ans=0.125 2024-09-24 20:14:53,696 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=579866.0, ans=0.0 2024-09-24 20:15:10,636 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.167e+02 1.299e+02 1.417e+02 1.544e+02 2.266e+02, threshold=2.834e+02, percent-clipped=0.0 2024-09-24 20:15:11,007 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=579912.6666666666, ans=0.0 2024-09-24 20:15:13,761 INFO [train.py:1198] (3/4) Epoch 32, batch 3500, loss[loss=0.1783, ctc_loss=0.113, cr_loss=0.3265, over 17077.00 frames. ], tot_loss[loss=0.1971, ctc_loss=0.1281, cr_loss=0.3452, over 3370719.10 frames. ], batch size: 46, lr: 3.70e-03, grad_scale: 32.0 2024-09-24 20:15:15,531 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=579959.3333333334, ans=0.0 2024-09-24 20:15:28,330 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.const_attention_rate, batch_count=580006.0, ans=0.025 2024-09-24 20:15:39,774 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=8.53 vs. limit=15.0 2024-09-24 20:15:59,806 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=580099.3333333334, ans=0.0 2024-09-24 20:16:32,169 INFO [train.py:1198] (3/4) Epoch 32, batch 3550, loss[loss=0.2023, ctc_loss=0.1322, cr_loss=0.3508, over 17320.00 frames. ], tot_loss[loss=0.1976, ctc_loss=0.1285, cr_loss=0.3453, over 3371971.18 frames. ], batch size: 49, lr: 3.70e-03, grad_scale: 32.0 2024-09-24 20:16:35,728 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=580192.6666666666, ans=0.0 2024-09-24 20:16:47,605 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=4.99 vs. 
limit=15.0 2024-09-24 20:17:01,073 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=580239.3333333334, ans=0.125 2024-09-24 20:17:12,523 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=580286.0, ans=0.2 2024-09-24 20:17:28,384 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=580332.6666666666, ans=0.125 2024-09-24 20:17:50,397 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=580379.3333333334, ans=0.125 2024-09-24 20:17:51,751 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.152e+02 1.264e+02 1.345e+02 1.464e+02 2.321e+02, threshold=2.690e+02, percent-clipped=0.0 2024-09-24 20:17:51,997 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=580379.3333333334, ans=0.125 2024-09-24 20:17:54,936 INFO [train.py:1198] (3/4) Epoch 32, batch 3600, loss[loss=0.1706, ctc_loss=0.1082, cr_loss=0.3123, over 17087.00 frames. ], tot_loss[loss=0.1966, ctc_loss=0.1278, cr_loss=0.3441, over 3376310.75 frames. ], batch size: 43, lr: 3.70e-03, grad_scale: 32.0 2024-09-24 20:18:26,793 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=580519.3333333334, ans=0.0 2024-09-24 20:18:44,634 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-24 20:19:00,413 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=4.34 vs. limit=15.0 2024-09-24 20:19:01,531 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=580612.6666666666, ans=0.0 2024-09-24 20:19:15,283 INFO [train.py:1198] (3/4) Epoch 32, batch 3650, loss[loss=0.2057, ctc_loss=0.1345, cr_loss=0.356, over 17147.00 frames. ], tot_loss[loss=0.1969, ctc_loss=0.128, cr_loss=0.3445, over 3368340.69 frames. ], batch size: 48, lr: 3.70e-03, grad_scale: 32.0 2024-09-24 20:19:16,203 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten.whitening_limit, batch_count=580659.3333333334, ans=15.0 2024-09-24 20:19:33,431 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=4.51 vs. limit=15.0 2024-09-24 20:19:38,952 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=580706.0, ans=0.125 2024-09-24 20:19:48,444 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer_ff3.min_abs, batch_count=580752.6666666666, ans=0.2 2024-09-24 20:19:51,892 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=10.46 vs. 
limit=15.0 2024-09-24 20:20:03,864 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=580799.3333333334, ans=0.0 2024-09-24 20:20:05,517 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=580799.3333333334, ans=0.1 2024-09-24 20:20:09,907 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=580799.3333333334, ans=0.07 2024-09-24 20:20:19,223 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=580846.0, ans=0.125 2024-09-24 20:20:29,373 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=580846.0, ans=0.125 2024-09-24 20:20:30,686 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.112e+02 1.254e+02 1.367e+02 1.508e+02 1.700e+02, threshold=2.733e+02, percent-clipped=0.0 2024-09-24 20:20:33,726 INFO [train.py:1198] (3/4) Epoch 32, batch 3700, loss[loss=0.1814, ctc_loss=0.1142, cr_loss=0.3361, over 17279.00 frames. ], tot_loss[loss=0.1971, ctc_loss=0.1282, cr_loss=0.3442, over 3360166.92 frames. ], batch size: 42, lr: 3.70e-03, grad_scale: 32.0 2024-09-24 20:21:51,885 INFO [train.py:1198] (3/4) Epoch 32, batch 3750, loss[loss=0.1866, ctc_loss=0.1209, cr_loss=0.3286, over 17182.00 frames. ], tot_loss[loss=0.1988, ctc_loss=0.1295, cr_loss=0.3463, over 3355682.97 frames. ], batch size: 45, lr: 3.70e-03, grad_scale: 32.0 2024-09-24 20:21:52,739 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=9.73 vs. limit=15.0 2024-09-24 20:22:12,325 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=581172.6666666666, ans=0.125 2024-09-24 20:22:12,479 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=581172.6666666666, ans=0.0 2024-09-24 20:22:33,490 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=5.15 vs. limit=15.0 2024-09-24 20:23:00,942 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=581312.6666666666, ans=0.1 2024-09-24 20:23:04,617 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=18.09 vs. limit=22.5 2024-09-24 20:23:06,880 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.151e+02 1.283e+02 1.375e+02 1.482e+02 1.854e+02, threshold=2.749e+02, percent-clipped=0.0 2024-09-24 20:23:10,822 INFO [train.py:1198] (3/4) Epoch 32, batch 3800, loss[loss=0.2127, ctc_loss=0.1445, cr_loss=0.3413, over 11537.00 frames. ], tot_loss[loss=0.1989, ctc_loss=0.1297, cr_loss=0.3458, over 3309564.68 frames. 
], batch size: 123, lr: 3.69e-03, grad_scale: 32.0 2024-09-24 20:23:54,401 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=581452.6666666666, ans=0.0 2024-09-24 20:24:18,971 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-24 20:24:21,086 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.73 vs. limit=15.0 2024-09-24 20:24:23,691 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=581546.0, ans=0.125 2024-09-24 20:24:28,215 INFO [train.py:1198] (3/4) Epoch 32, batch 3850, loss[loss=0.2279, ctc_loss=0.1573, cr_loss=0.353, over 11939.00 frames. ], tot_loss[loss=0.2015, ctc_loss=0.1318, cr_loss=0.3485, over 3252249.09 frames. ], batch size: 123, lr: 3.69e-03, grad_scale: 32.0 2024-09-24 20:24:56,249 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=581639.3333333334, ans=0.125 2024-09-24 20:25:08,313 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=581686.0, ans=0.125 2024-09-24 20:25:12,077 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=15.52 vs. limit=22.5 2024-09-24 20:26:29,987 INFO [train.py:1198] (3/4) Epoch 33, batch 0, loss[loss=0.2056, ctc_loss=0.1358, cr_loss=0.3488, over 16742.00 frames. ], tot_loss[loss=0.2056, ctc_loss=0.1358, cr_loss=0.3488, over 16742.00 frames. ], batch size: 61, lr: 3.64e-03, grad_scale: 32.0 2024-09-24 20:26:29,988 INFO [train.py:1221] (3/4) Computing validation loss 2024-09-24 20:26:46,725 INFO [train.py:1230] (3/4) Epoch 33, validation: loss=0.03608, ctc_loss=0.03608, cr_loss=9.001e-15, over 944034.00 frames. 2024-09-24 20:26:46,725 INFO [train.py:1231] (3/4) Maximum memory allocated so far is 21238MB 2024-09-24 20:26:52,643 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.114e+02 1.455e+02 1.559e+02 1.655e+02 2.375e+02, threshold=3.119e+02, percent-clipped=0.0 2024-09-24 20:27:20,485 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=581900.6666666666, ans=0.125 2024-09-24 20:27:42,806 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=581947.3333333334, ans=0.0 2024-09-24 20:27:44,359 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=581947.3333333334, ans=0.125 2024-09-24 20:27:55,613 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=581994.0, ans=0.2 2024-09-24 20:28:01,709 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=581994.0, ans=0.1 2024-09-24 20:28:09,436 INFO [train.py:1198] (3/4) Epoch 33, batch 50, loss[loss=0.2313, ctc_loss=0.1536, cr_loss=0.3884, over 17010.00 frames. ], tot_loss[loss=0.1996, ctc_loss=0.13, cr_loss=0.3481, over 755082.27 frames. 
], batch size: 56, lr: 3.64e-03, grad_scale: 32.0 2024-09-24 20:28:18,186 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=3.83 vs. limit=10.0 2024-09-24 20:28:19,365 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=582040.6666666666, ans=0.5 2024-09-24 20:28:22,630 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=582040.6666666666, ans=0.125 2024-09-24 20:29:13,055 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=582180.6666666666, ans=0.125 2024-09-24 20:29:14,599 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=582227.3333333334, ans=0.0 2024-09-24 20:29:17,747 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=582227.3333333334, ans=0.1 2024-09-24 20:29:19,304 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=582227.3333333334, ans=0.1 2024-09-24 20:29:30,448 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=582274.0, ans=0.04949747468305833 2024-09-24 20:29:31,544 INFO [train.py:1198] (3/4) Epoch 33, batch 100, loss[loss=0.1888, ctc_loss=0.1218, cr_loss=0.3352, over 17115.00 frames. ], tot_loss[loss=0.1983, ctc_loss=0.1291, cr_loss=0.3461, over 1316036.82 frames. ], batch size: 40, lr: 3.63e-03, grad_scale: 32.0 2024-09-24 20:29:33,472 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=582274.0, ans=0.125 2024-09-24 20:29:34,675 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.091e+02 1.274e+02 1.348e+02 1.462e+02 2.671e+02, threshold=2.696e+02, percent-clipped=0.0 2024-09-24 20:29:37,345 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=3.99 vs. limit=12.0 2024-09-24 20:29:38,194 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=582274.0, ans=0.0 2024-09-24 20:29:44,450 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=582274.0, ans=0.07 2024-09-24 20:29:59,126 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=582320.6666666666, ans=0.0 2024-09-24 20:30:26,580 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=582414.0, ans=0.0 2024-09-24 20:30:28,049 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=582414.0, ans=0.125 2024-09-24 20:30:54,485 INFO [train.py:1198] (3/4) Epoch 33, batch 150, loss[loss=0.2228, ctc_loss=0.1523, cr_loss=0.3523, over 15907.00 frames. ], tot_loss[loss=0.2003, ctc_loss=0.1304, cr_loss=0.3495, over 1766170.88 frames. ], batch size: 74, lr: 3.63e-03, grad_scale: 32.0 2024-09-24 20:31:10,001 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=11.77 vs. 
limit=15.0 2024-09-24 20:31:23,354 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=582554.0, ans=0.015 2024-09-24 20:32:01,395 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=582647.3333333334, ans=0.1 2024-09-24 20:32:04,749 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=582694.0, ans=0.125 2024-09-24 20:32:06,333 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=582694.0, ans=0.125 2024-09-24 20:32:20,261 INFO [train.py:1198] (3/4) Epoch 33, batch 200, loss[loss=0.2442, ctc_loss=0.1637, cr_loss=0.4022, over 15291.00 frames. ], tot_loss[loss=0.201, ctc_loss=0.131, cr_loss=0.3498, over 2107517.89 frames. ], batch size: 89, lr: 3.63e-03, grad_scale: 32.0 2024-09-24 20:32:20,591 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=582740.6666666666, ans=0.1 2024-09-24 20:32:20,654 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=582740.6666666666, ans=0.0 2024-09-24 20:32:22,220 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=582740.6666666666, ans=0.0 2024-09-24 20:32:23,422 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.133e+02 1.261e+02 1.350e+02 1.462e+02 5.443e+02, threshold=2.700e+02, percent-clipped=2.0 2024-09-24 20:32:26,888 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=582740.6666666666, ans=0.125 2024-09-24 20:32:33,799 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.87 vs. limit=15.0 2024-09-24 20:32:39,693 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=582787.3333333334, ans=0.125 2024-09-24 20:32:48,202 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=11.41 vs. limit=22.5 2024-09-24 20:32:49,474 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=582787.3333333334, ans=0.04949747468305833 2024-09-24 20:32:51,122 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=582834.0, ans=0.2 2024-09-24 20:33:11,706 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=582880.6666666666, ans=0.2 2024-09-24 20:33:15,184 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=12.45 vs. limit=22.5 2024-09-24 20:33:19,691 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=582880.6666666666, ans=0.04949747468305833 2024-09-24 20:33:23,220 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=9.27 vs. 
limit=15.0 2024-09-24 20:33:42,546 INFO [train.py:1198] (3/4) Epoch 33, batch 250, loss[loss=0.2115, ctc_loss=0.1446, cr_loss=0.3345, over 16000.00 frames. ], tot_loss[loss=0.1977, ctc_loss=0.1284, cr_loss=0.3461, over 2394167.00 frames. ], batch size: 74, lr: 3.63e-03, grad_scale: 32.0 2024-09-24 20:33:47,377 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer_ff2.min_abs, batch_count=582974.0, ans=0.1 2024-09-24 20:34:01,549 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=583020.6666666666, ans=0.0 2024-09-24 20:34:01,562 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=583020.6666666666, ans=0.2 2024-09-24 20:34:09,484 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=583020.6666666666, ans=0.1 2024-09-24 20:34:30,480 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=4.25 vs. limit=15.0 2024-09-24 20:35:01,850 INFO [train.py:1198] (3/4) Epoch 33, batch 300, loss[loss=0.1771, ctc_loss=0.1128, cr_loss=0.3214, over 16653.00 frames. ], tot_loss[loss=0.1973, ctc_loss=0.1282, cr_loss=0.3453, over 2611849.47 frames. ], batch size: 37, lr: 3.63e-03, grad_scale: 32.0 2024-09-24 20:35:05,042 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.095e+02 1.274e+02 1.352e+02 1.473e+02 1.783e+02, threshold=2.703e+02, percent-clipped=0.0 2024-09-24 20:35:24,395 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=583254.0, ans=0.035 2024-09-24 20:36:04,638 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=583347.3333333334, ans=0.125 2024-09-24 20:36:07,778 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=583394.0, ans=0.2 2024-09-24 20:36:15,724 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=583394.0, ans=0.125 2024-09-24 20:36:22,129 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=583394.0, ans=0.1 2024-09-24 20:36:25,028 INFO [train.py:1198] (3/4) Epoch 33, batch 350, loss[loss=0.2093, ctc_loss=0.1365, cr_loss=0.3644, over 17295.00 frames. ], tot_loss[loss=0.198, ctc_loss=0.1287, cr_loss=0.3466, over 2772132.60 frames. ], batch size: 46, lr: 3.63e-03, grad_scale: 32.0 2024-09-24 20:36:57,977 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=583487.3333333334, ans=0.0 2024-09-24 20:37:05,838 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=583534.0, ans=0.0 2024-09-24 20:37:09,332 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.60 vs. 
limit=15.0 2024-09-24 20:37:10,709 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=583534.0, ans=0.125 2024-09-24 20:37:41,679 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=5.66 vs. limit=15.0 2024-09-24 20:37:49,006 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=583674.0, ans=0.0 2024-09-24 20:37:49,124 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=583674.0, ans=0.2 2024-09-24 20:37:50,368 INFO [train.py:1198] (3/4) Epoch 33, batch 400, loss[loss=0.2121, ctc_loss=0.1349, cr_loss=0.3862, over 17250.00 frames. ], tot_loss[loss=0.1972, ctc_loss=0.128, cr_loss=0.3457, over 2906544.48 frames. ], batch size: 44, lr: 3.63e-03, grad_scale: 32.0 2024-09-24 20:37:53,545 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.108e+02 1.274e+02 1.352e+02 1.470e+02 2.470e+02, threshold=2.705e+02, percent-clipped=0.0 2024-09-24 20:37:53,865 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=583674.0, ans=0.125 2024-09-24 20:37:55,383 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=583674.0, ans=0.125 2024-09-24 20:38:08,079 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=583720.6666666666, ans=0.125 2024-09-24 20:38:09,633 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=583720.6666666666, ans=0.0 2024-09-24 20:38:14,454 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=583720.6666666666, ans=0.0 2024-09-24 20:38:50,369 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=583814.0, ans=0.1 2024-09-24 20:39:12,685 INFO [train.py:1198] (3/4) Epoch 33, batch 450, loss[loss=0.203, ctc_loss=0.1327, cr_loss=0.3514, over 17234.00 frames. ], tot_loss[loss=0.1983, ctc_loss=0.129, cr_loss=0.3468, over 3001061.77 frames. ], batch size: 55, lr: 3.63e-03, grad_scale: 32.0 2024-09-24 20:39:16,149 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=583907.3333333334, ans=0.125 2024-09-24 20:39:25,908 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=9.97 vs. limit=12.0 2024-09-24 20:39:36,890 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=583954.0, ans=0.1 2024-09-24 20:39:57,401 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=584000.6666666666, ans=0.2 2024-09-24 20:40:34,237 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.08 vs. limit=6.0 2024-09-24 20:40:35,212 INFO [train.py:1198] (3/4) Epoch 33, batch 500, loss[loss=0.1959, ctc_loss=0.1296, cr_loss=0.3314, over 16935.00 frames. ], tot_loss[loss=0.1972, ctc_loss=0.1281, cr_loss=0.3454, over 3090225.40 frames. 
], batch size: 58, lr: 3.63e-03, grad_scale: 32.0 2024-09-24 20:40:37,064 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=584140.6666666666, ans=0.015 2024-09-24 20:40:38,396 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.093e+02 1.260e+02 1.369e+02 1.441e+02 1.904e+02, threshold=2.739e+02, percent-clipped=0.0 2024-09-24 20:41:02,487 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.const_attention_rate, batch_count=584187.3333333334, ans=0.025 2024-09-24 20:41:27,674 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=584280.6666666666, ans=0.125 2024-09-24 20:41:27,919 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=584280.6666666666, ans=0.125 2024-09-24 20:41:40,609 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.79 vs. limit=15.0 2024-09-24 20:41:59,286 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=584374.0, ans=10.0 2024-09-24 20:42:00,693 INFO [train.py:1198] (3/4) Epoch 33, batch 550, loss[loss=0.1953, ctc_loss=0.1284, cr_loss=0.3346, over 17096.00 frames. ], tot_loss[loss=0.1968, ctc_loss=0.1278, cr_loss=0.3449, over 3157306.26 frames. ], batch size: 49, lr: 3.63e-03, grad_scale: 32.0 2024-09-24 20:42:05,306 INFO [scaling.py:1024] (3/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.41 vs. limit=5.0 2024-09-24 20:42:12,565 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.03 vs. limit=6.0 2024-09-24 20:42:25,587 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.20 vs. limit=6.0 2024-09-24 20:43:00,389 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=584514.0, ans=0.0 2024-09-24 20:43:13,418 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=584560.6666666666, ans=0.125 2024-09-24 20:43:20,876 INFO [train.py:1198] (3/4) Epoch 33, batch 600, loss[loss=0.1634, ctc_loss=0.1059, cr_loss=0.2879, over 17050.00 frames. ], tot_loss[loss=0.1957, ctc_loss=0.127, cr_loss=0.3436, over 3201824.29 frames. 
], batch size: 39, lr: 3.63e-03, grad_scale: 32.0 2024-09-24 20:43:26,766 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.082e+02 1.269e+02 1.358e+02 1.468e+02 2.109e+02, threshold=2.715e+02, percent-clipped=0.0 2024-09-24 20:43:33,444 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=584607.3333333334, ans=0.125 2024-09-24 20:43:33,488 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=584607.3333333334, ans=0.1 2024-09-24 20:43:44,995 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=584654.0, ans=0.125 2024-09-24 20:43:59,494 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=584700.6666666666, ans=0.125 2024-09-24 20:44:10,661 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.max_positive, batch_count=584747.3333333334, ans=0.95 2024-09-24 20:44:17,115 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=584747.3333333334, ans=0.125 2024-09-24 20:44:23,578 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=584747.3333333334, ans=0.0 2024-09-24 20:44:28,262 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=584794.0, ans=0.1 2024-09-24 20:44:43,688 INFO [train.py:1198] (3/4) Epoch 33, batch 650, loss[loss=0.2214, ctc_loss=0.1475, cr_loss=0.3697, over 15996.00 frames. ], tot_loss[loss=0.1957, ctc_loss=0.1271, cr_loss=0.3431, over 3230982.29 frames. ], batch size: 74, lr: 3.63e-03, grad_scale: 32.0 2024-09-24 20:44:48,561 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=584840.6666666666, ans=0.5 2024-09-24 20:45:00,620 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.32 vs. limit=15.0 2024-09-24 20:45:04,142 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=13.27 vs. limit=15.0 2024-09-24 20:45:52,049 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=585027.3333333334, ans=0.125 2024-09-24 20:46:06,206 INFO [train.py:1198] (3/4) Epoch 33, batch 700, loss[loss=0.1827, ctc_loss=0.1214, cr_loss=0.3066, over 17312.00 frames. ], tot_loss[loss=0.1959, ctc_loss=0.1274, cr_loss=0.3427, over 3264773.08 frames. 
], batch size: 46, lr: 3.63e-03, grad_scale: 16.0 2024-09-24 20:46:10,909 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.093e+02 1.247e+02 1.323e+02 1.439e+02 2.225e+02, threshold=2.645e+02, percent-clipped=0.0 2024-09-24 20:46:46,940 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=585167.3333333334, ans=0.0 2024-09-24 20:46:55,144 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=585167.3333333334, ans=0.1 2024-09-24 20:47:25,540 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=585260.6666666666, ans=0.125 2024-09-24 20:47:25,542 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=585260.6666666666, ans=0.125 2024-09-24 20:47:31,642 INFO [train.py:1198] (3/4) Epoch 33, batch 750, loss[loss=0.1656, ctc_loss=0.1073, cr_loss=0.2915, over 17290.00 frames. ], tot_loss[loss=0.196, ctc_loss=0.1274, cr_loss=0.343, over 3291116.26 frames. ], batch size: 42, lr: 3.63e-03, grad_scale: 16.0 2024-09-24 20:47:40,510 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=4.23 vs. limit=12.0 2024-09-24 20:47:44,709 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=585307.3333333334, ans=0.1 2024-09-24 20:47:54,091 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=585354.0, ans=0.0 2024-09-24 20:47:55,666 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.max_positive, batch_count=585354.0, ans=0.95 2024-09-24 20:48:33,746 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=7.56 vs. limit=15.0 2024-09-24 20:48:53,866 INFO [train.py:1198] (3/4) Epoch 33, batch 800, loss[loss=0.1916, ctc_loss=0.1236, cr_loss=0.34, over 17348.00 frames. ], tot_loss[loss=0.1964, ctc_loss=0.1277, cr_loss=0.3437, over 3308921.16 frames. ], batch size: 48, lr: 3.62e-03, grad_scale: 32.0 2024-09-24 20:48:58,603 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.132e+02 1.263e+02 1.335e+02 1.404e+02 2.398e+02, threshold=2.669e+02, percent-clipped=0.0 2024-09-24 20:49:08,381 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=585587.3333333334, ans=0.2 2024-09-24 20:49:24,327 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=585634.0, ans=0.0 2024-09-24 20:50:04,657 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=585727.3333333334, ans=0.1 2024-09-24 20:50:08,109 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=585727.3333333334, ans=0.125 2024-09-24 20:50:14,053 INFO [train.py:1198] (3/4) Epoch 33, batch 850, loss[loss=0.2041, ctc_loss=0.1349, cr_loss=0.346, over 17223.00 frames. ], tot_loss[loss=0.1957, ctc_loss=0.1271, cr_loss=0.3428, over 3329051.19 frames. 
], batch size: 50, lr: 3.62e-03, grad_scale: 32.0 2024-09-24 20:50:15,104 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=4.03 vs. limit=12.0 2024-09-24 20:50:19,023 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=585774.0, ans=0.125 2024-09-24 20:50:32,365 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=5.17 vs. limit=15.0 2024-09-24 20:50:33,516 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=585820.6666666666, ans=0.2 2024-09-24 20:50:33,768 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=6.80 vs. limit=15.0 2024-09-24 20:50:34,176 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=11.46 vs. limit=22.5 2024-09-24 20:50:40,130 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=4.91 vs. limit=12.0 2024-09-24 20:51:39,163 INFO [train.py:1198] (3/4) Epoch 33, batch 900, loss[loss=0.1904, ctc_loss=0.1214, cr_loss=0.345, over 17025.00 frames. ], tot_loss[loss=0.1954, ctc_loss=0.1269, cr_loss=0.3425, over 3336673.44 frames. ], batch size: 44, lr: 3.62e-03, grad_scale: 32.0 2024-09-24 20:51:44,053 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.090e+02 1.262e+02 1.327e+02 1.436e+02 1.975e+02, threshold=2.653e+02, percent-clipped=0.0 2024-09-24 20:52:10,381 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.85 vs. limit=22.5 2024-09-24 20:52:30,881 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-09-24 20:52:39,559 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=10.21 vs. limit=15.0 2024-09-24 20:52:42,030 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.const_attention_rate, batch_count=586194.0, ans=0.025 2024-09-24 20:52:59,227 INFO [train.py:1198] (3/4) Epoch 33, batch 950, loss[loss=0.1703, ctc_loss=0.1081, cr_loss=0.311, over 17277.00 frames. ], tot_loss[loss=0.1954, ctc_loss=0.1269, cr_loss=0.3427, over 3350995.89 frames. ], batch size: 44, lr: 3.62e-03, grad_scale: 32.0 2024-09-24 20:52:59,882 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=11.58 vs. limit=15.0 2024-09-24 20:53:09,005 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=586240.6666666666, ans=0.0 2024-09-24 20:53:15,185 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=586287.3333333334, ans=0.125 2024-09-24 20:53:42,029 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-24 20:54:21,207 INFO [train.py:1198] (3/4) Epoch 33, batch 1000, loss[loss=0.1905, ctc_loss=0.1212, cr_loss=0.3466, over 17250.00 frames. 
], tot_loss[loss=0.1963, ctc_loss=0.1275, cr_loss=0.3443, over 3360722.15 frames. ], batch size: 44, lr: 3.62e-03, grad_scale: 32.0 2024-09-24 20:54:26,139 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.114e+02 1.300e+02 1.389e+02 1.469e+02 2.721e+02, threshold=2.778e+02, percent-clipped=0.0 2024-09-24 20:54:30,398 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=6.14 vs. limit=15.0 2024-09-24 20:54:34,553 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=586474.0, ans=0.125 2024-09-24 20:54:53,863 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=586567.3333333334, ans=0.0 2024-09-24 20:54:54,152 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=10.89 vs. limit=15.0 2024-09-24 20:55:08,402 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=586614.0, ans=0.1 2024-09-24 20:55:14,790 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=586614.0, ans=0.0 2024-09-24 20:55:28,197 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=10.11 vs. limit=15.0 2024-09-24 20:55:44,469 INFO [train.py:1198] (3/4) Epoch 33, batch 1050, loss[loss=0.1911, ctc_loss=0.1265, cr_loss=0.3225, over 17298.00 frames. ], tot_loss[loss=0.1966, ctc_loss=0.1277, cr_loss=0.3444, over 3360641.35 frames. ], batch size: 49, lr: 3.62e-03, grad_scale: 32.0 2024-09-24 20:56:09,959 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=586754.0, ans=0.1 2024-09-24 20:56:21,310 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-24 20:56:32,472 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=586800.6666666666, ans=0.05 2024-09-24 20:56:35,700 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=586847.3333333334, ans=0.125 2024-09-24 20:56:36,082 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.96 vs. limit=6.0 2024-09-24 20:56:41,122 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=4.46 vs. limit=10.0 2024-09-24 20:56:42,108 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=586847.3333333334, ans=0.125 2024-09-24 20:56:46,897 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=586847.3333333334, ans=0.125 2024-09-24 20:56:58,841 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=9.91 vs. 
limit=15.0 2024-09-24 20:57:09,129 INFO [train.py:1198] (3/4) Epoch 33, batch 1100, loss[loss=0.1781, ctc_loss=0.114, cr_loss=0.3205, over 17021.00 frames. ], tot_loss[loss=0.1962, ctc_loss=0.1274, cr_loss=0.3436, over 3350445.54 frames. ], batch size: 53, lr: 3.62e-03, grad_scale: 32.0 2024-09-24 20:57:13,959 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.124e+02 1.249e+02 1.328e+02 1.446e+02 1.774e+02, threshold=2.656e+02, percent-clipped=0.0 2024-09-24 20:57:20,597 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=586940.6666666666, ans=0.0 2024-09-24 20:57:41,573 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.const_attention_rate, batch_count=587034.0, ans=0.025 2024-09-24 20:58:15,434 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-09-24 20:58:32,052 INFO [train.py:1198] (3/4) Epoch 33, batch 1150, loss[loss=0.2463, ctc_loss=0.1702, cr_loss=0.3804, over 14962.00 frames. ], tot_loss[loss=0.1963, ctc_loss=0.1275, cr_loss=0.344, over 3358715.36 frames. ], batch size: 89, lr: 3.62e-03, grad_scale: 32.0 2024-09-24 20:58:46,694 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=587220.6666666666, ans=0.2 2024-09-24 20:58:52,986 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=587220.6666666666, ans=0.125 2024-09-24 20:58:56,462 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=587220.6666666666, ans=0.04949747468305833 2024-09-24 20:58:59,475 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=587220.6666666666, ans=0.0 2024-09-24 20:59:16,751 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=587267.3333333334, ans=0.125 2024-09-24 20:59:23,066 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=587314.0, ans=0.04949747468305833 2024-09-24 20:59:27,975 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=587314.0, ans=0.035 2024-09-24 20:59:29,597 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=587314.0, ans=0.2 2024-09-24 20:59:41,596 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=8.89 vs. limit=15.0 2024-09-24 20:59:51,716 INFO [train.py:1198] (3/4) Epoch 33, batch 1200, loss[loss=0.2099, ctc_loss=0.1383, cr_loss=0.3579, over 15999.00 frames. ], tot_loss[loss=0.1969, ctc_loss=0.1279, cr_loss=0.3452, over 3358996.70 frames. 
], batch size: 74, lr: 3.62e-03, grad_scale: 32.0 2024-09-24 20:59:56,481 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.117e+02 1.281e+02 1.363e+02 1.447e+02 2.223e+02, threshold=2.727e+02, percent-clipped=0.0 2024-09-24 21:00:28,674 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=587500.6666666666, ans=0.1 2024-09-24 21:00:51,735 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=587547.3333333334, ans=0.2 2024-09-24 21:01:13,624 INFO [train.py:1198] (3/4) Epoch 33, batch 1250, loss[loss=0.1621, ctc_loss=0.1028, cr_loss=0.2963, over 17078.00 frames. ], tot_loss[loss=0.1965, ctc_loss=0.1276, cr_loss=0.3444, over 3342239.53 frames. ], batch size: 40, lr: 3.62e-03, grad_scale: 32.0 2024-09-24 21:01:52,458 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=587734.0, ans=0.125 2024-09-24 21:01:58,956 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=587734.0, ans=0.025 2024-09-24 21:02:06,232 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=14.56 vs. limit=22.5 2024-09-24 21:02:16,431 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=587780.6666666666, ans=0.125 2024-09-24 21:02:34,045 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=587827.3333333334, ans=0.1 2024-09-24 21:02:35,747 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=587827.3333333334, ans=0.125 2024-09-24 21:02:38,675 INFO [train.py:1198] (3/4) Epoch 33, batch 1300, loss[loss=0.1994, ctc_loss=0.1309, cr_loss=0.343, over 15989.00 frames. ], tot_loss[loss=0.1969, ctc_loss=0.128, cr_loss=0.3449, over 3336564.15 frames. ], batch size: 35, lr: 3.62e-03, grad_scale: 32.0 2024-09-24 21:02:41,318 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.73 vs. limit=22.5 2024-09-24 21:02:43,488 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.064e+02 1.271e+02 1.359e+02 1.460e+02 2.870e+02, threshold=2.719e+02, percent-clipped=1.0 2024-09-24 21:03:05,756 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=587920.6666666666, ans=0.2 2024-09-24 21:03:34,284 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=588014.0, ans=0.035 2024-09-24 21:03:53,150 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=588060.6666666666, ans=0.125 2024-09-24 21:04:00,915 INFO [train.py:1198] (3/4) Epoch 33, batch 1350, loss[loss=0.2186, ctc_loss=0.1434, cr_loss=0.3757, over 17046.00 frames. ], tot_loss[loss=0.1972, ctc_loss=0.1281, cr_loss=0.3453, over 3340331.09 frames. 
], batch size: 52, lr: 3.62e-03, grad_scale: 32.0 2024-09-24 21:04:14,116 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=588107.3333333334, ans=0.0 2024-09-24 21:04:14,128 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=588107.3333333334, ans=0.04949747468305833 2024-09-24 21:04:28,618 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-24 21:04:34,269 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=13.19 vs. limit=22.5 2024-09-24 21:05:21,446 INFO [train.py:1198] (3/4) Epoch 33, batch 1400, loss[loss=0.2363, ctc_loss=0.1627, cr_loss=0.3683, over 11921.00 frames. ], tot_loss[loss=0.1963, ctc_loss=0.1274, cr_loss=0.3444, over 3348778.30 frames. ], batch size: 123, lr: 3.62e-03, grad_scale: 32.0 2024-09-24 21:05:25,128 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=588340.6666666666, ans=0.125 2024-09-24 21:05:26,392 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.101e+02 1.268e+02 1.325e+02 1.452e+02 1.895e+02, threshold=2.651e+02, percent-clipped=0.0 2024-09-24 21:05:33,075 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=588340.6666666666, ans=0.0 2024-09-24 21:05:45,194 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=588387.3333333334, ans=10.0 2024-09-24 21:05:46,764 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=588387.3333333334, ans=0.2 2024-09-24 21:05:51,613 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=588387.3333333334, ans=0.2 2024-09-24 21:06:02,603 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=588434.0, ans=0.1 2024-09-24 21:06:20,798 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=588480.6666666666, ans=0.0 2024-09-24 21:06:38,319 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.const_attention_rate, batch_count=588527.3333333334, ans=0.025 2024-09-24 21:06:43,227 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-24 21:06:49,514 INFO [train.py:1198] (3/4) Epoch 33, batch 1450, loss[loss=0.2162, ctc_loss=0.1444, cr_loss=0.359, over 17283.00 frames. ], tot_loss[loss=0.197, ctc_loss=0.128, cr_loss=0.345, over 3355575.67 frames. 
], batch size: 46, lr: 3.62e-03, grad_scale: 32.0 2024-09-24 21:07:01,012 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=588574.0, ans=0.125 2024-09-24 21:07:02,548 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=588574.0, ans=0.1 2024-09-24 21:07:21,863 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=588667.3333333334, ans=0.0 2024-09-24 21:07:45,803 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=588714.0, ans=0.0 2024-09-24 21:07:56,378 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=4.51 vs. limit=15.0 2024-09-24 21:08:00,423 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=588760.6666666666, ans=0.125 2024-09-24 21:08:12,483 INFO [train.py:1198] (3/4) Epoch 33, batch 1500, loss[loss=0.1785, ctc_loss=0.1119, cr_loss=0.3333, over 16316.00 frames. ], tot_loss[loss=0.1972, ctc_loss=0.128, cr_loss=0.346, over 3365597.29 frames. ], batch size: 36, lr: 3.61e-03, grad_scale: 32.0 2024-09-24 21:08:14,447 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=588807.3333333334, ans=0.1 2024-09-24 21:08:14,529 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=588807.3333333334, ans=0.0 2024-09-24 21:08:17,327 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.121e+02 1.288e+02 1.351e+02 1.447e+02 2.720e+02, threshold=2.702e+02, percent-clipped=1.0 2024-09-24 21:08:36,886 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=588854.0, ans=0.125 2024-09-24 21:08:40,313 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=10.37 vs. limit=22.5 2024-09-24 21:08:40,613 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=4.67 vs. limit=10.0 2024-09-24 21:09:23,371 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=588994.0, ans=0.125 2024-09-24 21:09:32,675 INFO [train.py:1198] (3/4) Epoch 33, batch 1550, loss[loss=0.1991, ctc_loss=0.1283, cr_loss=0.354, over 17296.00 frames. ], tot_loss[loss=0.1968, ctc_loss=0.1277, cr_loss=0.3456, over 3367927.45 frames. ], batch size: 46, lr: 3.61e-03, grad_scale: 32.0 2024-09-24 21:09:47,375 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=589087.3333333334, ans=0.0 2024-09-24 21:09:56,971 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=589087.3333333334, ans=0.125 2024-09-24 21:10:16,631 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=5.79 vs. 
limit=12.0 2024-09-24 21:10:29,804 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=4.64 vs. limit=12.0 2024-09-24 21:10:43,732 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.60 vs. limit=15.0 2024-09-24 21:10:49,343 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=589227.3333333334, ans=0.125 2024-09-24 21:10:55,335 INFO [train.py:1198] (3/4) Epoch 33, batch 1600, loss[loss=0.1935, ctc_loss=0.1258, cr_loss=0.3384, over 17295.00 frames. ], tot_loss[loss=0.1961, ctc_loss=0.1272, cr_loss=0.3449, over 3370635.97 frames. ], batch size: 46, lr: 3.61e-03, grad_scale: 32.0 2024-09-24 21:11:00,334 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.046e+02 1.243e+02 1.338e+02 1.455e+02 2.020e+02, threshold=2.676e+02, percent-clipped=0.0 2024-09-24 21:11:18,896 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=589320.6666666666, ans=0.125 2024-09-24 21:11:19,612 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.97 vs. limit=6.0 2024-09-24 21:11:35,695 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=589367.3333333334, ans=0.125 2024-09-24 21:11:37,243 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=589367.3333333334, ans=0.1 2024-09-24 21:11:45,535 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=589367.3333333334, ans=0.0 2024-09-24 21:12:17,433 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=589460.6666666666, ans=0.2 2024-09-24 21:12:20,304 INFO [train.py:1198] (3/4) Epoch 33, batch 1650, loss[loss=0.1892, ctc_loss=0.1202, cr_loss=0.3451, over 17215.00 frames. ], tot_loss[loss=0.1965, ctc_loss=0.1274, cr_loss=0.3453, over 3365868.43 frames. ], batch size: 47, lr: 3.61e-03, grad_scale: 16.0 2024-09-24 21:12:36,382 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=589554.0, ans=0.0 2024-09-24 21:12:39,739 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=589554.0, ans=0.1 2024-09-24 21:13:14,080 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=589647.3333333334, ans=0.2 2024-09-24 21:13:30,482 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=589694.0, ans=0.0 2024-09-24 21:13:43,007 INFO [train.py:1198] (3/4) Epoch 33, batch 1700, loss[loss=0.1849, ctc_loss=0.1181, cr_loss=0.3341, over 17266.00 frames. ], tot_loss[loss=0.197, ctc_loss=0.1278, cr_loss=0.346, over 3358722.44 frames. 
], batch size: 44, lr: 3.61e-03, grad_scale: 16.0 2024-09-24 21:13:43,304 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=589740.6666666666, ans=0.125 2024-09-24 21:13:49,389 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.055e+02 1.272e+02 1.348e+02 1.489e+02 2.581e+02, threshold=2.697e+02, percent-clipped=0.0 2024-09-24 21:13:57,849 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=589787.3333333334, ans=0.125 2024-09-24 21:13:57,868 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=589787.3333333334, ans=0.2 2024-09-24 21:13:59,562 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=589787.3333333334, ans=0.125 2024-09-24 21:14:01,120 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-24 21:14:01,176 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=589787.3333333334, ans=0.2 2024-09-24 21:14:41,625 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=589880.6666666666, ans=0.2 2024-09-24 21:15:02,460 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=589974.0, ans=0.125 2024-09-24 21:15:03,863 INFO [train.py:1198] (3/4) Epoch 33, batch 1750, loss[loss=0.1958, ctc_loss=0.1294, cr_loss=0.3322, over 17233.00 frames. ], tot_loss[loss=0.196, ctc_loss=0.1271, cr_loss=0.3444, over 3367155.32 frames. ], batch size: 50, lr: 3.61e-03, grad_scale: 16.0 2024-09-24 21:15:19,913 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.const_attention_rate, batch_count=590020.6666666666, ans=0.025 2024-09-24 21:15:23,059 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=590020.6666666666, ans=0.0 2024-09-24 21:16:03,528 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=590114.0, ans=0.0 2024-09-24 21:16:31,142 INFO [train.py:1198] (3/4) Epoch 33, batch 1800, loss[loss=0.1873, ctc_loss=0.122, cr_loss=0.3268, over 17040.00 frames. ], tot_loss[loss=0.196, ctc_loss=0.127, cr_loss=0.3446, over 3373569.89 frames. ], batch size: 52, lr: 3.61e-03, grad_scale: 16.0 2024-09-24 21:16:37,338 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.113e+02 1.268e+02 1.341e+02 1.421e+02 2.295e+02, threshold=2.681e+02, percent-clipped=0.0 2024-09-24 21:16:40,954 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=590207.3333333334, ans=0.5 2024-09-24 21:16:58,608 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=590254.0, ans=0.125 2024-09-24 21:17:03,095 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=590300.6666666666, ans=0.125 2024-09-24 21:17:37,210 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=4.57 vs. 
limit=10.0 2024-09-24 21:17:50,742 INFO [train.py:1198] (3/4) Epoch 33, batch 1850, loss[loss=0.2593, ctc_loss=0.1782, cr_loss=0.4054, over 12224.00 frames. ], tot_loss[loss=0.1969, ctc_loss=0.1278, cr_loss=0.3457, over 3364825.84 frames. ], batch size: 123, lr: 3.61e-03, grad_scale: 16.0 2024-09-24 21:18:09,314 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.const_attention_rate, batch_count=590487.3333333334, ans=0.025 2024-09-24 21:18:11,232 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=13.29 vs. limit=22.5 2024-09-24 21:18:41,580 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-24 21:19:13,090 INFO [train.py:1198] (3/4) Epoch 33, batch 1900, loss[loss=0.2121, ctc_loss=0.1389, cr_loss=0.366, over 17291.00 frames. ], tot_loss[loss=0.1975, ctc_loss=0.1282, cr_loss=0.3465, over 3361911.54 frames. ], batch size: 46, lr: 3.61e-03, grad_scale: 16.0 2024-09-24 21:19:14,996 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=590674.0, ans=0.2 2024-09-24 21:19:18,581 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=3.35 vs. limit=10.0 2024-09-24 21:19:19,405 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.117e+02 1.258e+02 1.359e+02 1.471e+02 2.658e+02, threshold=2.718e+02, percent-clipped=0.0 2024-09-24 21:19:56,927 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=590767.3333333334, ans=0.125 2024-09-24 21:20:01,821 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=590814.0, ans=0.05 2024-09-24 21:20:19,342 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=590860.6666666666, ans=0.2 2024-09-24 21:20:33,377 INFO [train.py:1198] (3/4) Epoch 33, batch 1950, loss[loss=0.205, ctc_loss=0.1318, cr_loss=0.3662, over 17135.00 frames. ], tot_loss[loss=0.1971, ctc_loss=0.128, cr_loss=0.3455, over 3352712.58 frames. ], batch size: 48, lr: 3.61e-03, grad_scale: 8.0 2024-09-24 21:20:45,776 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=590907.3333333334, ans=0.125 2024-09-24 21:21:09,664 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=591000.6666666666, ans=0.1 2024-09-24 21:21:28,986 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=591047.3333333334, ans=0.0 2024-09-24 21:21:29,109 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=591047.3333333334, ans=0.0 2024-09-24 21:21:57,680 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.const_attention_rate, batch_count=591094.0, ans=0.025 2024-09-24 21:22:00,620 INFO [train.py:1198] (3/4) Epoch 33, batch 2000, loss[loss=0.2117, ctc_loss=0.1413, cr_loss=0.352, over 17290.00 frames. ], tot_loss[loss=0.1965, ctc_loss=0.1276, cr_loss=0.3446, over 3363666.32 frames. 
], batch size: 49, lr: 3.61e-03, grad_scale: 16.0 2024-09-24 21:22:01,042 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=591140.6666666666, ans=0.125 2024-09-24 21:22:08,625 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.162e+02 1.300e+02 1.359e+02 1.463e+02 1.672e+02, threshold=2.718e+02, percent-clipped=0.0 2024-09-24 21:22:16,877 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=591187.3333333334, ans=0.1 2024-09-24 21:22:18,485 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=591187.3333333334, ans=0.0 2024-09-24 21:22:26,879 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=11.64 vs. limit=22.5 2024-09-24 21:22:41,770 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=5.30 vs. limit=15.0 2024-09-24 21:22:53,882 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=11.90 vs. limit=22.5 2024-09-24 21:22:56,641 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=591280.6666666666, ans=0.07 2024-09-24 21:22:58,657 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=10.00 vs. limit=22.5 2024-09-24 21:23:22,762 INFO [train.py:1198] (3/4) Epoch 33, batch 2050, loss[loss=0.1825, ctc_loss=0.116, cr_loss=0.3325, over 17349.00 frames. ], tot_loss[loss=0.1966, ctc_loss=0.1276, cr_loss=0.3449, over 3375634.54 frames. ], batch size: 48, lr: 3.61e-03, grad_scale: 8.0 2024-09-24 21:23:26,219 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=591374.0, ans=0.0 2024-09-24 21:23:29,630 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.19 vs. limit=15.0 2024-09-24 21:23:34,092 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.const_attention_rate, batch_count=591374.0, ans=0.025 2024-09-24 21:23:55,320 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-09-24 21:24:00,166 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=591467.3333333334, ans=0.125 2024-09-24 21:24:17,817 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=591514.0, ans=0.125 2024-09-24 21:24:42,774 INFO [train.py:1198] (3/4) Epoch 33, batch 2100, loss[loss=0.213, ctc_loss=0.1384, cr_loss=0.3733, over 17213.00 frames. ], tot_loss[loss=0.1965, ctc_loss=0.1275, cr_loss=0.3448, over 3377052.45 frames. 
], batch size: 47, lr: 3.61e-03, grad_scale: 8.0 2024-09-24 21:24:43,009 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=591607.3333333334, ans=0.125 2024-09-24 21:24:51,310 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.min_abs, batch_count=591607.3333333334, ans=0.5 2024-09-24 21:24:52,451 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.121e+02 1.286e+02 1.370e+02 1.480e+02 2.165e+02, threshold=2.740e+02, percent-clipped=0.0 2024-09-24 21:24:56,591 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=8.07 vs. limit=15.0 2024-09-24 21:26:05,920 INFO [train.py:1198] (3/4) Epoch 33, batch 2150, loss[loss=0.2083, ctc_loss=0.1368, cr_loss=0.3578, over 16990.00 frames. ], tot_loss[loss=0.1965, ctc_loss=0.1276, cr_loss=0.3445, over 3372926.05 frames. ], batch size: 53, lr: 3.61e-03, grad_scale: 8.0 2024-09-24 21:26:17,542 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=4.01 vs. limit=12.0 2024-09-24 21:26:29,299 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=591887.3333333334, ans=0.1 2024-09-24 21:27:00,090 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.98 vs. limit=6.0 2024-09-24 21:27:04,739 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=591980.6666666666, ans=0.125 2024-09-24 21:27:16,092 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.const_attention_rate, batch_count=592027.3333333334, ans=0.025 2024-09-24 21:27:30,957 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=592074.0, ans=0.0 2024-09-24 21:27:32,260 INFO [train.py:1198] (3/4) Epoch 33, batch 2200, loss[loss=0.1802, ctc_loss=0.1167, cr_loss=0.3174, over 17211.00 frames. ], tot_loss[loss=0.1959, ctc_loss=0.1271, cr_loss=0.344, over 3373566.41 frames. ], batch size: 41, lr: 3.60e-03, grad_scale: 8.0 2024-09-24 21:27:36,585 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=6.37 vs. limit=15.0 2024-09-24 21:27:41,951 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.112e+02 1.250e+02 1.335e+02 1.414e+02 1.894e+02, threshold=2.669e+02, percent-clipped=0.0 2024-09-24 21:27:45,999 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=8.45 vs. 
limit=15.0 2024-09-24 21:27:52,035 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=592120.6666666666, ans=0.125 2024-09-24 21:27:52,086 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-24 21:27:52,181 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=592120.6666666666, ans=0.125 2024-09-24 21:28:17,077 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-24 21:28:37,631 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=592260.6666666666, ans=0.125 2024-09-24 21:28:40,669 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=592260.6666666666, ans=0.125 2024-09-24 21:28:43,838 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=592260.6666666666, ans=0.125 2024-09-24 21:28:54,691 INFO [train.py:1198] (3/4) Epoch 33, batch 2250, loss[loss=0.2011, ctc_loss=0.1317, cr_loss=0.3472, over 17308.00 frames. ], tot_loss[loss=0.1967, ctc_loss=0.1277, cr_loss=0.3446, over 3360002.01 frames. ], batch size: 51, lr: 3.60e-03, grad_scale: 8.0 2024-09-24 21:29:08,373 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=6.85 vs. limit=15.0 2024-09-24 21:29:37,088 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.52 vs. limit=15.0 2024-09-24 21:29:45,121 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-24 21:29:57,705 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=592494.0, ans=0.2 2024-09-24 21:30:14,801 INFO [train.py:1198] (3/4) Epoch 33, batch 2300, loss[loss=0.2021, ctc_loss=0.1304, cr_loss=0.3583, over 17018.00 frames. ], tot_loss[loss=0.1972, ctc_loss=0.1281, cr_loss=0.3454, over 3364862.36 frames. ], batch size: 44, lr: 3.60e-03, grad_scale: 8.0 2024-09-24 21:30:16,872 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=592540.6666666666, ans=0.0 2024-09-24 21:30:24,330 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.141e+02 1.313e+02 1.398e+02 1.523e+02 4.380e+02, threshold=2.797e+02, percent-clipped=1.0 2024-09-24 21:30:33,075 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.07 vs. 
limit=6.0 2024-09-24 21:30:56,265 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=592634.0, ans=0.0 2024-09-24 21:31:08,849 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=592680.6666666666, ans=0.035 2024-09-24 21:31:24,126 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=592680.6666666666, ans=0.125 2024-09-24 21:31:29,397 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.49 vs. limit=15.0 2024-09-24 21:31:38,492 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=592727.3333333334, ans=0.05 2024-09-24 21:31:43,091 INFO [train.py:1198] (3/4) Epoch 33, batch 2350, loss[loss=0.245, ctc_loss=0.17, cr_loss=0.375, over 11736.00 frames. ], tot_loss[loss=0.197, ctc_loss=0.128, cr_loss=0.3451, over 3351571.61 frames. ], batch size: 123, lr: 3.60e-03, grad_scale: 8.0 2024-09-24 21:32:18,461 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=592867.3333333334, ans=0.1 2024-09-24 21:32:38,988 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=592914.0, ans=0.1 2024-09-24 21:32:39,384 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=8.26 vs. limit=15.0 2024-09-24 21:32:52,752 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=592960.6666666666, ans=0.2 2024-09-24 21:33:05,028 INFO [train.py:1198] (3/4) Epoch 33, batch 2400, loss[loss=0.2242, ctc_loss=0.1571, cr_loss=0.3355, over 11275.00 frames. ], tot_loss[loss=0.1971, ctc_loss=0.1281, cr_loss=0.3452, over 3337272.96 frames. ], batch size: 123, lr: 3.60e-03, grad_scale: 16.0 2024-09-24 21:33:09,038 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=5.50 vs. limit=12.0 2024-09-24 21:33:14,696 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.083e+02 1.266e+02 1.359e+02 1.474e+02 2.316e+02, threshold=2.718e+02, percent-clipped=0.0 2024-09-24 21:33:15,130 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=593007.3333333334, ans=0.0 2024-09-24 21:33:24,448 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=593054.0, ans=0.125 2024-09-24 21:34:24,928 INFO [train.py:1198] (3/4) Epoch 33, batch 2450, loss[loss=0.2357, ctc_loss=0.1571, cr_loss=0.393, over 12036.00 frames. ], tot_loss[loss=0.1967, ctc_loss=0.1277, cr_loss=0.3446, over 3330472.08 frames. ], batch size: 123, lr: 3.60e-03, grad_scale: 16.0 2024-09-24 21:34:41,637 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=9.75 vs. 
limit=15.0 2024-09-24 21:34:42,799 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-24 21:34:44,443 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=593287.3333333334, ans=0.2 2024-09-24 21:34:52,391 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=593287.3333333334, ans=0.125 2024-09-24 21:35:01,880 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=593334.0, ans=0.0 2024-09-24 21:35:25,689 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=593380.6666666666, ans=0.125 2024-09-24 21:35:39,957 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=593427.3333333334, ans=0.125 2024-09-24 21:35:44,905 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=593427.3333333334, ans=0.1 2024-09-24 21:35:48,034 INFO [train.py:1198] (3/4) Epoch 33, batch 2500, loss[loss=0.1714, ctc_loss=0.1079, cr_loss=0.3175, over 16723.00 frames. ], tot_loss[loss=0.1968, ctc_loss=0.1279, cr_loss=0.3444, over 3337227.09 frames. ], batch size: 37, lr: 3.60e-03, grad_scale: 16.0 2024-09-24 21:36:00,299 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.106e+02 1.280e+02 1.356e+02 1.432e+02 1.776e+02, threshold=2.712e+02, percent-clipped=0.0 2024-09-24 21:36:23,858 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=593567.3333333334, ans=0.2 2024-09-24 21:36:42,535 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten.whitening_limit, batch_count=593614.0, ans=15.0 2024-09-24 21:37:08,292 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=7.22 vs. limit=15.0 2024-09-24 21:37:13,488 INFO [train.py:1198] (3/4) Epoch 33, batch 2550, loss[loss=0.2222, ctc_loss=0.1453, cr_loss=0.3847, over 16619.00 frames. ], tot_loss[loss=0.1967, ctc_loss=0.1279, cr_loss=0.3442, over 3335872.51 frames. ], batch size: 66, lr: 3.60e-03, grad_scale: 16.0 2024-09-24 21:37:23,325 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=593707.3333333334, ans=0.1 2024-09-24 21:37:24,967 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer_ff2.min_abs, batch_count=593707.3333333334, ans=0.1 2024-09-24 21:37:40,850 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=593754.0, ans=0.125 2024-09-24 21:37:49,641 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=593800.6666666666, ans=0.5 2024-09-24 21:37:52,015 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=5.06 vs. 
limit=12.0 2024-09-24 21:37:52,770 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=593800.6666666666, ans=0.1 2024-09-24 21:37:57,172 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=593800.6666666666, ans=0.1 2024-09-24 21:38:35,209 INFO [train.py:1198] (3/4) Epoch 33, batch 2600, loss[loss=0.2437, ctc_loss=0.1723, cr_loss=0.3569, over 12105.00 frames. ], tot_loss[loss=0.1974, ctc_loss=0.1283, cr_loss=0.3452, over 3325801.19 frames. ], batch size: 123, lr: 3.60e-03, grad_scale: 16.0 2024-09-24 21:38:35,498 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=593940.6666666666, ans=0.125 2024-09-24 21:38:44,724 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.151e+02 1.263e+02 1.356e+02 1.473e+02 2.021e+02, threshold=2.712e+02, percent-clipped=0.0 2024-09-24 21:39:00,970 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=593987.3333333334, ans=0.2 2024-09-24 21:39:45,241 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=594127.3333333334, ans=0.125 2024-09-24 21:39:54,549 INFO [train.py:1198] (3/4) Epoch 33, batch 2650, loss[loss=0.2023, ctc_loss=0.1313, cr_loss=0.3547, over 16914.00 frames. ], tot_loss[loss=0.1972, ctc_loss=0.1281, cr_loss=0.3453, over 3333250.96 frames. ], batch size: 58, lr: 3.60e-03, grad_scale: 16.0 2024-09-24 21:40:02,893 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=594174.0, ans=0.2 2024-09-24 21:40:11,354 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=2.64 vs. limit=6.0 2024-09-24 21:40:30,930 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=594267.3333333334, ans=0.1 2024-09-24 21:40:35,692 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=594267.3333333334, ans=0.125 2024-09-24 21:40:54,056 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=594314.0, ans=0.0 2024-09-24 21:41:07,870 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=594360.6666666666, ans=0.125 2024-09-24 21:41:14,315 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=594360.6666666666, ans=0.1 2024-09-24 21:41:22,175 INFO [train.py:1198] (3/4) Epoch 33, batch 2700, loss[loss=0.175, ctc_loss=0.1124, cr_loss=0.3129, over 17310.00 frames. ], tot_loss[loss=0.1961, ctc_loss=0.1274, cr_loss=0.3435, over 3337971.10 frames. 
], batch size: 51, lr: 3.60e-03, grad_scale: 16.0 2024-09-24 21:41:31,684 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.088e+02 1.245e+02 1.329e+02 1.418e+02 2.018e+02, threshold=2.657e+02, percent-clipped=0.0 2024-09-24 21:41:41,438 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=594454.0, ans=0.125 2024-09-24 21:41:46,498 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.57 vs. limit=15.0 2024-09-24 21:42:13,669 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=594547.3333333334, ans=0.125 2024-09-24 21:42:44,870 INFO [train.py:1198] (3/4) Epoch 33, batch 2750, loss[loss=0.2032, ctc_loss=0.1344, cr_loss=0.3443, over 16977.00 frames. ], tot_loss[loss=0.1969, ctc_loss=0.128, cr_loss=0.3446, over 3339245.63 frames. ], batch size: 56, lr: 3.60e-03, grad_scale: 16.0 2024-09-24 21:43:17,381 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=594734.0, ans=0.125 2024-09-24 21:43:24,175 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.89 vs. limit=15.0 2024-09-24 21:43:28,540 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=594734.0, ans=0.0 2024-09-24 21:43:41,711 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=6.22 vs. limit=15.0 2024-09-24 21:43:43,055 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-24 21:44:04,860 INFO [train.py:1198] (3/4) Epoch 33, batch 2800, loss[loss=0.189, ctc_loss=0.1213, cr_loss=0.3386, over 17299.00 frames. ], tot_loss[loss=0.1959, ctc_loss=0.1272, cr_loss=0.3435, over 3348154.31 frames. ], batch size: 46, lr: 3.60e-03, grad_scale: 32.0 2024-09-24 21:44:14,278 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.107e+02 1.285e+02 1.356e+02 1.483e+02 1.883e+02, threshold=2.713e+02, percent-clipped=0.0 2024-09-24 21:44:24,131 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=594920.6666666666, ans=0.125 2024-09-24 21:44:36,814 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=594967.3333333334, ans=0.125 2024-09-24 21:44:38,432 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=594967.3333333334, ans=0.125 2024-09-24 21:44:59,510 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=595014.0, ans=0.125 2024-09-24 21:45:24,681 INFO [train.py:1198] (3/4) Epoch 33, batch 2850, loss[loss=0.1795, ctc_loss=0.1134, cr_loss=0.3306, over 17071.00 frames. ], tot_loss[loss=0.1961, ctc_loss=0.1274, cr_loss=0.3436, over 3340730.31 frames. 
2024-09-24 21:45:30,570 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=595107.3333333334, ans=0.125
2024-09-24 21:45:30,593 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=595107.3333333334, ans=0.1
2024-09-24 21:45:32,103 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=595107.3333333334, ans=0.0
2024-09-24 21:45:32,186 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=595107.3333333334, ans=0.05
2024-09-24 21:45:36,980 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=595107.3333333334, ans=0.125
2024-09-24 21:45:43,705 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten.whitening_limit, batch_count=595154.0, ans=15.0
2024-09-24 21:45:45,087 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=595154.0, ans=0.0
2024-09-24 21:45:49,543 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=4.42 vs. limit=12.0
2024-09-24 21:46:01,283 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=595154.0, ans=0.07
2024-09-24 21:46:45,810 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-09-24 21:46:52,042 INFO [train.py:1198] (3/4) Epoch 33, batch 2900, loss[loss=0.2179, ctc_loss=0.1436, cr_loss=0.3714, over 16543.00 frames. ], tot_loss[loss=0.1959, ctc_loss=0.1271, cr_loss=0.3441, over 3354661.84 frames. ], batch size: 66, lr: 3.59e-03, grad_scale: 32.0
2024-09-24 21:46:52,359 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-09-24 21:46:57,060 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=595340.6666666666, ans=0.125
2024-09-24 21:47:00,377 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=595340.6666666666, ans=0.125
2024-09-24 21:47:01,746 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.073e+02 1.278e+02 1.378e+02 1.507e+02 2.269e+02, threshold=2.756e+02, percent-clipped=0.0
2024-09-24 21:47:22,703 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=595434.0, ans=0.2
2024-09-24 21:47:32,313 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=595434.0, ans=0.125
2024-09-24 21:47:34,161 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=595434.0, ans=0.2
2024-09-24 21:47:43,498 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.min_positive, batch_count=595480.6666666666, ans=0.025
2024-09-24 21:47:48,274 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.const_attention_rate, batch_count=595480.6666666666, ans=0.025
2024-09-24 21:47:54,797 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=595480.6666666666, ans=0.0
2024-09-24 21:47:57,274 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=11.64 vs. limit=15.0
2024-09-24 21:48:04,419 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=595527.3333333334, ans=0.125
2024-09-24 21:48:06,707 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=11.08 vs. limit=22.5
2024-09-24 21:48:12,556 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=595527.3333333334, ans=0.125
2024-09-24 21:48:15,325 INFO [train.py:1198] (3/4) Epoch 33, batch 2950, loss[loss=0.1897, ctc_loss=0.1214, cr_loss=0.3417, over 17007.00 frames. ], tot_loss[loss=0.1958, ctc_loss=0.127, cr_loss=0.3439, over 3350675.49 frames. ], batch size: 44, lr: 3.59e-03, grad_scale: 32.0
2024-09-24 21:48:31,568 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=595620.6666666666, ans=0.125
2024-09-24 21:48:34,718 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=595620.6666666666, ans=0.125
2024-09-24 21:49:05,268 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=595714.0, ans=0.05
2024-09-24 21:49:19,249 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=595760.6666666666, ans=0.07
2024-09-24 21:49:26,925 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=595760.6666666666, ans=0.125
2024-09-24 21:49:28,491 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-09-24 21:49:28,511 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=595760.6666666666, ans=0.1
2024-09-24 21:49:34,547 INFO [train.py:1198] (3/4) Epoch 33, batch 3000, loss[loss=0.1659, ctc_loss=0.1051, cr_loss=0.304, over 17211.00 frames. ], tot_loss[loss=0.1955, ctc_loss=0.1268, cr_loss=0.3435, over 3355941.62 frames. ], batch size: 47, lr: 3.59e-03, grad_scale: 32.0
2024-09-24 21:49:34,548 INFO [train.py:1221] (3/4) Computing validation loss
2024-09-24 21:49:47,683 INFO [zipformer.py:1858] (3/4) name=encoder.encoders.2.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([4.5044, 4.2848, 4.1915, 4.0418], device='cuda:3')
2024-09-24 21:49:50,274 INFO [train.py:1230] (3/4) Epoch 33, validation: loss=0.03597, ctc_loss=0.03597, cr_loss=9.382e-15, over 944034.00 frames.
2024-09-24 21:49:50,274 INFO [train.py:1231] (3/4) Maximum memory allocated so far is 21238MB
2024-09-24 21:49:59,653 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.119e+02 1.292e+02 1.353e+02 1.495e+02 2.152e+02, threshold=2.707e+02, percent-clipped=0.0
2024-09-24 21:50:03,819 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.59 vs. limit=22.5
2024-09-24 21:50:16,092 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=4.42 vs. limit=12.0
2024-09-24 21:50:38,498 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=4.98 vs. limit=15.0
2024-09-24 21:50:54,763 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=595994.0, ans=0.0
2024-09-24 21:51:07,434 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=596040.6666666666, ans=0.2
2024-09-24 21:51:08,636 INFO [train.py:1198] (3/4) Epoch 33, batch 3050, loss[loss=0.1718, ctc_loss=0.1101, cr_loss=0.3085, over 17055.00 frames. ], tot_loss[loss=0.195, ctc_loss=0.1266, cr_loss=0.3423, over 3353328.98 frames. ], batch size: 39, lr: 3.59e-03, grad_scale: 32.0
2024-09-24 21:51:39,497 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=596087.3333333334, ans=0.0
2024-09-24 21:51:45,730 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=596134.0, ans=0.09899494936611666
2024-09-24 21:51:53,624 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.min_positive, batch_count=596134.0, ans=0.025
2024-09-24 21:51:53,644 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=596134.0, ans=0.1
2024-09-24 21:52:09,074 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=596180.6666666666, ans=0.125
2024-09-24 21:52:13,983 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=596227.3333333334, ans=0.2
2024-09-24 21:52:31,203 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=596227.3333333334, ans=0.0
2024-09-24 21:52:34,218 INFO [train.py:1198] (3/4) Epoch 33, batch 3100, loss[loss=0.1606, ctc_loss=0.1002, cr_loss=0.3023, over 16932.00 frames. ], tot_loss[loss=0.1954, ctc_loss=0.1268, cr_loss=0.3432, over 3357828.25 frames. ], batch size: 42, lr: 3.59e-03, grad_scale: 32.0
2024-09-24 21:52:43,659 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.104e+02 1.278e+02 1.346e+02 1.474e+02 1.974e+02, threshold=2.692e+02, percent-clipped=0.0
2024-09-24 21:52:50,251 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=596320.6666666666, ans=0.125
2024-09-24 21:52:54,962 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=596320.6666666666, ans=0.1
2024-09-24 21:52:55,090 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=596320.6666666666, ans=0.125
2024-09-24 21:52:58,682 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=4.48 vs. limit=15.0
2024-09-24 21:53:01,221 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=596320.6666666666, ans=0.0
2024-09-24 21:53:02,874 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=596320.6666666666, ans=0.0
2024-09-24 21:53:09,350 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=596367.3333333334, ans=0.125
2024-09-24 21:53:12,673 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=11.74 vs. limit=22.5
2024-09-24 21:53:37,911 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=596460.6666666666, ans=0.0
2024-09-24 21:53:53,434 INFO [train.py:1198] (3/4) Epoch 33, batch 3150, loss[loss=0.1858, ctc_loss=0.1216, cr_loss=0.3212, over 17019.00 frames. ], tot_loss[loss=0.1957, ctc_loss=0.1269, cr_loss=0.3439, over 3365965.94 frames. ], batch size: 44, lr: 3.59e-03, grad_scale: 32.0
2024-09-24 21:53:57,041 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2024-09-24 21:54:06,937 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=3.61 vs. limit=12.0
2024-09-24 21:54:14,388 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-24 21:54:31,561 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-09-24 21:54:51,493 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=596647.3333333334, ans=0.0
2024-09-24 21:55:11,610 INFO [train.py:1198] (3/4) Epoch 33, batch 3200, loss[loss=0.201, ctc_loss=0.1322, cr_loss=0.3439, over 17004.00 frames. ], tot_loss[loss=0.1953, ctc_loss=0.1265, cr_loss=0.3437, over 3368108.15 frames. ], batch size: 56, lr: 3.59e-03, grad_scale: 32.0
2024-09-24 21:55:13,606 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00
2024-09-24 21:55:21,310 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=596740.6666666666, ans=0.125
2024-09-24 21:55:22,509 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.125e+02 1.266e+02 1.354e+02 1.465e+02 1.945e+02, threshold=2.709e+02, percent-clipped=0.0
2024-09-24 21:55:24,242 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=596740.6666666666, ans=0.125
2024-09-24 21:55:29,253 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=6.64 vs. limit=15.0
2024-09-24 21:55:35,067 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=596787.3333333334, ans=0.125
2024-09-24 21:55:38,346 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=596787.3333333334, ans=0.1
2024-09-24 21:55:43,060 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=596834.0, ans=0.04949747468305833
2024-09-24 21:56:02,461 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=5.70 vs. limit=15.0
2024-09-24 21:56:31,847 INFO [train.py:1198] (3/4) Epoch 33, batch 3250, loss[loss=0.2077, ctc_loss=0.1327, cr_loss=0.3749, over 17083.00 frames. ], tot_loss[loss=0.1977, ctc_loss=0.1283, cr_loss=0.3469, over 3356010.02 frames. ], batch size: 43, lr: 3.59e-03, grad_scale: 32.0
2024-09-24 21:56:32,071 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=596974.0, ans=0.0
2024-09-24 21:56:32,584 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=7.40 vs. limit=15.0
2024-09-24 21:56:33,198 INFO [scaling.py:1024] (3/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.79 vs. limit=5.0
2024-09-24 21:56:33,820 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=596974.0, ans=0.2
2024-09-24 21:56:38,833 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=3.78 vs. limit=10.0
2024-09-24 21:56:43,217 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=596974.0, ans=0.0
2024-09-24 21:56:46,511 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=597020.6666666666, ans=0.125
2024-09-24 21:56:48,089 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=597020.6666666666, ans=0.2
2024-09-24 21:57:02,418 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=4.16 vs. limit=15.0
2024-09-24 21:57:05,761 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=7.52 vs. limit=15.0
2024-09-24 21:57:34,969 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=597160.6666666666, ans=0.125
2024-09-24 21:57:50,338 INFO [train.py:1198] (3/4) Epoch 33, batch 3300, loss[loss=0.1976, ctc_loss=0.1285, cr_loss=0.3454, over 16992.00 frames. ], tot_loss[loss=0.1973, ctc_loss=0.1281, cr_loss=0.3462, over 3363472.90 frames. ], batch size: 53, lr: 3.59e-03, grad_scale: 32.0
2024-09-24 21:57:52,149 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=597207.3333333334, ans=0.0
2024-09-24 21:57:53,895 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=597207.3333333334, ans=0.2
2024-09-24 21:58:01,413 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.145e+02 1.317e+02 1.415e+02 1.516e+02 3.233e+02, threshold=2.830e+02, percent-clipped=1.0
2024-09-24 21:58:14,810 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=16.24 vs. limit=22.5
2024-09-24 21:59:03,767 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=597394.0, ans=0.125
2024-09-24 21:59:06,991 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=597394.0, ans=0.125
2024-09-24 21:59:11,414 INFO [train.py:1198] (3/4) Epoch 33, batch 3350, loss[loss=0.1803, ctc_loss=0.1116, cr_loss=0.3437, over 17153.00 frames. ], tot_loss[loss=0.1958, ctc_loss=0.127, cr_loss=0.3442, over 3362078.99 frames. ], batch size: 41, lr: 3.59e-03, grad_scale: 32.0
2024-09-24 22:00:00,333 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=597580.6666666666, ans=0.1
2024-09-24 22:00:29,916 INFO [train.py:1198] (3/4) Epoch 33, batch 3400, loss[loss=0.21, ctc_loss=0.1359, cr_loss=0.3703, over 17044.00 frames. ], tot_loss[loss=0.196, ctc_loss=0.1272, cr_loss=0.3445, over 3357577.42 frames. ], batch size: 52, lr: 3.59e-03, grad_scale: 32.0
2024-09-24 22:00:30,606 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=3.75 vs. limit=10.0
2024-09-24 22:00:40,768 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.046e+02 1.249e+02 1.331e+02 1.438e+02 2.060e+02, threshold=2.663e+02, percent-clipped=0.0
2024-09-24 22:00:44,629 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.92 vs. limit=6.0
2024-09-24 22:01:01,468 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=597767.3333333334, ans=0.0
2024-09-24 22:01:05,961 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=597767.3333333334, ans=0.1
2024-09-24 22:01:31,409 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-24 22:01:48,349 INFO [train.py:1198] (3/4) Epoch 33, batch 3450, loss[loss=0.2393, ctc_loss=0.1558, cr_loss=0.4177, over 17212.00 frames. ], tot_loss[loss=0.1977, ctc_loss=0.1285, cr_loss=0.3464, over 3337895.93 frames. ], batch size: 55, lr: 3.59e-03, grad_scale: 32.0
2024-09-24 22:02:04,300 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=597954.0, ans=0.125
2024-09-24 22:02:16,817 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=597954.0, ans=0.0
2024-09-24 22:02:18,434 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=597954.0, ans=0.125
2024-09-24 22:02:35,865 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=598000.6666666666, ans=0.125
2024-09-24 22:02:36,105 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=598000.6666666666, ans=0.125
2024-09-24 22:03:04,734 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=598094.0, ans=0.0
2024-09-24 22:03:09,426 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=598094.0, ans=0.1
2024-09-24 22:03:12,225 INFO [train.py:1198] (3/4) Epoch 33, batch 3500, loss[loss=0.1892, ctc_loss=0.1203, cr_loss=0.3442, over 17064.00 frames. ], tot_loss[loss=0.1976, ctc_loss=0.1284, cr_loss=0.346, over 3340254.84 frames. ], batch size: 46, lr: 3.59e-03, grad_scale: 32.0
2024-09-24 22:03:23,015 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.105e+02 1.259e+02 1.352e+02 1.466e+02 4.097e+02, threshold=2.703e+02, percent-clipped=1.0
2024-09-24 22:03:26,476 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=598187.3333333334, ans=0.125
2024-09-24 22:03:30,088 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.03 vs. limit=15.0
2024-09-24 22:03:39,034 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=598187.3333333334, ans=0.125
2024-09-24 22:03:46,803 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=598234.0, ans=0.09899494936611666
2024-09-24 22:03:48,270 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=598234.0, ans=0.0
2024-09-24 22:04:05,532 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=598280.6666666666, ans=0.0
2024-09-24 22:04:27,466 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=598327.3333333334, ans=0.125
2024-09-24 22:04:29,082 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=598374.0, ans=0.125
2024-09-24 22:04:30,426 INFO [train.py:1198] (3/4) Epoch 33, batch 3550, loss[loss=0.2144, ctc_loss=0.1432, cr_loss=0.3559, over 16441.00 frames. ], tot_loss[loss=0.1978, ctc_loss=0.1286, cr_loss=0.3461, over 3344783.90 frames. ], batch size: 66, lr: 3.59e-03, grad_scale: 32.0
2024-09-24 22:04:57,252 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=598420.6666666666, ans=0.125
2024-09-24 22:05:00,367 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=598467.3333333334, ans=0.0
2024-09-24 22:05:03,611 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=598467.3333333334, ans=0.0
2024-09-24 22:05:17,837 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=598514.0, ans=0.0
2024-09-24 22:05:20,775 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=598514.0, ans=0.125
2024-09-24 22:05:39,550 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=598560.6666666666, ans=0.125
2024-09-24 22:05:48,778 INFO [train.py:1198] (3/4) Epoch 33, batch 3600, loss[loss=0.1774, ctc_loss=0.1145, cr_loss=0.3146, over 17324.00 frames. ], tot_loss[loss=0.1981, ctc_loss=0.1288, cr_loss=0.3467, over 3344303.93 frames. ], batch size: 51, lr: 3.58e-03, grad_scale: 32.0
2024-09-24 22:05:59,716 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.085e+02 1.266e+02 1.339e+02 1.451e+02 1.959e+02, threshold=2.678e+02, percent-clipped=0.0
2024-09-24 22:06:06,276 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=598654.0, ans=0.1
2024-09-24 22:06:33,879 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=10.27 vs. limit=15.0
2024-09-24 22:07:07,789 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=598840.6666666666, ans=0.2
2024-09-24 22:07:08,199 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=6.10 vs. limit=15.0
2024-09-24 22:07:08,223 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=7.83 vs. limit=15.0
2024-09-24 22:07:08,980 INFO [train.py:1198] (3/4) Epoch 33, batch 3650, loss[loss=0.1953, ctc_loss=0.1274, cr_loss=0.3394, over 17303.00 frames. ], tot_loss[loss=0.1977, ctc_loss=0.1284, cr_loss=0.3465, over 3344706.86 frames. ], batch size: 51, lr: 3.58e-03, grad_scale: 32.0
2024-09-24 22:07:34,394 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=598887.3333333334, ans=0.1
2024-09-24 22:07:37,991 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=3.35 vs. limit=10.0
2024-09-24 22:07:39,616 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.73 vs. limit=15.0
2024-09-24 22:08:23,738 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=599027.3333333334, ans=0.0
2024-09-24 22:08:27,970 INFO [train.py:1198] (3/4) Epoch 33, batch 3700, loss[loss=0.1717, ctc_loss=0.1106, cr_loss=0.3055, over 17030.00 frames. ], tot_loss[loss=0.1968, ctc_loss=0.1277, cr_loss=0.3455, over 3355362.51 frames. ], batch size: 39, lr: 3.58e-03, grad_scale: 32.0
2024-09-24 22:08:33,673 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=9.26 vs. limit=15.0
2024-09-24 22:08:34,454 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=599074.0, ans=0.1
2024-09-24 22:08:39,002 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.089e+02 1.282e+02 1.323e+02 1.479e+02 2.496e+02, threshold=2.646e+02, percent-clipped=0.0
2024-09-24 22:08:50,149 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.83 vs. limit=15.0
2024-09-24 22:08:51,551 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-24 22:09:02,093 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=599167.3333333334, ans=0.05
2024-09-24 22:09:12,893 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=3.80 vs. limit=15.0
2024-09-24 22:09:36,987 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.const_attention_rate, batch_count=599260.6666666666, ans=0.025
2024-09-24 22:09:41,535 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=599260.6666666666, ans=0.0
2024-09-24 22:09:45,899 INFO [train.py:1198] (3/4) Epoch 33, batch 3750, loss[loss=0.2003, ctc_loss=0.1316, cr_loss=0.3436, over 17144.00 frames. ], tot_loss[loss=0.1973, ctc_loss=0.1281, cr_loss=0.3462, over 3351819.04 frames. ], batch size: 48, lr: 3.58e-03, grad_scale: 32.0
2024-09-24 22:09:48,331 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.76 vs. limit=15.0
2024-09-24 22:10:35,469 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=599447.3333333334, ans=0.1
2024-09-24 22:11:03,990 INFO [train.py:1198] (3/4) Epoch 33, batch 3800, loss[loss=0.2249, ctc_loss=0.1478, cr_loss=0.3857, over 15141.00 frames. ], tot_loss[loss=0.199, ctc_loss=0.1294, cr_loss=0.348, over 3331028.27 frames. ], batch size: 89, lr: 3.58e-03, grad_scale: 32.0
2024-09-24 22:11:14,753 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.124e+02 1.301e+02 1.426e+02 1.535e+02 2.880e+02, threshold=2.852e+02, percent-clipped=1.0
2024-09-24 22:11:19,862 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=599587.3333333334, ans=0.125
2024-09-24 22:11:43,546 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=599634.0, ans=0.125
2024-09-24 22:11:46,760 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=599634.0, ans=0.125
2024-09-24 22:12:04,523 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=599680.6666666666, ans=0.1
2024-09-24 22:12:15,888 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=4.35 vs. limit=15.0
2024-09-24 22:12:22,914 INFO [train.py:1198] (3/4) Epoch 33, batch 3850, loss[loss=0.227, ctc_loss=0.1509, cr_loss=0.3803, over 14902.00 frames. ], tot_loss[loss=0.2001, ctc_loss=0.1304, cr_loss=0.3487, over 3301705.33 frames. ], batch size: 89, lr: 3.58e-03, grad_scale: 16.0
2024-09-24 22:12:25,493 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=5.42 vs. limit=12.0
2024-09-24 22:12:34,201 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=599774.0, ans=0.1
2024-09-24 22:12:51,264 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.min_positive, batch_count=599820.6666666666, ans=0.05
2024-09-24 22:13:05,129 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=599867.3333333334, ans=0.125
2024-09-24 22:13:11,553 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=17.62 vs. limit=22.5
2024-09-24 22:13:21,357 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=599914.0, ans=0.125
2024-09-24 22:14:24,131 INFO [train.py:1198] (3/4) Epoch 34, batch 0, loss[loss=0.2132, ctc_loss=0.1419, cr_loss=0.3564, over 17232.00 frames. ], tot_loss[loss=0.2132, ctc_loss=0.1419, cr_loss=0.3564, over 17232.00 frames. ], batch size: 55, lr: 3.53e-03, grad_scale: 32.0
2024-09-24 22:14:24,131 INFO [train.py:1221] (3/4) Computing validation loss
2024-09-24 22:14:39,382 INFO [train.py:1230] (3/4) Epoch 34, validation: loss=0.03567, ctc_loss=0.03567, cr_loss=1.032e-14, over 944034.00 frames.
2024-09-24 22:14:39,383 INFO [train.py:1231] (3/4) Maximum memory allocated so far is 21238MB
2024-09-24 22:14:58,491 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.103e+02 1.377e+02 1.527e+02 1.748e+02 2.707e+02, threshold=3.055e+02, percent-clipped=0.0
2024-09-24 22:14:58,880 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=600035.3333333334, ans=0.0
2024-09-24 22:15:05,251 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=600035.3333333334, ans=0.0
2024-09-24 22:15:29,379 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=600128.6666666666, ans=0.125
2024-09-24 22:15:59,189 INFO [train.py:1198] (3/4) Epoch 34, batch 50, loss[loss=0.2053, ctc_loss=0.1321, cr_loss=0.3658, over 17342.00 frames. ], tot_loss[loss=0.1952, ctc_loss=0.1262, cr_loss=0.3447, over 763003.30 frames. ], batch size: 48, lr: 3.53e-03, grad_scale: 32.0
2024-09-24 22:16:32,448 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=600315.3333333334, ans=0.09899494936611666
2024-09-24 22:16:32,829 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=12.25 vs. limit=22.5
2024-09-24 22:16:37,201 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=600315.3333333334, ans=0.04949747468305833
2024-09-24 22:16:53,374 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=10.25 vs. limit=15.0
2024-09-24 22:17:02,402 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten.whitening_limit, batch_count=600362.0, ans=15.0
2024-09-24 22:17:29,801 INFO [train.py:1198] (3/4) Epoch 34, batch 100, loss[loss=0.1719, ctc_loss=0.109, cr_loss=0.3144, over 17222.00 frames. ], tot_loss[loss=0.197, ctc_loss=0.1276, cr_loss=0.3468, over 1333904.17 frames. ], batch size: 50, lr: 3.53e-03, grad_scale: 32.0
2024-09-24 22:17:44,821 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=600502.0, ans=0.125
2024-09-24 22:17:49,217 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.137e+02 1.286e+02 1.376e+02 1.498e+02 2.035e+02, threshold=2.751e+02, percent-clipped=0.0
2024-09-24 22:17:59,048 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=600502.0, ans=0.125
2024-09-24 22:17:59,162 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=600502.0, ans=0.125
2024-09-24 22:18:07,478 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=6.85 vs. limit=15.0
2024-09-24 22:18:24,734 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=600595.3333333334, ans=0.0
2024-09-24 22:18:38,112 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=8.51 vs. limit=15.0
2024-09-24 22:18:41,517 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=5.13 vs. limit=15.0
2024-09-24 22:18:49,901 INFO [train.py:1198] (3/4) Epoch 34, batch 150, loss[loss=0.1582, ctc_loss=0.1006, cr_loss=0.2879, over 16329.00 frames. ], tot_loss[loss=0.1933, ctc_loss=0.125, cr_loss=0.3415, over 1785944.59 frames. ], batch size: 36, lr: 3.52e-03, grad_scale: 32.0
2024-09-24 22:19:03,778 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=5.52 vs. limit=15.0
2024-09-24 22:19:18,876 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.43 vs. limit=10.0
2024-09-24 22:19:32,690 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=600782.0, ans=0.0
2024-09-24 22:19:39,045 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=600828.6666666666, ans=0.125
2024-09-24 22:19:47,145 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=600828.6666666666, ans=0.0
2024-09-24 22:19:58,674 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=5.47 vs. limit=15.0
2024-09-24 22:20:06,578 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=11.68 vs. limit=12.0
2024-09-24 22:20:07,109 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.90 vs. limit=10.0
2024-09-24 22:20:09,194 INFO [train.py:1198] (3/4) Epoch 34, batch 200, loss[loss=0.251, ctc_loss=0.168, cr_loss=0.4151, over 16997.00 frames. ], tot_loss[loss=0.1958, ctc_loss=0.1269, cr_loss=0.3445, over 2120355.69 frames. ], batch size: 53, lr: 3.52e-03, grad_scale: 32.0
2024-09-24 22:20:30,036 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.131e+02 1.259e+02 1.333e+02 1.425e+02 2.058e+02, threshold=2.665e+02, percent-clipped=0.0
2024-09-24 22:20:41,978 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=5.23 vs. limit=15.0
2024-09-24 22:20:59,483 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=601062.0, ans=0.125
2024-09-24 22:21:00,973 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=601062.0, ans=0.125
2024-09-24 22:21:07,480 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=601062.0, ans=0.125
2024-09-24 22:21:16,900 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=601108.6666666666, ans=0.0
2024-09-24 22:21:18,625 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=601108.6666666666, ans=0.125
2024-09-24 22:21:26,900 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=4.68 vs. limit=15.0
2024-09-24 22:21:32,287 INFO [train.py:1198] (3/4) Epoch 34, batch 250, loss[loss=0.2221, ctc_loss=0.1453, cr_loss=0.3839, over 15962.00 frames. ], tot_loss[loss=0.1952, ctc_loss=0.1265, cr_loss=0.3434, over 2395131.52 frames. ], batch size: 74, lr: 3.52e-03, grad_scale: 16.0
2024-09-24 22:21:32,611 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-09-24 22:22:30,560 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=601295.3333333334, ans=0.125
2024-09-24 22:22:37,634 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.64 vs. limit=15.0
2024-09-24 22:22:48,443 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=10.51 vs. limit=12.0
2024-09-24 22:23:00,008 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=9.38 vs. limit=15.0
2024-09-24 22:23:00,623 INFO [train.py:1198] (3/4) Epoch 34, batch 300, loss[loss=0.2143, ctc_loss=0.1411, cr_loss=0.3656, over 17022.00 frames. ], tot_loss[loss=0.1947, ctc_loss=0.1262, cr_loss=0.3426, over 2613868.83 frames. ], batch size: 52, lr: 3.52e-03, grad_scale: 16.0
2024-09-24 22:23:03,987 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=601388.6666666666, ans=0.1
2024-09-24 22:23:06,108 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.17 vs. limit=15.0
2024-09-24 22:23:12,322 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten.whitening_limit, batch_count=601388.6666666666, ans=15.0
2024-09-24 22:23:21,392 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.136e+02 1.266e+02 1.387e+02 1.525e+02 2.483e+02, threshold=2.773e+02, percent-clipped=0.0
2024-09-24 22:23:25,521 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=5.86 vs. limit=15.0
2024-09-24 22:24:00,028 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=601528.6666666666, ans=0.125
2024-09-24 22:24:00,534 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=5.97 vs. limit=15.0
2024-09-24 22:24:19,048 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=601622.0, ans=0.125
2024-09-24 22:24:20,384 INFO [train.py:1198] (3/4) Epoch 34, batch 350, loss[loss=0.1836, ctc_loss=0.1177, cr_loss=0.3296, over 17032.00 frames. ], tot_loss[loss=0.1937, ctc_loss=0.1255, cr_loss=0.3412, over 2779969.27 frames. ], batch size: 44, lr: 3.52e-03, grad_scale: 16.0
2024-09-24 22:24:36,527 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=601668.6666666666, ans=0.125
2024-09-24 22:24:38,292 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=601668.6666666666, ans=0.1
2024-09-24 22:24:52,722 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=601715.3333333334, ans=0.1
2024-09-24 22:24:54,283 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=601715.3333333334, ans=0.0
2024-09-24 22:25:02,506 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=5.43 vs. limit=12.0
2024-09-24 22:25:18,138 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=601762.0, ans=0.0
2024-09-24 22:25:32,314 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=601808.6666666666, ans=0.125
2024-09-24 22:25:39,909 INFO [train.py:1198] (3/4) Epoch 34, batch 400, loss[loss=0.2275, ctc_loss=0.1545, cr_loss=0.365, over 14955.00 frames. ], tot_loss[loss=0.1951, ctc_loss=0.1266, cr_loss=0.3422, over 2898479.74 frames. ], batch size: 89, lr: 3.52e-03, grad_scale: 16.0
2024-09-24 22:25:47,076 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=4.55 vs. limit=10.0
2024-09-24 22:25:53,010 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=601855.3333333334, ans=0.1
2024-09-24 22:25:56,342 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=601902.0, ans=0.125
2024-09-24 22:26:02,217 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.142e+02 1.275e+02 1.410e+02 1.501e+02 2.362e+02, threshold=2.821e+02, percent-clipped=0.0
2024-09-24 22:26:06,102 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.53 vs. limit=10.0
2024-09-24 22:26:26,645 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=11.35 vs. limit=15.0
2024-09-24 22:26:42,427 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=601995.3333333334, ans=0.0
2024-09-24 22:26:50,792 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=11.84 vs. limit=22.5
2024-09-24 22:26:52,108 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-24 22:27:05,427 INFO [train.py:1198] (3/4) Epoch 34, batch 450, loss[loss=0.2479, ctc_loss=0.1645, cr_loss=0.4174, over 17022.00 frames. ], tot_loss[loss=0.1944, ctc_loss=0.1261, cr_loss=0.3415, over 3003683.97 frames. ], batch size: 52, lr: 3.52e-03, grad_scale: 16.0
2024-09-24 22:27:05,789 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=602088.6666666666, ans=0.0
2024-09-24 22:27:46,390 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten.whitening_limit, batch_count=602182.0, ans=15.0
2024-09-24 22:27:46,496 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.24 vs. limit=6.0
2024-09-24 22:27:59,282 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=4.71 vs. limit=15.0
2024-09-24 22:28:00,336 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=602228.6666666666, ans=0.0
2024-09-24 22:28:01,919 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=602228.6666666666, ans=0.125
2024-09-24 22:28:26,489 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=10.35 vs. limit=22.5
2024-09-24 22:28:30,700 INFO [train.py:1198] (3/4) Epoch 34, batch 500, loss[loss=0.1895, ctc_loss=0.1243, cr_loss=0.326, over 17307.00 frames. ], tot_loss[loss=0.1951, ctc_loss=0.1267, cr_loss=0.3421, over 3076637.92 frames. ], batch size: 46, lr: 3.52e-03, grad_scale: 16.0
2024-09-24 22:28:43,622 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=602322.0, ans=0.0
2024-09-24 22:28:43,635 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=602322.0, ans=0.07
2024-09-24 22:28:52,904 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.146e+02 1.302e+02 1.372e+02 1.474e+02 2.066e+02, threshold=2.745e+02, percent-clipped=0.0
2024-09-24 22:29:13,776 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=602415.3333333334, ans=0.2
2024-09-24 22:29:34,117 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=602508.6666666666, ans=0.125
2024-09-24 22:29:35,069 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=13.02 vs. limit=22.5
2024-09-24 22:29:37,379 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.const_attention_rate, batch_count=602508.6666666666, ans=0.025
2024-09-24 22:29:40,711 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=602508.6666666666, ans=0.2
2024-09-24 22:29:49,896 INFO [train.py:1198] (3/4) Epoch 34, batch 550, loss[loss=0.1881, ctc_loss=0.1212, cr_loss=0.3345, over 17144.00 frames. ], tot_loss[loss=0.1959, ctc_loss=0.1273, cr_loss=0.3432, over 3142243.00 frames. ], batch size: 48, lr: 3.52e-03, grad_scale: 16.0
2024-09-24 22:30:06,352 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=602602.0, ans=0.2
2024-09-24 22:30:39,787 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=602695.3333333334, ans=0.125
2024-09-24 22:31:09,784 INFO [train.py:1198] (3/4) Epoch 34, batch 600, loss[loss=0.1954, ctc_loss=0.1281, cr_loss=0.3364, over 17320.00 frames. ], tot_loss[loss=0.1946, ctc_loss=0.1262, cr_loss=0.3422, over 3193480.70 frames. ], batch size: 51, lr: 3.52e-03, grad_scale: 16.0
2024-09-24 22:31:11,676 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=602788.6666666666, ans=0.1
2024-09-24 22:31:34,416 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.111e+02 1.259e+02 1.339e+02 1.429e+02 3.195e+02, threshold=2.679e+02, percent-clipped=1.0
2024-09-24 22:32:30,626 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=602975.3333333334, ans=0.2
2024-09-24 22:32:39,992 INFO [train.py:1198] (3/4) Epoch 34, batch 650, loss[loss=0.151, ctc_loss=0.09611, cr_loss=0.2746, over 17113.00 frames. ], tot_loss[loss=0.1957, ctc_loss=0.127, cr_loss=0.3433, over 3228937.49 frames. ], batch size: 40, lr: 3.52e-03, grad_scale: 16.0
2024-09-24 22:32:51,389 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=603022.0, ans=0.0
2024-09-24 22:33:01,743 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=10.36 vs. limit=22.5
2024-09-24 22:33:07,748 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=603068.6666666666, ans=0.0
2024-09-24 22:33:15,822 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=603115.3333333334, ans=0.0
2024-09-24 22:33:35,144 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=603162.0, ans=0.125
2024-09-24 22:34:00,580 INFO [train.py:1198] (3/4) Epoch 34, batch 700, loss[loss=0.2022, ctc_loss=0.1308, cr_loss=0.3569, over 17025.00 frames. ], tot_loss[loss=0.196, ctc_loss=0.1271, cr_loss=0.3448, over 3255335.17 frames. ], batch size: 51, lr: 3.52e-03, grad_scale: 16.0
2024-09-24 22:34:13,701 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=603255.3333333334, ans=0.125
2024-09-24 22:34:23,178 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.142e+02 1.279e+02 1.394e+02 1.540e+02 2.036e+02, threshold=2.788e+02, percent-clipped=0.0
2024-09-24 22:35:21,221 INFO [train.py:1198] (3/4) Epoch 34, batch 750, loss[loss=0.1514, ctc_loss=0.09705, cr_loss=0.2716, over 17187.00 frames. ], tot_loss[loss=0.1957, ctc_loss=0.1269, cr_loss=0.344, over 3277533.10 frames. ], batch size: 41, lr: 3.52e-03, grad_scale: 16.0
2024-09-24 22:35:21,503 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=603488.6666666666, ans=0.125
2024-09-24 22:35:21,990 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=7.69 vs. limit=15.0
2024-09-24 22:35:33,370 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=4.66 vs. limit=15.0
2024-09-24 22:35:36,074 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=603535.3333333334, ans=0.125
2024-09-24 22:35:39,075 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=603535.3333333334, ans=0.2
2024-09-24 22:35:40,775 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=603535.3333333334, ans=0.125
2024-09-24 22:35:55,546 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten.whitening_limit, batch_count=603582.0, ans=15.0
2024-09-24 22:36:12,756 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=4.54 vs. limit=15.0
2024-09-24 22:36:17,238 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=603628.6666666666, ans=0.125
2024-09-24 22:36:20,352 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=603628.6666666666, ans=0.125
2024-09-24 22:36:43,573 INFO [train.py:1198] (3/4) Epoch 34, batch 800, loss[loss=0.2296, ctc_loss=0.1507, cr_loss=0.3944, over 15940.00 frames. ], tot_loss[loss=0.195, ctc_loss=0.1264, cr_loss=0.3428, over 3301029.81 frames. ], batch size: 74, lr: 3.52e-03, grad_scale: 32.0
2024-09-24 22:37:08,641 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.110e+02 1.269e+02 1.369e+02 1.482e+02 1.915e+02, threshold=2.738e+02, percent-clipped=0.0
2024-09-24 22:37:14,772 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=603768.6666666666, ans=0.0
2024-09-24 22:37:22,175 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=603815.3333333334, ans=0.2
2024-09-24 22:37:27,694 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=15.26 vs. limit=22.5
2024-09-24 22:37:36,565 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-09-24 22:38:12,079 INFO [train.py:1198] (3/4) Epoch 34, batch 850, loss[loss=0.1893, ctc_loss=0.1205, cr_loss=0.3443, over 17161.00 frames. ], tot_loss[loss=0.1957, ctc_loss=0.127, cr_loss=0.3435, over 3307475.67 frames. ], batch size: 45, lr: 3.52e-03, grad_scale: 32.0
2024-09-24 22:38:13,930 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=603955.3333333334, ans=0.0
2024-09-24 22:38:14,812 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=5.04 vs. limit=10.0
2024-09-24 22:38:20,329 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=603955.3333333334, ans=0.0
2024-09-24 22:39:12,809 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.const_attention_rate, batch_count=604095.3333333334, ans=0.025
2024-09-24 22:39:31,804 INFO [train.py:1198] (3/4) Epoch 34, batch 900, loss[loss=0.2065, ctc_loss=0.1343, cr_loss=0.3608, over 17295.00 frames. ], tot_loss[loss=0.195, ctc_loss=0.1264, cr_loss=0.3431, over 3329678.39 frames. ], batch size: 49, lr: 3.51e-03, grad_scale: 32.0
2024-09-24 22:39:49,527 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=604235.3333333334, ans=10.0
2024-09-24 22:39:54,076 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.145e+02 1.293e+02 1.407e+02 1.509e+02 3.747e+02, threshold=2.814e+02, percent-clipped=1.0
2024-09-24 22:40:27,099 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.const_attention_rate, batch_count=604328.6666666666, ans=0.025
2024-09-24 22:40:30,298 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=604328.6666666666, ans=0.1
2024-09-24 22:40:44,065 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=7.86 vs. limit=15.0
2024-09-24 22:40:47,199 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=8.20 vs. limit=22.5
2024-09-24 22:40:52,442 INFO [train.py:1198] (3/4) Epoch 34, batch 950, loss[loss=0.2099, ctc_loss=0.1401, cr_loss=0.3492, over 16755.00 frames. ], tot_loss[loss=0.1947, ctc_loss=0.1261, cr_loss=0.3428, over 3340562.35 frames. ], batch size: 61, lr: 3.51e-03, grad_scale: 32.0
2024-09-24 22:41:05,603 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=604422.0, ans=0.125
2024-09-24 22:41:20,450 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=604468.6666666666, ans=0.0
2024-09-24 22:41:42,087 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=604562.0, ans=0.0
2024-09-24 22:42:23,031 INFO [train.py:1198] (3/4) Epoch 34, batch 1000, loss[loss=0.1649, ctc_loss=0.1029, cr_loss=0.3101, over 16709.00 frames. ], tot_loss[loss=0.1954, ctc_loss=0.1267, cr_loss=0.3435, over 3334413.50 frames. ], batch size: 37, lr: 3.51e-03, grad_scale: 32.0
], batch size: 37, lr: 3.51e-03, grad_scale: 32.0 2024-09-24 22:42:29,919 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=604655.3333333334, ans=0.2 2024-09-24 22:42:42,584 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=604702.0, ans=0.0 2024-09-24 22:42:45,412 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.115e+02 1.270e+02 1.349e+02 1.472e+02 1.904e+02, threshold=2.698e+02, percent-clipped=0.0 2024-09-24 22:43:10,261 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=7.99 vs. limit=15.0 2024-09-24 22:43:35,790 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=604842.0, ans=0.0 2024-09-24 22:43:35,873 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=604842.0, ans=0.1 2024-09-24 22:43:42,309 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=604888.6666666666, ans=0.125 2024-09-24 22:43:43,611 INFO [train.py:1198] (3/4) Epoch 34, batch 1050, loss[loss=0.2075, ctc_loss=0.1361, cr_loss=0.3574, over 17052.00 frames. ], tot_loss[loss=0.1953, ctc_loss=0.1266, cr_loss=0.3437, over 3331199.78 frames. ], batch size: 52, lr: 3.51e-03, grad_scale: 32.0 2024-09-24 22:43:45,344 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=604888.6666666666, ans=0.125 2024-09-24 22:44:00,054 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.38 vs. limit=15.0 2024-09-24 22:44:10,965 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=604935.3333333334, ans=10.0 2024-09-24 22:44:38,168 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=605028.6666666666, ans=0.0 2024-09-24 22:44:40,563 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys.whitening_limit, batch_count=605028.6666666666, ans=6.0 2024-09-24 22:44:41,552 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=605028.6666666666, ans=0.0 2024-09-24 22:44:45,218 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=5.49 vs. limit=15.0 2024-09-24 22:45:03,652 INFO [train.py:1198] (3/4) Epoch 34, batch 1100, loss[loss=0.2069, ctc_loss=0.1331, cr_loss=0.3688, over 16996.00 frames. ], tot_loss[loss=0.196, ctc_loss=0.127, cr_loss=0.3453, over 3333392.81 frames. ], batch size: 53, lr: 3.51e-03, grad_scale: 32.0 2024-09-24 22:45:10,399 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=605122.0, ans=0.0 2024-09-24 22:45:26,104 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.105e+02 1.266e+02 1.354e+02 1.438e+02 1.919e+02, threshold=2.709e+02, percent-clipped=0.0 2024-09-24 22:45:30,279 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.00 vs. 
limit=6.0 2024-09-24 22:45:40,998 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=605215.3333333334, ans=0.125 2024-09-24 22:45:46,292 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=4.79 vs. limit=15.0 2024-09-24 22:45:58,401 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=605262.0, ans=0.04949747468305833 2024-09-24 22:46:19,813 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=605308.6666666666, ans=0.125 2024-09-24 22:46:22,965 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=605355.3333333334, ans=0.125 2024-09-24 22:46:26,914 INFO [train.py:1198] (3/4) Epoch 34, batch 1150, loss[loss=0.1971, ctc_loss=0.1281, cr_loss=0.3447, over 17150.00 frames. ], tot_loss[loss=0.1959, ctc_loss=0.1269, cr_loss=0.345, over 3333653.43 frames. ], batch size: 48, lr: 3.51e-03, grad_scale: 32.0 2024-09-24 22:47:01,762 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=10.45 vs. limit=12.0 2024-09-24 22:47:06,381 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=605448.6666666666, ans=0.125 2024-09-24 22:47:48,707 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=6.11 vs. limit=12.0 2024-09-24 22:47:54,034 INFO [train.py:1198] (3/4) Epoch 34, batch 1200, loss[loss=0.1876, ctc_loss=0.1211, cr_loss=0.3323, over 17185.00 frames. ], tot_loss[loss=0.1953, ctc_loss=0.1265, cr_loss=0.3441, over 3344117.30 frames. ], batch size: 41, lr: 3.51e-03, grad_scale: 32.0 2024-09-24 22:48:00,710 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=605588.6666666666, ans=0.125 2024-09-24 22:48:12,818 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.93 vs. limit=22.5 2024-09-24 22:48:16,699 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.120e+02 1.285e+02 1.360e+02 1.428e+02 2.074e+02, threshold=2.720e+02, percent-clipped=0.0 2024-09-24 22:48:31,162 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=605682.0, ans=0.0 2024-09-24 22:48:59,004 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=605775.3333333334, ans=0.0 2024-09-24 22:49:14,483 INFO [train.py:1198] (3/4) Epoch 34, batch 1250, loss[loss=0.1991, ctc_loss=0.1305, cr_loss=0.3431, over 16802.00 frames. ], tot_loss[loss=0.1945, ctc_loss=0.126, cr_loss=0.3425, over 3350942.99 frames. 
], batch size: 61, lr: 3.51e-03, grad_scale: 32.0 2024-09-24 22:49:26,009 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=605822.0, ans=0.0 2024-09-24 22:49:29,231 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=605868.6666666666, ans=0.125 2024-09-24 22:49:37,292 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=605868.6666666666, ans=0.125 2024-09-24 22:49:43,445 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=605868.6666666666, ans=0.125 2024-09-24 22:49:53,160 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=605915.3333333334, ans=0.0 2024-09-24 22:50:18,882 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=606008.6666666666, ans=0.1 2024-09-24 22:50:23,837 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-24 22:50:34,628 INFO [train.py:1198] (3/4) Epoch 34, batch 1300, loss[loss=0.1679, ctc_loss=0.109, cr_loss=0.2946, over 17284.00 frames. ], tot_loss[loss=0.1941, ctc_loss=0.1256, cr_loss=0.3421, over 3359163.14 frames. ], batch size: 42, lr: 3.51e-03, grad_scale: 32.0 2024-09-24 22:50:36,410 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=606055.3333333334, ans=0.0 2024-09-24 22:50:47,793 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=606055.3333333334, ans=0.125 2024-09-24 22:50:57,102 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.157e+02 1.261e+02 1.315e+02 1.424e+02 2.011e+02, threshold=2.630e+02, percent-clipped=0.0 2024-09-24 22:51:16,714 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=606148.6666666666, ans=0.0 2024-09-24 22:51:16,844 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=606148.6666666666, ans=0.0 2024-09-24 22:51:59,709 INFO [train.py:1198] (3/4) Epoch 34, batch 1350, loss[loss=0.2456, ctc_loss=0.1611, cr_loss=0.4226, over 17353.00 frames. ], tot_loss[loss=0.1955, ctc_loss=0.1268, cr_loss=0.3438, over 3352924.41 frames. ], batch size: 48, lr: 3.51e-03, grad_scale: 32.0 2024-09-24 22:52:06,337 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=606288.6666666666, ans=0.0 2024-09-24 22:52:26,680 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=5.24 vs. 
limit=15.0 2024-09-24 22:52:41,931 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=606382.0, ans=0.125 2024-09-24 22:52:43,334 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=606382.0, ans=0.125 2024-09-24 22:52:48,282 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=606382.0, ans=0.0 2024-09-24 22:52:51,408 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=606428.6666666666, ans=0.125 2024-09-24 22:53:21,999 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=606475.3333333334, ans=0.025 2024-09-24 22:53:24,835 INFO [train.py:1198] (3/4) Epoch 34, batch 1400, loss[loss=0.2183, ctc_loss=0.1401, cr_loss=0.3908, over 16998.00 frames. ], tot_loss[loss=0.1969, ctc_loss=0.1279, cr_loss=0.3452, over 3350867.53 frames. ], batch size: 53, lr: 3.51e-03, grad_scale: 32.0 2024-09-24 22:53:44,577 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=606568.6666666666, ans=0.2 2024-09-24 22:53:46,058 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-24 22:53:47,247 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.104e+02 1.283e+02 1.396e+02 1.529e+02 1.918e+02, threshold=2.793e+02, percent-clipped=0.0 2024-09-24 22:53:47,518 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=606568.6666666666, ans=0.125 2024-09-24 22:53:47,520 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=606568.6666666666, ans=0.2 2024-09-24 22:54:01,965 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=606615.3333333334, ans=0.125 2024-09-24 22:54:08,194 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=606615.3333333334, ans=0.0 2024-09-24 22:54:09,715 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=606615.3333333334, ans=0.0 2024-09-24 22:54:18,323 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1.whitening_limit, batch_count=606662.0, ans=10.0 2024-09-24 22:54:25,714 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=606662.0, ans=0.125 2024-09-24 22:54:30,640 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=606708.6666666666, ans=0.1 2024-09-24 22:54:44,664 INFO [train.py:1198] (3/4) Epoch 34, batch 1450, loss[loss=0.1965, ctc_loss=0.1255, cr_loss=0.3548, over 15996.00 frames. ], tot_loss[loss=0.1966, ctc_loss=0.1276, cr_loss=0.3452, over 3349655.57 frames. ], batch size: 74, lr: 3.51e-03, grad_scale: 32.0 2024-09-24 22:54:50,580 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=6.14 vs. 
limit=15.0 2024-09-24 22:55:18,627 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.const_attention_rate, batch_count=606848.6666666666, ans=0.025 2024-09-24 22:55:42,470 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=606895.3333333334, ans=0.125 2024-09-24 22:55:58,452 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=606942.0, ans=0.125 2024-09-24 22:56:04,599 INFO [train.py:1198] (3/4) Epoch 34, batch 1500, loss[loss=0.2205, ctc_loss=0.1453, cr_loss=0.3758, over 17211.00 frames. ], tot_loss[loss=0.1963, ctc_loss=0.1273, cr_loss=0.345, over 3354993.50 frames. ], batch size: 55, lr: 3.51e-03, grad_scale: 32.0 2024-09-24 22:56:25,165 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=607035.3333333334, ans=0.95 2024-09-24 22:56:29,451 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.138e+02 1.258e+02 1.350e+02 1.442e+02 1.821e+02, threshold=2.700e+02, percent-clipped=0.0 2024-09-24 22:56:53,258 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=607082.0, ans=0.125 2024-09-24 22:56:57,910 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=607128.6666666666, ans=0.1 2024-09-24 22:57:10,222 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=607128.6666666666, ans=0.1 2024-09-24 22:57:12,016 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=607128.6666666666, ans=0.2 2024-09-24 22:57:34,687 INFO [train.py:1198] (3/4) Epoch 34, batch 1550, loss[loss=0.2056, ctc_loss=0.1358, cr_loss=0.349, over 15909.00 frames. ], tot_loss[loss=0.1972, ctc_loss=0.128, cr_loss=0.3463, over 3355611.38 frames. ], batch size: 74, lr: 3.51e-03, grad_scale: 32.0 2024-09-24 22:57:36,670 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=607222.0, ans=0.125 2024-09-24 22:57:38,126 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=607222.0, ans=0.2 2024-09-24 22:57:52,287 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=607268.6666666666, ans=0.125 2024-09-24 22:58:03,438 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=607268.6666666666, ans=0.125 2024-09-24 22:58:17,773 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=607315.3333333334, ans=0.125 2024-09-24 22:58:17,909 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=607315.3333333334, ans=0.0 2024-09-24 22:58:54,426 INFO [train.py:1198] (3/4) Epoch 34, batch 1600, loss[loss=0.1815, ctc_loss=0.118, cr_loss=0.3173, over 17203.00 frames. ], tot_loss[loss=0.1972, ctc_loss=0.128, cr_loss=0.3464, over 3362620.74 frames. 
], batch size: 47, lr: 3.51e-03, grad_scale: 32.0 2024-09-24 22:58:56,991 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=3.39 vs. limit=12.0 2024-09-24 22:59:02,685 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=607455.3333333334, ans=0.0 2024-09-24 22:59:16,752 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.116e+02 1.263e+02 1.329e+02 1.418e+02 2.097e+02, threshold=2.657e+02, percent-clipped=0.0 2024-09-24 22:59:37,059 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2.whitening_limit, batch_count=607548.6666666666, ans=15.0 2024-09-24 22:59:39,658 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=607548.6666666666, ans=0.125 2024-09-24 23:00:14,910 INFO [train.py:1198] (3/4) Epoch 34, batch 1650, loss[loss=0.2229, ctc_loss=0.1526, cr_loss=0.3513, over 11408.00 frames. ], tot_loss[loss=0.1968, ctc_loss=0.1276, cr_loss=0.3458, over 3365624.10 frames. ], batch size: 123, lr: 3.50e-03, grad_scale: 16.0 2024-09-24 23:00:29,601 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=607735.3333333334, ans=0.0 2024-09-24 23:00:51,086 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=5.62 vs. limit=15.0 2024-09-24 23:00:59,724 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=607782.0, ans=0.0 2024-09-24 23:01:12,341 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=607828.6666666666, ans=0.125 2024-09-24 23:01:32,187 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=607875.3333333334, ans=0.125 2024-09-24 23:01:36,804 INFO [train.py:1198] (3/4) Epoch 34, batch 1700, loss[loss=0.1856, ctc_loss=0.1204, cr_loss=0.3264, over 16975.00 frames. ], tot_loss[loss=0.1963, ctc_loss=0.1273, cr_loss=0.3452, over 3371387.67 frames. ], batch size: 42, lr: 3.50e-03, grad_scale: 16.0 2024-09-24 23:01:37,183 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=607922.0, ans=0.125 2024-09-24 23:02:05,582 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.051e+02 1.278e+02 1.343e+02 1.428e+02 2.095e+02, threshold=2.685e+02, percent-clipped=0.0 2024-09-24 23:02:06,014 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=607968.6666666666, ans=0.125 2024-09-24 23:02:13,326 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=607968.6666666666, ans=0.0 2024-09-24 23:02:27,619 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=608015.3333333334, ans=0.2 2024-09-24 23:02:43,843 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=608062.0, ans=0.125 2024-09-24 23:03:04,328 INFO [train.py:1198] (3/4) Epoch 34, batch 1750, loss[loss=0.1934, ctc_loss=0.1261, cr_loss=0.3365, over 17158.00 frames. 
], tot_loss[loss=0.1946, ctc_loss=0.126, cr_loss=0.343, over 3376538.10 frames. ], batch size: 45, lr: 3.50e-03, grad_scale: 16.0 2024-09-24 23:03:07,580 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=608155.3333333334, ans=0.1 2024-09-24 23:03:22,147 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.min_abs, batch_count=608202.0, ans=0.5 2024-09-24 23:03:36,891 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=608248.6666666666, ans=0.0 2024-09-24 23:03:47,837 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=608248.6666666666, ans=0.0 2024-09-24 23:04:00,750 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=608295.3333333334, ans=0.125 2024-09-24 23:04:05,277 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=608295.3333333334, ans=0.125 2024-09-24 23:04:24,157 INFO [train.py:1198] (3/4) Epoch 34, batch 1800, loss[loss=0.1881, ctc_loss=0.1205, cr_loss=0.3382, over 17267.00 frames. ], tot_loss[loss=0.1944, ctc_loss=0.1259, cr_loss=0.3427, over 3371339.30 frames. ], batch size: 42, lr: 3.50e-03, grad_scale: 16.0 2024-09-24 23:04:48,242 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.131e+02 1.244e+02 1.332e+02 1.441e+02 2.602e+02, threshold=2.663e+02, percent-clipped=0.0 2024-09-24 23:04:50,203 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=608435.3333333334, ans=0.125 2024-09-24 23:05:06,499 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=14.41 vs. limit=22.5 2024-09-24 23:05:33,556 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=608575.3333333334, ans=0.125 2024-09-24 23:05:41,499 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=608575.3333333334, ans=0.1 2024-09-24 23:05:44,352 INFO [train.py:1198] (3/4) Epoch 34, batch 1850, loss[loss=0.1807, ctc_loss=0.1186, cr_loss=0.3106, over 17255.00 frames. ], tot_loss[loss=0.1946, ctc_loss=0.1261, cr_loss=0.3425, over 3360829.24 frames. ], batch size: 44, lr: 3.50e-03, grad_scale: 16.0 2024-09-24 23:05:49,712 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=11.74 vs. limit=22.5 2024-09-24 23:06:08,488 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=608668.6666666666, ans=0.125 2024-09-24 23:06:08,519 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=608668.6666666666, ans=0.1 2024-09-24 23:06:17,321 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=8.75 vs. 
limit=15.0 2024-09-24 23:06:21,062 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=608715.3333333334, ans=0.0 2024-09-24 23:07:15,203 INFO [train.py:1198] (3/4) Epoch 34, batch 1900, loss[loss=0.1945, ctc_loss=0.1248, cr_loss=0.3487, over 17028.00 frames. ], tot_loss[loss=0.1944, ctc_loss=0.1259, cr_loss=0.3427, over 3369559.08 frames. ], batch size: 56, lr: 3.50e-03, grad_scale: 16.0 2024-09-24 23:07:27,015 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-24 23:07:30,208 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=608902.0, ans=0.125 2024-09-24 23:07:38,719 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=4.90 vs. limit=15.0 2024-09-24 23:07:39,345 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.123e+02 1.279e+02 1.348e+02 1.451e+02 1.794e+02, threshold=2.695e+02, percent-clipped=0.0 2024-09-24 23:07:58,795 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=608948.6666666666, ans=0.2 2024-09-24 23:08:29,016 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=609042.0, ans=0.125 2024-09-24 23:08:31,257 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=3.77 vs. limit=10.0 2024-09-24 23:08:35,258 INFO [train.py:1198] (3/4) Epoch 34, batch 1950, loss[loss=0.2279, ctc_loss=0.1543, cr_loss=0.368, over 11764.00 frames. ], tot_loss[loss=0.1947, ctc_loss=0.1261, cr_loss=0.3429, over 3363721.02 frames. ], batch size: 123, lr: 3.50e-03, grad_scale: 16.0 2024-09-24 23:09:03,159 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=609135.3333333334, ans=0.1 2024-09-24 23:09:11,084 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=609182.0, ans=0.125 2024-09-24 23:09:19,159 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=609182.0, ans=0.125 2024-09-24 23:09:34,712 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=16.42 vs. limit=22.5 2024-09-24 23:09:42,103 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=609275.3333333334, ans=0.125 2024-09-24 23:09:51,540 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.min_positive, batch_count=609275.3333333334, ans=0.05 2024-09-24 23:09:56,093 INFO [train.py:1198] (3/4) Epoch 34, batch 2000, loss[loss=0.1813, ctc_loss=0.1171, cr_loss=0.3212, over 17205.00 frames. ], tot_loss[loss=0.1949, ctc_loss=0.1263, cr_loss=0.3429, over 3360872.65 frames. 
], batch size: 47, lr: 3.50e-03, grad_scale: 32.0 2024-09-24 23:10:09,532 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten.whitening_limit, batch_count=609322.0, ans=15.0 2024-09-24 23:10:10,756 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=609368.6666666666, ans=0.125 2024-09-24 23:10:19,956 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.131e+02 1.293e+02 1.367e+02 1.523e+02 2.152e+02, threshold=2.734e+02, percent-clipped=0.0 2024-09-24 23:10:59,059 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=609508.6666666666, ans=0.125 2024-09-24 23:11:05,462 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=609508.6666666666, ans=0.04949747468305833 2024-09-24 23:11:18,785 INFO [train.py:1198] (3/4) Epoch 34, batch 2050, loss[loss=0.2153, ctc_loss=0.1411, cr_loss=0.371, over 17035.00 frames. ], tot_loss[loss=0.1951, ctc_loss=0.1265, cr_loss=0.343, over 3364993.71 frames. ], batch size: 52, lr: 3.50e-03, grad_scale: 32.0 2024-09-24 23:11:22,407 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=609555.3333333334, ans=0.1 2024-09-24 23:11:27,603 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.79 vs. limit=15.0 2024-09-24 23:11:43,052 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=6.03 vs. limit=15.0 2024-09-24 23:11:52,238 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=8.29 vs. limit=15.0 2024-09-24 23:12:34,889 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=609742.0, ans=0.0 2024-09-24 23:12:35,795 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=5.39 vs. limit=15.0 2024-09-24 23:12:45,762 INFO [train.py:1198] (3/4) Epoch 34, batch 2100, loss[loss=0.1914, ctc_loss=0.125, cr_loss=0.332, over 16928.00 frames. ], tot_loss[loss=0.195, ctc_loss=0.1265, cr_loss=0.3427, over 3370419.19 frames. 
], batch size: 58, lr: 3.50e-03, grad_scale: 32.0 2024-09-24 23:12:47,629 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=609788.6666666666, ans=0.125 2024-09-24 23:12:50,833 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=609788.6666666666, ans=0.0 2024-09-24 23:12:58,839 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=609788.6666666666, ans=0.0 2024-09-24 23:13:00,545 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=609835.3333333334, ans=0.1 2024-09-24 23:13:10,054 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.090e+02 1.285e+02 1.366e+02 1.478e+02 2.142e+02, threshold=2.733e+02, percent-clipped=0.0 2024-09-24 23:13:31,512 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=609882.0, ans=0.0 2024-09-24 23:13:32,981 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=609928.6666666666, ans=0.0 2024-09-24 23:13:37,730 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=609928.6666666666, ans=10.0 2024-09-24 23:13:45,961 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=609928.6666666666, ans=0.125 2024-09-24 23:13:48,071 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=3.30 vs. limit=15.0 2024-09-24 23:13:57,000 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=609975.3333333334, ans=0.0 2024-09-24 23:14:00,160 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=609975.3333333334, ans=0.125 2024-09-24 23:14:06,287 INFO [train.py:1198] (3/4) Epoch 34, batch 2150, loss[loss=0.1757, ctc_loss=0.1113, cr_loss=0.3219, over 17045.00 frames. ], tot_loss[loss=0.1948, ctc_loss=0.1263, cr_loss=0.3424, over 3367479.59 frames. ], batch size: 39, lr: 3.50e-03, grad_scale: 32.0 2024-09-24 23:14:06,560 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=610022.0, ans=0.07 2024-09-24 23:14:08,205 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=610022.0, ans=0.0 2024-09-24 23:14:20,931 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=610068.6666666666, ans=0.2 2024-09-24 23:15:04,682 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=15.03 vs. 
limit=22.5 2024-09-24 23:15:08,860 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=610208.6666666666, ans=0.1 2024-09-24 23:15:17,178 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=610208.6666666666, ans=0.125 2024-09-24 23:15:20,284 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=610208.6666666666, ans=0.0 2024-09-24 23:15:26,284 INFO [train.py:1198] (3/4) Epoch 34, batch 2200, loss[loss=0.1897, ctc_loss=0.1236, cr_loss=0.3303, over 17104.00 frames. ], tot_loss[loss=0.1949, ctc_loss=0.1263, cr_loss=0.3427, over 3359313.19 frames. ], batch size: 49, lr: 3.50e-03, grad_scale: 32.0 2024-09-24 23:15:50,348 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.111e+02 1.250e+02 1.358e+02 1.486e+02 2.433e+02, threshold=2.717e+02, percent-clipped=0.0 2024-09-24 23:16:03,352 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=610348.6666666666, ans=0.125 2024-09-24 23:16:20,681 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=610395.3333333334, ans=0.0 2024-09-24 23:16:28,668 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=610395.3333333334, ans=0.125 2024-09-24 23:16:51,486 INFO [train.py:1198] (3/4) Epoch 34, batch 2250, loss[loss=0.1971, ctc_loss=0.1291, cr_loss=0.3399, over 17015.00 frames. ], tot_loss[loss=0.195, ctc_loss=0.1264, cr_loss=0.3428, over 3357637.12 frames. ], batch size: 51, lr: 3.50e-03, grad_scale: 32.0 2024-09-24 23:17:23,243 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=610535.3333333334, ans=0.125 2024-09-24 23:17:33,263 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=10.99 vs. limit=15.0 2024-09-24 23:18:06,358 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=610675.3333333334, ans=0.125 2024-09-24 23:18:14,099 INFO [train.py:1198] (3/4) Epoch 34, batch 2300, loss[loss=0.2242, ctc_loss=0.1489, cr_loss=0.3765, over 17203.00 frames. ], tot_loss[loss=0.1947, ctc_loss=0.1262, cr_loss=0.3425, over 3356360.04 frames. ], batch size: 55, lr: 3.50e-03, grad_scale: 32.0 2024-09-24 23:18:14,351 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=610722.0, ans=0.1 2024-09-24 23:18:24,130 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-09-24 23:18:25,593 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=610722.0, ans=0.1 2024-09-24 23:18:32,495 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.89 vs. 
limit=6.0 2024-09-24 23:18:38,051 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.114e+02 1.297e+02 1.390e+02 1.515e+02 2.091e+02, threshold=2.780e+02, percent-clipped=0.0 2024-09-24 23:18:43,313 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.const_attention_rate, batch_count=610768.6666666666, ans=0.025 2024-09-24 23:18:47,896 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=610815.3333333334, ans=0.0 2024-09-24 23:18:49,449 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=610815.3333333334, ans=10.0 2024-09-24 23:18:49,946 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten.whitening_limit, batch_count=610815.3333333334, ans=15.0 2024-09-24 23:19:07,019 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=610862.0, ans=0.0 2024-09-24 23:19:08,660 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=610862.0, ans=0.125 2024-09-24 23:19:22,261 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.84 vs. limit=10.0 2024-09-24 23:19:25,507 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.10 vs. limit=22.5 2024-09-24 23:19:34,059 INFO [train.py:1198] (3/4) Epoch 34, batch 2350, loss[loss=0.1791, ctc_loss=0.1164, cr_loss=0.3132, over 17176.00 frames. ], tot_loss[loss=0.1939, ctc_loss=0.1256, cr_loss=0.3415, over 3359735.64 frames. ], batch size: 41, lr: 3.49e-03, grad_scale: 32.0 2024-09-24 23:19:40,475 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=610955.3333333334, ans=0.125 2024-09-24 23:19:50,065 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=611002.0, ans=0.0 2024-09-24 23:19:57,847 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=611002.0, ans=0.2 2024-09-24 23:20:02,541 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=611002.0, ans=0.04949747468305833 2024-09-24 23:20:40,675 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=611142.0, ans=0.125 2024-09-24 23:20:53,294 INFO [train.py:1198] (3/4) Epoch 34, batch 2400, loss[loss=0.1865, ctc_loss=0.1188, cr_loss=0.3385, over 17315.00 frames. ], tot_loss[loss=0.1933, ctc_loss=0.1251, cr_loss=0.341, over 3366862.42 frames. 
], batch size: 49, lr: 3.49e-03, grad_scale: 32.0 2024-09-24 23:21:17,049 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=611235.3333333334, ans=0.2 2024-09-24 23:21:19,782 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.075e+02 1.247e+02 1.313e+02 1.426e+02 1.860e+02, threshold=2.625e+02, percent-clipped=0.0 2024-09-24 23:21:37,236 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=611282.0, ans=0.0 2024-09-24 23:21:41,899 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=611282.0, ans=0.2 2024-09-24 23:21:45,243 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=611328.6666666666, ans=0.125 2024-09-24 23:21:46,668 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=611328.6666666666, ans=0.2 2024-09-24 23:22:05,042 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=611328.6666666666, ans=0.125 2024-09-24 23:22:05,650 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=12.94 vs. limit=22.5 2024-09-24 23:22:11,306 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=611375.3333333334, ans=0.2 2024-09-24 23:22:14,433 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=611375.3333333334, ans=0.0 2024-09-24 23:22:14,534 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=611375.3333333334, ans=0.125 2024-09-24 23:22:23,851 INFO [train.py:1198] (3/4) Epoch 34, batch 2450, loss[loss=0.1982, ctc_loss=0.1294, cr_loss=0.3439, over 17220.00 frames. ], tot_loss[loss=0.1947, ctc_loss=0.1261, cr_loss=0.343, over 3357922.83 frames. ], batch size: 50, lr: 3.49e-03, grad_scale: 32.0 2024-09-24 23:22:33,648 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=611422.0, ans=0.0 2024-09-24 23:22:44,821 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=611468.6666666666, ans=0.125 2024-09-24 23:22:50,048 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.56 vs. 
limit=15.0 2024-09-24 23:23:12,341 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-09-24 23:23:21,884 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=611562.0, ans=0.2 2024-09-24 23:23:28,311 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=611608.6666666666, ans=0.1 2024-09-24 23:23:41,238 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=611608.6666666666, ans=0.1 2024-09-24 23:23:42,848 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=611655.3333333334, ans=0.125 2024-09-24 23:23:43,948 INFO [train.py:1198] (3/4) Epoch 34, batch 2500, loss[loss=0.1745, ctc_loss=0.1122, cr_loss=0.3114, over 17100.00 frames. ], tot_loss[loss=0.196, ctc_loss=0.1271, cr_loss=0.3448, over 3360160.61 frames. ], batch size: 40, lr: 3.49e-03, grad_scale: 32.0 2024-09-24 23:23:45,717 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=611655.3333333334, ans=0.125 2024-09-24 23:23:48,493 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.86 vs. limit=6.0 2024-09-24 23:23:52,354 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=611655.3333333334, ans=0.035 2024-09-24 23:23:57,228 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=611655.3333333334, ans=0.125 2024-09-24 23:24:08,084 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.121e+02 1.282e+02 1.373e+02 1.484e+02 1.977e+02, threshold=2.747e+02, percent-clipped=0.0 2024-09-24 23:24:13,178 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=611702.0, ans=0.125 2024-09-24 23:24:17,003 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.45 vs. limit=6.0 2024-09-24 23:24:24,504 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=611748.6666666666, ans=0.125 2024-09-24 23:25:04,066 INFO [train.py:1198] (3/4) Epoch 34, batch 2550, loss[loss=0.1985, ctc_loss=0.1305, cr_loss=0.3402, over 17007.00 frames. ], tot_loss[loss=0.1958, ctc_loss=0.1269, cr_loss=0.3445, over 3369421.28 frames. 
], batch size: 56, lr: 3.49e-03, grad_scale: 32.0 2024-09-24 23:25:04,306 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=611888.6666666666, ans=0.1 2024-09-24 23:25:04,322 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=611888.6666666666, ans=0.125 2024-09-24 23:25:12,407 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-09-24 23:25:16,989 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=611888.6666666666, ans=0.1 2024-09-24 23:25:17,037 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=611888.6666666666, ans=0.125 2024-09-24 23:25:31,382 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=611935.3333333334, ans=0.125 2024-09-24 23:25:40,983 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=611982.0, ans=0.125 2024-09-24 23:25:58,503 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=612028.6666666666, ans=0.0 2024-09-24 23:26:01,774 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=612028.6666666666, ans=0.0 2024-09-24 23:26:13,824 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=612075.3333333334, ans=0.125 2024-09-24 23:26:15,330 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=612075.3333333334, ans=0.125 2024-09-24 23:26:20,310 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=612075.3333333334, ans=0.125 2024-09-24 23:26:29,170 INFO [train.py:1198] (3/4) Epoch 34, batch 2600, loss[loss=0.2264, ctc_loss=0.1477, cr_loss=0.3936, over 17241.00 frames. ], tot_loss[loss=0.1952, ctc_loss=0.1264, cr_loss=0.3441, over 3372263.33 frames. ], batch size: 55, lr: 3.49e-03, grad_scale: 32.0 2024-09-24 23:26:43,941 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=612168.6666666666, ans=0.07 2024-09-24 23:26:45,460 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=612168.6666666666, ans=0.125 2024-09-24 23:26:45,536 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=612168.6666666666, ans=0.125 2024-09-24 23:26:55,519 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.160e+02 1.298e+02 1.376e+02 1.484e+02 2.452e+02, threshold=2.751e+02, percent-clipped=0.0 2024-09-24 23:27:13,457 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=612215.3333333334, ans=0.125 2024-09-24 23:27:51,789 INFO [train.py:1198] (3/4) Epoch 34, batch 2650, loss[loss=0.2149, ctc_loss=0.1452, cr_loss=0.3484, over 11723.00 frames. ], tot_loss[loss=0.195, ctc_loss=0.1263, cr_loss=0.3437, over 3362922.99 frames. 
], batch size: 124, lr: 3.49e-03, grad_scale: 32.0 2024-09-24 23:28:03,191 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.const_attention_rate, batch_count=612355.3333333334, ans=0.025 2024-09-24 23:28:12,620 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=612402.0, ans=0.1 2024-09-24 23:28:14,840 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=6.50 vs. limit=15.0 2024-09-24 23:28:22,060 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=612448.6666666666, ans=0.125 2024-09-24 23:28:38,502 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=612495.3333333334, ans=0.1 2024-09-24 23:28:43,529 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=612495.3333333334, ans=0.5 2024-09-24 23:28:48,237 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=612495.3333333334, ans=0.0 2024-09-24 23:28:59,793 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=612542.0, ans=0.1 2024-09-24 23:29:10,882 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=612588.6666666666, ans=0.1 2024-09-24 23:29:12,131 INFO [train.py:1198] (3/4) Epoch 34, batch 2700, loss[loss=0.2292, ctc_loss=0.159, cr_loss=0.3511, over 12693.00 frames. ], tot_loss[loss=0.1964, ctc_loss=0.1273, cr_loss=0.3455, over 3345602.82 frames. ], batch size: 124, lr: 3.49e-03, grad_scale: 32.0 2024-09-24 23:29:36,058 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.146e+02 1.298e+02 1.372e+02 1.486e+02 2.496e+02, threshold=2.744e+02, percent-clipped=0.0 2024-09-24 23:30:05,009 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=612728.6666666666, ans=0.125 2024-09-24 23:30:08,194 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=612728.6666666666, ans=0.2 2024-09-24 23:30:31,914 INFO [train.py:1198] (3/4) Epoch 34, batch 2750, loss[loss=0.1636, ctc_loss=0.1025, cr_loss=0.3057, over 17091.00 frames. ], tot_loss[loss=0.1965, ctc_loss=0.1273, cr_loss=0.3457, over 3349559.97 frames. ], batch size: 43, lr: 3.49e-03, grad_scale: 16.0 2024-09-24 23:30:38,482 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=612822.0, ans=0.1 2024-09-24 23:31:14,780 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=612915.3333333334, ans=0.0 2024-09-24 23:31:25,330 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=612962.0, ans=0.125 2024-09-24 23:31:25,761 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=12.33 vs. 
limit=22.5 2024-09-24 23:31:43,627 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=613008.6666666666, ans=0.125 2024-09-24 23:32:02,050 INFO [train.py:1198] (3/4) Epoch 34, batch 2800, loss[loss=0.1817, ctc_loss=0.1165, cr_loss=0.3261, over 17273.00 frames. ], tot_loss[loss=0.1965, ctc_loss=0.1274, cr_loss=0.3453, over 3355597.08 frames. ], batch size: 42, lr: 3.49e-03, grad_scale: 32.0 2024-09-24 23:32:12,164 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=613055.3333333334, ans=0.2 2024-09-24 23:32:16,687 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=613102.0, ans=0.04949747468305833 2024-09-24 23:32:27,770 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.048e+02 1.265e+02 1.343e+02 1.457e+02 2.911e+02, threshold=2.687e+02, percent-clipped=1.0 2024-09-24 23:32:55,494 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=613195.3333333334, ans=0.04949747468305833 2024-09-24 23:33:00,257 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=613195.3333333334, ans=0.125 2024-09-24 23:33:03,420 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=613195.3333333334, ans=0.125 2024-09-24 23:33:17,808 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=613242.0, ans=0.0 2024-09-24 23:33:22,281 INFO [train.py:1198] (3/4) Epoch 34, batch 2850, loss[loss=0.2204, ctc_loss=0.141, cr_loss=0.3967, over 17168.00 frames. ], tot_loss[loss=0.1969, ctc_loss=0.1276, cr_loss=0.3463, over 3355010.75 frames. ], batch size: 45, lr: 3.49e-03, grad_scale: 32.0 2024-09-24 23:33:54,697 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=613382.0, ans=0.125 2024-09-24 23:33:54,700 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=613382.0, ans=0.125 2024-09-24 23:33:56,572 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-24 23:34:18,690 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=613428.6666666666, ans=0.2 2024-09-24 23:34:35,933 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=613475.3333333334, ans=0.015 2024-09-24 23:34:37,621 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=613475.3333333334, ans=0.1 2024-09-24 23:34:42,222 INFO [train.py:1198] (3/4) Epoch 34, batch 2900, loss[loss=0.2004, ctc_loss=0.129, cr_loss=0.3573, over 17352.00 frames. ], tot_loss[loss=0.1976, ctc_loss=0.1283, cr_loss=0.3466, over 3351125.51 frames. 
], batch size: 48, lr: 3.49e-03, grad_scale: 32.0 2024-09-24 23:35:01,726 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=613568.6666666666, ans=0.1 2024-09-24 23:35:07,636 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.076e+02 1.262e+02 1.340e+02 1.438e+02 2.197e+02, threshold=2.679e+02, percent-clipped=0.0 2024-09-24 23:35:12,885 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=613615.3333333334, ans=0.2 2024-09-24 23:35:14,364 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=613615.3333333334, ans=0.09899494936611666 2024-09-24 23:35:25,548 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=613615.3333333334, ans=0.0 2024-09-24 23:35:28,827 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.min_positive, batch_count=613662.0, ans=0.025 2024-09-24 23:35:58,559 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=613708.6666666666, ans=0.125 2024-09-24 23:36:04,803 INFO [train.py:1198] (3/4) Epoch 34, batch 2950, loss[loss=0.1874, ctc_loss=0.119, cr_loss=0.3418, over 16968.00 frames. ], tot_loss[loss=0.1962, ctc_loss=0.1273, cr_loss=0.3444, over 3357841.29 frames. ], batch size: 42, lr: 3.49e-03, grad_scale: 32.0 2024-09-24 23:36:41,967 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=613802.0, ans=0.125 2024-09-24 23:37:21,373 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=613942.0, ans=0.1 2024-09-24 23:37:32,110 INFO [train.py:1198] (3/4) Epoch 34, batch 3000, loss[loss=0.1871, ctc_loss=0.1187, cr_loss=0.3419, over 17262.00 frames. ], tot_loss[loss=0.1962, ctc_loss=0.1272, cr_loss=0.3448, over 3359816.79 frames. ], batch size: 44, lr: 3.49e-03, grad_scale: 32.0 2024-09-24 23:37:32,111 INFO [train.py:1221] (3/4) Computing validation loss 2024-09-24 23:37:47,934 INFO [train.py:1230] (3/4) Epoch 34, validation: loss=0.03583, ctc_loss=0.03583, cr_loss=9.471e-15, over 944034.00 frames. 2024-09-24 23:37:47,934 INFO [train.py:1231] (3/4) Maximum memory allocated so far is 21238MB 2024-09-24 23:38:05,151 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=614035.3333333334, ans=0.0 2024-09-24 23:38:12,760 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.143e+02 1.290e+02 1.384e+02 1.469e+02 2.229e+02, threshold=2.767e+02, percent-clipped=0.0 2024-09-24 23:38:16,182 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=614035.3333333334, ans=0.2 2024-09-24 23:38:50,521 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=614175.3333333334, ans=0.125 2024-09-24 23:39:01,432 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=614175.3333333334, ans=0.125 2024-09-24 23:39:05,969 INFO [train.py:1198] (3/4) Epoch 34, batch 3050, loss[loss=0.1539, ctc_loss=0.09595, cr_loss=0.2899, over 17037.00 frames. 
], tot_loss[loss=0.1958, ctc_loss=0.1271, cr_loss=0.3437, over 3349429.24 frames. ], batch size: 39, lr: 3.49e-03, grad_scale: 32.0 2024-09-24 23:39:09,462 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=614222.0, ans=0.125 2024-09-24 23:39:18,740 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=614222.0, ans=0.125 2024-09-24 23:39:39,889 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=6.29 vs. limit=15.0 2024-09-24 23:39:45,158 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=614315.3333333334, ans=0.0 2024-09-24 23:40:02,233 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=614362.0, ans=0.1 2024-09-24 23:40:11,576 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=614408.6666666666, ans=0.0 2024-09-24 23:40:24,000 INFO [train.py:1198] (3/4) Epoch 34, batch 3100, loss[loss=0.2166, ctc_loss=0.1464, cr_loss=0.3512, over 11219.00 frames. ], tot_loss[loss=0.1949, ctc_loss=0.1264, cr_loss=0.3426, over 3345834.02 frames. ], batch size: 123, lr: 3.49e-03, grad_scale: 16.0 2024-09-24 23:40:38,851 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=10.65 vs. limit=15.0 2024-09-24 23:40:41,434 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=614502.0, ans=0.2 2024-09-24 23:40:50,476 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.130e+02 1.258e+02 1.350e+02 1.404e+02 1.862e+02, threshold=2.700e+02, percent-clipped=0.0 2024-09-24 23:40:55,613 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=614548.6666666666, ans=0.125 2024-09-24 23:41:04,884 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=614548.6666666666, ans=0.2 2024-09-24 23:41:08,095 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=614548.6666666666, ans=0.125 2024-09-24 23:41:17,206 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=614595.3333333334, ans=0.0 2024-09-24 23:41:26,628 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=614642.0, ans=0.125 2024-09-24 23:41:32,628 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=614642.0, ans=0.1 2024-09-24 23:41:41,891 INFO [train.py:1198] (3/4) Epoch 34, batch 3150, loss[loss=0.1626, ctc_loss=0.1022, cr_loss=0.3021, over 17117.00 frames. ], tot_loss[loss=0.195, ctc_loss=0.1264, cr_loss=0.3428, over 3349985.42 frames. 
], batch size: 40, lr: 3.48e-03, grad_scale: 16.0 2024-09-24 23:41:53,125 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=614688.6666666666, ans=0.1 2024-09-24 23:41:58,108 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.54 vs. limit=15.0 2024-09-24 23:42:04,178 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=614735.3333333334, ans=0.125 2024-09-24 23:42:22,452 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=4.71 vs. limit=15.0 2024-09-24 23:42:23,459 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-09-24 23:42:35,765 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=614828.6666666666, ans=0.0 2024-09-24 23:42:35,783 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=614828.6666666666, ans=0.2 2024-09-24 23:42:38,845 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=614828.6666666666, ans=0.1 2024-09-24 23:42:43,801 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=614875.3333333334, ans=0.0 2024-09-24 23:42:46,850 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=614875.3333333334, ans=0.125 2024-09-24 23:42:59,370 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=614922.0, ans=0.125 2024-09-24 23:43:00,720 INFO [train.py:1198] (3/4) Epoch 34, batch 3200, loss[loss=0.1941, ctc_loss=0.1269, cr_loss=0.3362, over 17067.00 frames. ], tot_loss[loss=0.1951, ctc_loss=0.1265, cr_loss=0.3429, over 3349925.39 frames. ], batch size: 46, lr: 3.48e-03, grad_scale: 32.0 2024-09-24 23:43:00,997 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=614922.0, ans=0.2 2024-09-24 23:43:22,790 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=614968.6666666666, ans=0.0 2024-09-24 23:43:27,302 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.114e+02 1.284e+02 1.377e+02 1.475e+02 3.177e+02, threshold=2.753e+02, percent-clipped=2.0 2024-09-24 23:43:55,974 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=615062.0, ans=0.125 2024-09-24 23:44:06,835 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=615108.6666666666, ans=0.125 2024-09-24 23:44:15,392 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.95 vs. limit=22.5 2024-09-24 23:44:19,051 INFO [train.py:1198] (3/4) Epoch 34, batch 3250, loss[loss=0.1738, ctc_loss=0.1118, cr_loss=0.3099, over 17137.00 frames. ], tot_loss[loss=0.1952, ctc_loss=0.1266, cr_loss=0.3432, over 3348060.03 frames. 
], batch size: 48, lr: 3.48e-03, grad_scale: 32.0 2024-09-24 23:45:07,481 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=615295.3333333334, ans=0.125 2024-09-24 23:45:21,722 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=7.99 vs. limit=15.0 2024-09-24 23:45:30,803 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.27 vs. limit=15.0 2024-09-24 23:45:39,576 INFO [train.py:1198] (3/4) Epoch 34, batch 3300, loss[loss=0.1945, ctc_loss=0.1273, cr_loss=0.3359, over 17084.00 frames. ], tot_loss[loss=0.1967, ctc_loss=0.1277, cr_loss=0.3451, over 3349128.61 frames. ], batch size: 49, lr: 3.48e-03, grad_scale: 32.0 2024-09-24 23:45:52,185 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=615388.6666666666, ans=0.1 2024-09-24 23:46:10,484 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.168e+02 1.276e+02 1.373e+02 1.543e+02 3.468e+02, threshold=2.745e+02, percent-clipped=1.0 2024-09-24 23:46:13,748 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=615482.0, ans=0.0 2024-09-24 23:46:32,150 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=4.53 vs. limit=12.0 2024-09-24 23:46:36,486 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=615528.6666666666, ans=0.0 2024-09-24 23:46:51,282 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=15.94 vs. limit=22.5 2024-09-24 23:46:56,987 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=615575.3333333334, ans=0.0 2024-09-24 23:47:04,477 INFO [train.py:1198] (3/4) Epoch 34, batch 3350, loss[loss=0.2074, ctc_loss=0.136, cr_loss=0.3572, over 17025.00 frames. ], tot_loss[loss=0.197, ctc_loss=0.1279, cr_loss=0.3457, over 3349737.00 frames. ], batch size: 51, lr: 3.48e-03, grad_scale: 32.0 2024-09-24 23:47:15,652 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.const_attention_rate, batch_count=615622.0, ans=0.025 2024-09-24 23:47:21,922 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=615668.6666666666, ans=0.2 2024-09-24 23:48:01,058 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=615762.0, ans=0.09899494936611666 2024-09-24 23:48:08,698 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=615808.6666666666, ans=0.1 2024-09-24 23:48:14,836 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=615808.6666666666, ans=0.0 2024-09-24 23:48:14,865 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=615808.6666666666, ans=0.125 2024-09-24 23:48:22,376 INFO [train.py:1198] (3/4) Epoch 34, batch 3400, loss[loss=0.1951, ctc_loss=0.1233, cr_loss=0.3592, over 17276.00 frames. 
], tot_loss[loss=0.1965, ctc_loss=0.1275, cr_loss=0.3451, over 3354358.00 frames. ], batch size: 44, lr: 3.48e-03, grad_scale: 32.0 2024-09-24 23:48:22,756 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=615855.3333333334, ans=0.125 2024-09-24 23:48:38,300 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=615902.0, ans=0.2 2024-09-24 23:48:48,723 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.112e+02 1.295e+02 1.404e+02 1.516e+02 2.292e+02, threshold=2.807e+02, percent-clipped=0.0 2024-09-24 23:49:25,145 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=616042.0, ans=0.125 2024-09-24 23:49:25,512 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.18 vs. limit=6.0 2024-09-24 23:49:38,456 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=4.79 vs. limit=12.0 2024-09-24 23:49:42,039 INFO [train.py:1198] (3/4) Epoch 34, batch 3450, loss[loss=0.2338, ctc_loss=0.1508, cr_loss=0.4153, over 15108.00 frames. ], tot_loss[loss=0.1953, ctc_loss=0.1266, cr_loss=0.3434, over 3356875.10 frames. ], batch size: 90, lr: 3.48e-03, grad_scale: 32.0 2024-09-24 23:50:41,858 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=616228.6666666666, ans=0.2 2024-09-24 23:50:43,218 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=616275.3333333334, ans=0.1 2024-09-24 23:50:45,003 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=616275.3333333334, ans=0.125 2024-09-24 23:50:45,338 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.45 vs. limit=15.0 2024-09-24 23:50:54,383 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=616275.3333333334, ans=0.0 2024-09-24 23:51:00,312 INFO [train.py:1198] (3/4) Epoch 34, batch 3500, loss[loss=0.2021, ctc_loss=0.1317, cr_loss=0.3523, over 17043.00 frames. ], tot_loss[loss=0.1948, ctc_loss=0.1263, cr_loss=0.3424, over 3346165.83 frames. 
], batch size: 52, lr: 3.48e-03, grad_scale: 32.0 2024-09-24 23:51:11,580 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.const_attention_rate, batch_count=616322.0, ans=0.025 2024-09-24 23:51:26,844 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.076e+02 1.270e+02 1.396e+02 1.524e+02 2.184e+02, threshold=2.793e+02, percent-clipped=0.0 2024-09-24 23:51:45,985 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=616462.0, ans=0.2 2024-09-24 23:51:58,416 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=616462.0, ans=0.05 2024-09-24 23:51:58,469 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=616462.0, ans=0.1 2024-09-24 23:52:11,751 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=5.53 vs. limit=15.0 2024-09-24 23:52:18,497 INFO [train.py:1198] (3/4) Epoch 34, batch 3550, loss[loss=0.162, ctc_loss=0.1013, cr_loss=0.3033, over 17091.00 frames. ], tot_loss[loss=0.1948, ctc_loss=0.1263, cr_loss=0.3425, over 3350274.13 frames. ], batch size: 43, lr: 3.48e-03, grad_scale: 32.0 2024-09-24 23:52:31,611 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.76 vs. limit=6.0 2024-09-24 23:52:34,215 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=616602.0, ans=0.07 2024-09-24 23:52:39,305 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=8.94 vs. limit=15.0 2024-09-24 23:52:40,487 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=616602.0, ans=0.0 2024-09-24 23:52:43,650 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=616602.0, ans=0.0 2024-09-24 23:52:45,135 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=616602.0, ans=0.125 2024-09-24 23:52:51,352 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=616648.6666666666, ans=0.0 2024-09-24 23:52:59,042 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=616648.6666666666, ans=0.125 2024-09-24 23:53:00,743 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=616648.6666666666, ans=0.125 2024-09-24 23:53:24,232 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=616742.0, ans=0.125 2024-09-24 23:53:31,909 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=616742.0, ans=0.0 2024-09-24 23:53:36,432 INFO [train.py:1198] (3/4) Epoch 34, batch 3600, loss[loss=0.1999, ctc_loss=0.1328, cr_loss=0.3351, over 15995.00 frames. ], tot_loss[loss=0.1963, ctc_loss=0.1274, cr_loss=0.3446, over 3349046.05 frames. 
], batch size: 74, lr: 3.48e-03, grad_scale: 32.0 2024-09-24 23:54:03,052 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.128e+02 1.287e+02 1.340e+02 1.449e+02 1.947e+02, threshold=2.680e+02, percent-clipped=0.0 2024-09-24 23:54:06,535 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=616882.0, ans=0.0 2024-09-24 23:54:08,025 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.const_attention_rate, batch_count=616882.0, ans=0.025 2024-09-24 23:54:31,004 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=616928.6666666666, ans=0.2 2024-09-24 23:54:37,105 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=616928.6666666666, ans=0.125 2024-09-24 23:54:54,454 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=616975.3333333334, ans=0.125 2024-09-24 23:54:57,439 INFO [train.py:1198] (3/4) Epoch 34, batch 3650, loss[loss=0.2116, ctc_loss=0.1385, cr_loss=0.3654, over 16585.00 frames. ], tot_loss[loss=0.197, ctc_loss=0.128, cr_loss=0.3454, over 3341660.64 frames. ], batch size: 66, lr: 3.48e-03, grad_scale: 32.0 2024-09-24 23:54:59,226 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=617022.0, ans=0.2 2024-09-24 23:55:03,973 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=617022.0, ans=0.0 2024-09-24 23:55:30,558 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.const_attention_rate, batch_count=617115.3333333334, ans=0.025 2024-09-24 23:55:34,346 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=617115.3333333334, ans=0.0 2024-09-24 23:55:44,387 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=13.85 vs. limit=22.5 2024-09-24 23:56:21,545 INFO [train.py:1198] (3/4) Epoch 34, batch 3700, loss[loss=0.2078, ctc_loss=0.1335, cr_loss=0.3715, over 17350.00 frames. ], tot_loss[loss=0.1969, ctc_loss=0.1279, cr_loss=0.3453, over 3343419.14 frames. 
], batch size: 48, lr: 3.48e-03, grad_scale: 32.0 2024-09-24 23:56:29,470 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=617255.3333333334, ans=0.125 2024-09-24 23:56:48,018 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.065e+02 1.259e+02 1.354e+02 1.435e+02 3.016e+02, threshold=2.708e+02, percent-clipped=2.0 2024-09-24 23:56:59,298 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=617348.6666666666, ans=0.125 2024-09-24 23:57:02,395 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=617348.6666666666, ans=0.0 2024-09-24 23:57:17,824 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=617395.3333333334, ans=0.1 2024-09-24 23:57:35,162 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=617442.0, ans=0.125 2024-09-24 23:57:36,667 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=617442.0, ans=0.0 2024-09-24 23:57:39,593 INFO [train.py:1198] (3/4) Epoch 34, batch 3750, loss[loss=0.2003, ctc_loss=0.131, cr_loss=0.3467, over 16550.00 frames. ], tot_loss[loss=0.1966, ctc_loss=0.1277, cr_loss=0.3446, over 3336876.39 frames. ], batch size: 66, lr: 3.48e-03, grad_scale: 32.0 2024-09-24 23:57:44,512 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=617488.6666666666, ans=0.125 2024-09-24 23:57:46,749 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=4.33 vs. limit=15.0 2024-09-24 23:57:58,551 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=617535.3333333334, ans=0.1 2024-09-24 23:58:33,987 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=617628.6666666666, ans=0.125 2024-09-24 23:58:52,525 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=617675.3333333334, ans=0.2 2024-09-24 23:58:52,548 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=617675.3333333334, ans=0.0 2024-09-24 23:58:56,795 INFO [train.py:1198] (3/4) Epoch 34, batch 3800, loss[loss=0.2401, ctc_loss=0.1701, cr_loss=0.3501, over 12214.00 frames. ], tot_loss[loss=0.1968, ctc_loss=0.1281, cr_loss=0.3438, over 3312635.79 frames. ], batch size: 124, lr: 3.48e-03, grad_scale: 32.0 2024-09-24 23:59:23,295 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.092e+02 1.281e+02 1.379e+02 1.537e+02 2.661e+02, threshold=2.757e+02, percent-clipped=0.0 2024-09-24 23:59:23,877 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.98 vs. 
limit=6.0 2024-09-25 00:00:11,518 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=617908.6666666666, ans=0.5 2024-09-25 00:00:15,685 INFO [train.py:1198] (3/4) Epoch 34, batch 3850, loss[loss=0.1987, ctc_loss=0.1298, cr_loss=0.3445, over 16943.00 frames. ], tot_loss[loss=0.1981, ctc_loss=0.1291, cr_loss=0.3448, over 3274571.81 frames. ], batch size: 58, lr: 3.48e-03, grad_scale: 32.0 2024-09-25 00:00:30,080 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=618002.0, ans=0.0 2024-09-25 00:00:39,171 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=618002.0, ans=0.0 2024-09-25 00:00:52,676 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=618048.6666666666, ans=0.125 2024-09-25 00:00:54,361 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=618048.6666666666, ans=0.0 2024-09-25 00:00:55,758 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=618048.6666666666, ans=0.125 2024-09-25 00:00:55,760 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=618048.6666666666, ans=0.125 2024-09-25 00:01:04,886 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=618095.3333333334, ans=0.125 2024-09-25 00:01:14,469 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten.whitening_limit, batch_count=618095.3333333334, ans=15.0 2024-09-25 00:01:15,606 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=618142.0, ans=0.1 2024-09-25 00:01:18,684 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=618142.0, ans=0.125 2024-09-25 00:02:16,958 INFO [train.py:1198] (3/4) Epoch 35, batch 0, loss[loss=0.2371, ctc_loss=0.1625, cr_loss=0.3729, over 11684.00 frames. ], tot_loss[loss=0.2371, ctc_loss=0.1625, cr_loss=0.3729, over 11684.00 frames. ], batch size: 123, lr: 3.42e-03, grad_scale: 32.0 2024-09-25 00:02:16,958 INFO [train.py:1221] (3/4) Computing validation loss 2024-09-25 00:02:32,192 INFO [train.py:1230] (3/4) Epoch 35, validation: loss=0.03449, ctc_loss=0.03449, cr_loss=9.757e-15, over 944034.00 frames. 
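A note on the loss fields in these records: throughout this section the reported total is consistent with loss = ctc_loss + 0.2 * cr_loss (e.g. the Epoch 34, batch 2800 record: 0.1274 + 0.2 * 0.3453 = 0.1965), while in the validation records cr_loss is numerically ~0 (e.g. 9.757e-15) so loss equals ctc_loss. A minimal illustrative sketch of that relation, assuming a CR-loss scale of 0.2 inferred from the logged values; the names below are hypothetical, not the actual train.py code:

    # Sketch only: combine CTC loss with a consistency-regularization (CR)
    # term, matching the relation observed in the log records above.
    CR_LOSS_SCALE = 0.2  # assumed; inferred from the logged loss fields

    def combined_loss(ctc_loss: float, cr_loss: float) -> float:
        return ctc_loss + CR_LOSS_SCALE * cr_loss

    # Epoch 34, batch 2800: 0.1274 + 0.2 * 0.3453 = 0.19646, logged as 0.1965
    assert abs(combined_loss(0.1274, 0.3453) - 0.1965) < 5e-4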
2024-09-25 00:02:32,193 INFO [train.py:1231] (3/4) Maximum memory allocated so far is 21238MB 2024-09-25 00:03:07,314 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=618263.3333333334, ans=0.0 2024-09-25 00:03:10,088 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.150e+02 1.401e+02 1.522e+02 1.667e+02 2.435e+02, threshold=3.044e+02, percent-clipped=0.0 2024-09-25 00:03:15,164 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=618263.3333333334, ans=0.125 2024-09-25 00:03:18,664 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=618263.3333333334, ans=0.125 2024-09-25 00:03:40,760 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.min_positive, batch_count=618356.6666666666, ans=0.025 2024-09-25 00:03:56,383 INFO [train.py:1198] (3/4) Epoch 35, batch 50, loss[loss=0.1822, ctc_loss=0.1175, cr_loss=0.3233, over 17213.00 frames. ], tot_loss[loss=0.1963, ctc_loss=0.1276, cr_loss=0.3436, over 751873.10 frames. ], batch size: 47, lr: 3.42e-03, grad_scale: 32.0 2024-09-25 00:04:03,157 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=618403.3333333334, ans=0.1 2024-09-25 00:04:04,695 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=618403.3333333334, ans=0.0 2024-09-25 00:04:10,253 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=14.11 vs. limit=22.5 2024-09-25 00:04:20,842 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=618450.0, ans=0.125 2024-09-25 00:04:41,772 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=618496.6666666666, ans=0.2 2024-09-25 00:04:53,417 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=16.84 vs. limit=22.5 2024-09-25 00:04:57,720 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=618543.3333333334, ans=0.04949747468305833 2024-09-25 00:05:02,702 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=6.71 vs. limit=15.0 2024-09-25 00:05:16,504 INFO [train.py:1198] (3/4) Epoch 35, batch 100, loss[loss=0.1759, ctc_loss=0.1119, cr_loss=0.3201, over 17275.00 frames. ], tot_loss[loss=0.1955, ctc_loss=0.1266, cr_loss=0.3442, over 1336108.76 frames. ], batch size: 42, lr: 3.42e-03, grad_scale: 32.0 2024-09-25 00:05:49,885 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.096e+02 1.242e+02 1.304e+02 1.413e+02 1.730e+02, threshold=2.607e+02, percent-clipped=0.0 2024-09-25 00:05:59,940 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=10.97 vs. limit=15.0 2024-09-25 00:06:01,974 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=8.10 vs. 
limit=15.0 2024-09-25 00:06:12,902 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=618776.6666666666, ans=0.0 2024-09-25 00:06:19,287 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.const_attention_rate, batch_count=618823.3333333334, ans=0.025 2024-09-25 00:06:24,017 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=618823.3333333334, ans=0.125 2024-09-25 00:06:29,834 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=618823.3333333334, ans=0.125 2024-09-25 00:06:38,904 INFO [train.py:1198] (3/4) Epoch 35, batch 150, loss[loss=0.2086, ctc_loss=0.1371, cr_loss=0.3577, over 17011.00 frames. ], tot_loss[loss=0.1957, ctc_loss=0.1267, cr_loss=0.3446, over 1784047.93 frames. ], batch size: 51, lr: 3.42e-03, grad_scale: 32.0 2024-09-25 00:06:47,417 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.20 vs. limit=10.0 2024-09-25 00:07:03,451 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=618916.6666666666, ans=0.1 2024-09-25 00:07:09,803 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=618963.3333333334, ans=0.125 2024-09-25 00:07:10,222 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=5.90 vs. limit=15.0 2024-09-25 00:07:19,521 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=618963.3333333334, ans=0.125 2024-09-25 00:07:21,145 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=618963.3333333334, ans=0.125 2024-09-25 00:07:28,885 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=619010.0, ans=0.125 2024-09-25 00:08:05,738 INFO [train.py:1198] (3/4) Epoch 35, batch 200, loss[loss=0.2013, ctc_loss=0.1305, cr_loss=0.354, over 15858.00 frames. ], tot_loss[loss=0.1945, ctc_loss=0.1259, cr_loss=0.3431, over 2141770.73 frames. ], batch size: 74, lr: 3.42e-03, grad_scale: 32.0 2024-09-25 00:08:11,154 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=619103.3333333334, ans=0.1 2024-09-25 00:08:17,470 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=619103.3333333334, ans=0.125 2024-09-25 00:08:34,157 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=6.99 vs. 
limit=15.0 2024-09-25 00:08:41,112 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.092e+02 1.252e+02 1.342e+02 1.492e+02 1.753e+02, threshold=2.683e+02, percent-clipped=0.0 2024-09-25 00:08:46,108 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=619196.6666666666, ans=0.0 2024-09-25 00:08:52,564 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=619196.6666666666, ans=0.1 2024-09-25 00:09:08,534 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=619243.3333333334, ans=0.2 2024-09-25 00:09:13,884 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=6.10 vs. limit=15.0 2024-09-25 00:09:27,867 INFO [train.py:1198] (3/4) Epoch 35, batch 250, loss[loss=0.1829, ctc_loss=0.1192, cr_loss=0.3185, over 17292.00 frames. ], tot_loss[loss=0.1933, ctc_loss=0.1251, cr_loss=0.3409, over 2416051.61 frames. ], batch size: 49, lr: 3.42e-03, grad_scale: 32.0 2024-09-25 00:09:28,227 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.const_attention_rate, batch_count=619336.6666666666, ans=0.025 2024-09-25 00:09:59,980 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=619430.0, ans=0.1 2024-09-25 00:10:15,821 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=619476.6666666666, ans=0.07 2024-09-25 00:10:23,126 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=8.35 vs. limit=15.0 2024-09-25 00:10:28,683 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=619476.6666666666, ans=0.0 2024-09-25 00:10:29,015 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=3.97 vs. limit=10.0 2024-09-25 00:10:35,104 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=619523.3333333334, ans=0.2 2024-09-25 00:10:47,177 INFO [train.py:1198] (3/4) Epoch 35, batch 300, loss[loss=0.2231, ctc_loss=0.1477, cr_loss=0.377, over 17365.00 frames. ], tot_loss[loss=0.1948, ctc_loss=0.1262, cr_loss=0.3429, over 2616039.07 frames. ], batch size: 48, lr: 3.42e-03, grad_scale: 32.0 2024-09-25 00:11:08,379 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=619616.6666666666, ans=0.0 2024-09-25 00:11:08,549 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=619616.6666666666, ans=0.125 2024-09-25 00:11:20,985 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.119e+02 1.276e+02 1.333e+02 1.416e+02 3.334e+02, threshold=2.666e+02, percent-clipped=1.0 2024-09-25 00:11:47,203 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.47 vs. 
limit=15.0 2024-09-25 00:11:50,060 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=619710.0, ans=0.0 2024-09-25 00:11:56,383 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=619756.6666666666, ans=0.1 2024-09-25 00:11:59,517 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=619756.6666666666, ans=0.5 2024-09-25 00:12:10,251 INFO [train.py:1198] (3/4) Epoch 35, batch 350, loss[loss=0.2314, ctc_loss=0.1598, cr_loss=0.3581, over 11465.00 frames. ], tot_loss[loss=0.1965, ctc_loss=0.1274, cr_loss=0.3452, over 2775203.44 frames. ], batch size: 123, lr: 3.42e-03, grad_scale: 32.0 2024-09-25 00:12:36,000 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.const_attention_rate, batch_count=619850.0, ans=0.025 2024-09-25 00:12:44,207 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=619896.6666666666, ans=0.125 2024-09-25 00:12:45,863 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=619896.6666666666, ans=0.5 2024-09-25 00:12:59,950 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=619896.6666666666, ans=0.125 2024-09-25 00:13:20,905 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=5.06 vs. limit=10.0 2024-09-25 00:13:39,216 INFO [train.py:1198] (3/4) Epoch 35, batch 400, loss[loss=0.1721, ctc_loss=0.11, cr_loss=0.3104, over 17251.00 frames. ], tot_loss[loss=0.1957, ctc_loss=0.1269, cr_loss=0.3442, over 2915836.83 frames. ], batch size: 42, lr: 3.42e-03, grad_scale: 32.0 2024-09-25 00:14:03,203 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=620083.3333333334, ans=0.0 2024-09-25 00:14:12,441 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.132e+02 1.297e+02 1.357e+02 1.460e+02 2.001e+02, threshold=2.715e+02, percent-clipped=0.0 2024-09-25 00:14:15,924 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=620130.0, ans=0.0 2024-09-25 00:14:29,620 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.86 vs. limit=10.0 2024-09-25 00:14:40,098 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=620176.6666666666, ans=0.125 2024-09-25 00:14:59,149 INFO [train.py:1198] (3/4) Epoch 35, batch 450, loss[loss=0.2077, ctc_loss=0.1351, cr_loss=0.3632, over 16898.00 frames. ], tot_loss[loss=0.1952, ctc_loss=0.1265, cr_loss=0.3437, over 3023046.33 frames. ], batch size: 58, lr: 3.42e-03, grad_scale: 16.0 2024-09-25 00:15:10,734 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=620270.0, ans=0.125 2024-09-25 00:15:40,021 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=4.96 vs. 
limit=15.0 2024-09-25 00:16:01,971 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=620456.6666666666, ans=0.125 2024-09-25 00:16:03,477 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=620456.6666666666, ans=0.125 2024-09-25 00:16:08,408 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=620456.6666666666, ans=0.125 2024-09-25 00:16:10,026 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=620456.6666666666, ans=0.125 2024-09-25 00:16:19,363 INFO [train.py:1198] (3/4) Epoch 35, batch 500, loss[loss=0.1933, ctc_loss=0.1244, cr_loss=0.3445, over 17221.00 frames. ], tot_loss[loss=0.1943, ctc_loss=0.1258, cr_loss=0.3423, over 3105010.98 frames. ], batch size: 47, lr: 3.42e-03, grad_scale: 16.0 2024-09-25 00:16:34,991 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=620503.3333333334, ans=0.125 2024-09-25 00:16:38,707 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=5.72 vs. limit=12.0 2024-09-25 00:16:41,895 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=4.28 vs. limit=10.0 2024-09-25 00:16:57,098 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.128e+02 1.242e+02 1.333e+02 1.438e+02 2.516e+02, threshold=2.666e+02, percent-clipped=0.0 2024-09-25 00:17:15,020 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=620643.3333333334, ans=0.125 2024-09-25 00:17:26,391 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=620690.0, ans=0.125 2024-09-25 00:17:44,912 INFO [train.py:1198] (3/4) Epoch 35, batch 550, loss[loss=0.1502, ctc_loss=0.09215, cr_loss=0.2903, over 17025.00 frames. ], tot_loss[loss=0.1943, ctc_loss=0.1257, cr_loss=0.3428, over 3169109.08 frames. ], batch size: 39, lr: 3.42e-03, grad_scale: 16.0 2024-09-25 00:17:45,197 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=620736.6666666666, ans=0.125 2024-09-25 00:17:51,929 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=4.46 vs. limit=12.0 2024-09-25 00:17:53,099 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=620736.6666666666, ans=0.125 2024-09-25 00:18:08,944 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=8.99 vs. limit=15.0 2024-09-25 00:18:51,722 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=12.16 vs. limit=22.5 2024-09-25 00:19:10,363 INFO [train.py:1198] (3/4) Epoch 35, batch 600, loss[loss=0.209, ctc_loss=0.1365, cr_loss=0.3627, over 17143.00 frames. ], tot_loss[loss=0.1948, ctc_loss=0.1261, cr_loss=0.3436, over 3210713.80 frames. 
], batch size: 48, lr: 3.42e-03, grad_scale: 16.0 2024-09-25 00:19:23,516 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=620970.0, ans=0.125 2024-09-25 00:19:28,360 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=621016.6666666666, ans=0.125 2024-09-25 00:19:41,073 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=621063.3333333334, ans=0.04949747468305833 2024-09-25 00:19:45,442 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.145e+02 1.279e+02 1.403e+02 1.511e+02 1.952e+02, threshold=2.806e+02, percent-clipped=0.0 2024-09-25 00:19:58,388 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=621110.0, ans=0.125 2024-09-25 00:20:05,514 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=14.44 vs. limit=22.5 2024-09-25 00:20:09,648 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=621110.0, ans=0.125 2024-09-25 00:20:20,909 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=621156.6666666666, ans=0.125 2024-09-25 00:20:28,884 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=621203.3333333334, ans=0.2 2024-09-25 00:20:30,169 INFO [train.py:1198] (3/4) Epoch 35, batch 650, loss[loss=0.2079, ctc_loss=0.1364, cr_loss=0.3579, over 16211.00 frames. ], tot_loss[loss=0.1947, ctc_loss=0.126, cr_loss=0.3433, over 3247007.04 frames. ], batch size: 74, lr: 3.42e-03, grad_scale: 16.0 2024-09-25 00:20:59,451 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.const_attention_rate, batch_count=621250.0, ans=0.025 2024-09-25 00:21:51,756 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-25 00:21:52,176 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=5.34 vs. limit=15.0 2024-09-25 00:21:53,029 INFO [train.py:1198] (3/4) Epoch 35, batch 700, loss[loss=0.1878, ctc_loss=0.1225, cr_loss=0.3262, over 17017.00 frames. ], tot_loss[loss=0.1948, ctc_loss=0.1262, cr_loss=0.3432, over 3276919.37 frames. ], batch size: 51, lr: 3.41e-03, grad_scale: 16.0 2024-09-25 00:22:30,889 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.127e+02 1.276e+02 1.349e+02 1.443e+02 1.700e+02, threshold=2.699e+02, percent-clipped=0.0 2024-09-25 00:22:31,194 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=621530.0, ans=0.1 2024-09-25 00:22:59,557 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=621576.6666666666, ans=0.125 2024-09-25 00:23:01,554 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=11.78 vs. 
limit=22.5 2024-09-25 00:23:15,292 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=621623.3333333334, ans=0.125 2024-09-25 00:23:20,502 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=14.26 vs. limit=22.5 2024-09-25 00:23:21,269 INFO [train.py:1198] (3/4) Epoch 35, batch 750, loss[loss=0.2018, ctc_loss=0.1301, cr_loss=0.3587, over 17011.00 frames. ], tot_loss[loss=0.1944, ctc_loss=0.126, cr_loss=0.3421, over 3285459.09 frames. ], batch size: 51, lr: 3.41e-03, grad_scale: 16.0 2024-09-25 00:23:26,641 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.64 vs. limit=15.0 2024-09-25 00:23:31,034 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=621670.0, ans=0.0 2024-09-25 00:24:39,806 INFO [scaling.py:1024] (3/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.54 vs. limit=5.0 2024-09-25 00:24:40,310 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=621903.3333333334, ans=0.125 2024-09-25 00:24:40,418 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=621903.3333333334, ans=0.125 2024-09-25 00:24:41,688 INFO [train.py:1198] (3/4) Epoch 35, batch 800, loss[loss=0.2115, ctc_loss=0.14, cr_loss=0.3572, over 16988.00 frames. ], tot_loss[loss=0.1944, ctc_loss=0.1259, cr_loss=0.3424, over 3312432.53 frames. ], batch size: 56, lr: 3.41e-03, grad_scale: 32.0 2024-09-25 00:24:42,041 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=621903.3333333334, ans=0.2 2024-09-25 00:25:00,162 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=10.30 vs. limit=22.5 2024-09-25 00:25:04,407 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=621950.0, ans=0.125 2024-09-25 00:25:07,686 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=621950.0, ans=0.125 2024-09-25 00:25:09,281 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=621950.0, ans=0.125 2024-09-25 00:25:10,104 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=4.71 vs. 
limit=15.0 2024-09-25 00:25:16,940 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.104e+02 1.280e+02 1.366e+02 1.481e+02 2.443e+02, threshold=2.732e+02, percent-clipped=0.0 2024-09-25 00:25:17,325 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=621996.6666666666, ans=0.1 2024-09-25 00:25:23,776 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-25 00:25:33,342 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=622043.3333333334, ans=0.0 2024-09-25 00:25:59,081 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=622090.0, ans=0.0 2024-09-25 00:26:01,921 INFO [train.py:1198] (3/4) Epoch 35, batch 850, loss[loss=0.2227, ctc_loss=0.1459, cr_loss=0.384, over 16077.00 frames. ], tot_loss[loss=0.1941, ctc_loss=0.1257, cr_loss=0.3418, over 3329769.95 frames. ], batch size: 74, lr: 3.41e-03, grad_scale: 16.0 2024-09-25 00:26:48,351 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.20 vs. limit=15.0 2024-09-25 00:26:51,910 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=4.24 vs. limit=15.0 2024-09-25 00:27:05,605 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=622276.6666666666, ans=0.1 2024-09-25 00:27:16,366 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=622323.3333333334, ans=0.0 2024-09-25 00:27:26,639 INFO [train.py:1198] (3/4) Epoch 35, batch 900, loss[loss=0.2024, ctc_loss=0.1305, cr_loss=0.3595, over 17162.00 frames. ], tot_loss[loss=0.1948, ctc_loss=0.1262, cr_loss=0.3427, over 3336910.73 frames. ], batch size: 41, lr: 3.41e-03, grad_scale: 16.0 2024-09-25 00:27:44,381 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=622416.6666666666, ans=0.0 2024-09-25 00:28:08,746 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.186e+02 1.305e+02 1.377e+02 1.455e+02 1.762e+02, threshold=2.753e+02, percent-clipped=0.0 2024-09-25 00:28:13,907 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-09-25 00:28:17,454 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=5.88 vs. limit=15.0 2024-09-25 00:28:20,306 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=622510.0, ans=0.125 2024-09-25 00:28:23,323 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=622510.0, ans=0.0 2024-09-25 00:28:26,841 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=6.21 vs. 
limit=15.0 2024-09-25 00:28:28,354 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=622510.0, ans=0.125 2024-09-25 00:28:46,718 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=8.12 vs. limit=15.0 2024-09-25 00:28:52,255 INFO [train.py:1198] (3/4) Epoch 35, batch 950, loss[loss=0.1665, ctc_loss=0.107, cr_loss=0.2977, over 17190.00 frames. ], tot_loss[loss=0.1949, ctc_loss=0.1263, cr_loss=0.343, over 3340875.28 frames. ], batch size: 41, lr: 3.41e-03, grad_scale: 16.0 2024-09-25 00:28:57,722 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.12 vs. limit=10.0 2024-09-25 00:29:03,064 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=4.05 vs. limit=15.0 2024-09-25 00:29:13,658 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=622650.0, ans=0.1 2024-09-25 00:29:34,418 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.min_positive, batch_count=622696.6666666666, ans=0.05 2024-09-25 00:30:12,544 INFO [train.py:1198] (3/4) Epoch 35, batch 1000, loss[loss=0.156, ctc_loss=0.09964, cr_loss=0.2819, over 16994.00 frames. ], tot_loss[loss=0.1968, ctc_loss=0.1275, cr_loss=0.3461, over 3347709.76 frames. ], batch size: 39, lr: 3.41e-03, grad_scale: 16.0 2024-09-25 00:30:18,421 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.99 vs. limit=15.0 2024-09-25 00:30:30,519 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=622883.3333333334, ans=0.1 2024-09-25 00:30:36,774 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=622883.3333333334, ans=0.125 2024-09-25 00:30:37,328 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=5.32 vs. limit=15.0 2024-09-25 00:30:41,940 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=7.81 vs. limit=15.0 2024-09-25 00:30:49,042 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.128e+02 1.266e+02 1.340e+02 1.444e+02 2.744e+02, threshold=2.680e+02, percent-clipped=0.0 2024-09-25 00:30:52,894 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=3.43 vs. limit=10.0 2024-09-25 00:31:35,004 INFO [train.py:1198] (3/4) Epoch 35, batch 1050, loss[loss=0.1762, ctc_loss=0.1122, cr_loss=0.3202, over 17265.00 frames. ], tot_loss[loss=0.1958, ctc_loss=0.1269, cr_loss=0.3447, over 3354031.18 frames. 
], batch size: 44, lr: 3.41e-03, grad_scale: 16.0 2024-09-25 00:31:51,045 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=623116.6666666666, ans=0.0 2024-09-25 00:32:05,463 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=623163.3333333334, ans=0.0 2024-09-25 00:32:16,931 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=623163.3333333334, ans=0.2 2024-09-25 00:32:33,911 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-09-25 00:32:45,036 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=623256.6666666666, ans=0.2 2024-09-25 00:32:49,226 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=623256.6666666666, ans=0.125 2024-09-25 00:33:02,769 INFO [train.py:1198] (3/4) Epoch 35, batch 1100, loss[loss=0.2295, ctc_loss=0.148, cr_loss=0.4077, over 16595.00 frames. ], tot_loss[loss=0.1958, ctc_loss=0.1268, cr_loss=0.3448, over 3346751.82 frames. ], batch size: 66, lr: 3.41e-03, grad_scale: 16.0 2024-09-25 00:33:20,573 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-25 00:33:28,469 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=623350.0, ans=0.125 2024-09-25 00:33:39,207 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.111e+02 1.286e+02 1.347e+02 1.466e+02 2.468e+02, threshold=2.694e+02, percent-clipped=0.0 2024-09-25 00:33:39,499 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=623396.6666666666, ans=0.125 2024-09-25 00:33:55,465 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=623443.3333333334, ans=0.2 2024-09-25 00:34:03,851 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=623443.3333333334, ans=0.125 2024-09-25 00:34:15,392 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.68 vs. limit=15.0 2024-09-25 00:34:22,933 INFO [train.py:1198] (3/4) Epoch 35, batch 1150, loss[loss=0.2014, ctc_loss=0.1306, cr_loss=0.3541, over 17277.00 frames. ], tot_loss[loss=0.1947, ctc_loss=0.126, cr_loss=0.3431, over 3353937.61 frames. ], batch size: 49, lr: 3.41e-03, grad_scale: 16.0 2024-09-25 00:34:50,299 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=8.83 vs. limit=15.0 2024-09-25 00:34:51,698 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=623583.3333333334, ans=0.125 2024-09-25 00:35:07,739 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-09-25 00:35:36,733 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=6.27 vs. 
limit=15.0 2024-09-25 00:35:42,224 INFO [train.py:1198] (3/4) Epoch 35, batch 1200, loss[loss=0.2586, ctc_loss=0.1753, cr_loss=0.4165, over 11681.00 frames. ], tot_loss[loss=0.1948, ctc_loss=0.1261, cr_loss=0.3433, over 3342910.49 frames. ], batch size: 123, lr: 3.41e-03, grad_scale: 32.0 2024-09-25 00:36:09,715 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=623816.6666666666, ans=0.0 2024-09-25 00:36:16,622 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=8.35 vs. limit=15.0 2024-09-25 00:36:18,955 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.140e+02 1.278e+02 1.375e+02 1.481e+02 3.565e+02, threshold=2.751e+02, percent-clipped=1.0 2024-09-25 00:36:28,911 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.const_attention_rate, batch_count=623910.0, ans=0.025 2024-09-25 00:36:47,543 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=623956.6666666666, ans=0.125 2024-09-25 00:36:57,915 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=4.92 vs. limit=15.0 2024-09-25 00:37:04,861 INFO [train.py:1198] (3/4) Epoch 35, batch 1250, loss[loss=0.1905, ctc_loss=0.1235, cr_loss=0.3352, over 16864.00 frames. ], tot_loss[loss=0.1946, ctc_loss=0.1259, cr_loss=0.3433, over 3352750.72 frames. ], batch size: 58, lr: 3.41e-03, grad_scale: 32.0 2024-09-25 00:37:25,070 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=624050.0, ans=0.07 2024-09-25 00:37:44,378 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=624096.6666666666, ans=0.0 2024-09-25 00:37:56,374 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=624143.3333333334, ans=0.2 2024-09-25 00:38:01,461 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys.whitening_limit, batch_count=624143.3333333334, ans=6.0 2024-09-25 00:38:13,848 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=6.54 vs. limit=15.0 2024-09-25 00:38:32,165 INFO [train.py:1198] (3/4) Epoch 35, batch 1300, loss[loss=0.1608, ctc_loss=0.09999, cr_loss=0.3041, over 17072.00 frames. ], tot_loss[loss=0.195, ctc_loss=0.1262, cr_loss=0.3436, over 3350401.03 frames. ], batch size: 43, lr: 3.41e-03, grad_scale: 32.0 2024-09-25 00:38:32,458 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=624236.6666666666, ans=0.0 2024-09-25 00:38:41,894 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=624236.6666666666, ans=0.0 2024-09-25 00:38:45,675 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=4.67 vs. 
limit=15.0 2024-09-25 00:38:59,597 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=624283.3333333334, ans=0.125 2024-09-25 00:39:02,966 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=624330.0, ans=0.0 2024-09-25 00:39:08,851 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.111e+02 1.267e+02 1.378e+02 1.452e+02 1.774e+02, threshold=2.755e+02, percent-clipped=0.0 2024-09-25 00:39:10,634 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=624330.0, ans=0.035 2024-09-25 00:39:14,063 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=624330.0, ans=0.2 2024-09-25 00:39:18,893 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=624376.6666666666, ans=0.04949747468305833 2024-09-25 00:39:37,110 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=4.02 vs. limit=15.0 2024-09-25 00:39:52,136 INFO [train.py:1198] (3/4) Epoch 35, batch 1350, loss[loss=0.1856, ctc_loss=0.119, cr_loss=0.3327, over 17231.00 frames. ], tot_loss[loss=0.1953, ctc_loss=0.1266, cr_loss=0.3437, over 3362403.78 frames. ], batch size: 47, lr: 3.41e-03, grad_scale: 32.0 2024-09-25 00:40:15,145 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=624516.6666666666, ans=0.125 2024-09-25 00:40:26,431 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=624563.3333333334, ans=0.125 2024-09-25 00:40:48,466 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.min_positive, batch_count=624610.0, ans=0.025 2024-09-25 00:41:11,915 INFO [train.py:1198] (3/4) Epoch 35, batch 1400, loss[loss=0.2312, ctc_loss=0.1585, cr_loss=0.3634, over 11480.00 frames. ], tot_loss[loss=0.1958, ctc_loss=0.1269, cr_loss=0.3443, over 3354030.73 frames. 
], batch size: 123, lr: 3.41e-03, grad_scale: 32.0 2024-09-25 00:41:17,272 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=624703.3333333334, ans=0.2 2024-09-25 00:41:21,940 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=624703.3333333334, ans=0.2 2024-09-25 00:41:25,321 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=624703.3333333334, ans=0.125 2024-09-25 00:41:51,232 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.153e+02 1.292e+02 1.373e+02 1.479e+02 2.692e+02, threshold=2.745e+02, percent-clipped=0.0 2024-09-25 00:41:54,742 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=624796.6666666666, ans=0.2 2024-09-25 00:41:56,472 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=624796.6666666666, ans=0.125 2024-09-25 00:41:57,883 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=624796.6666666666, ans=0.125 2024-09-25 00:42:27,781 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=624890.0, ans=0.125 2024-09-25 00:42:37,118 INFO [train.py:1198] (3/4) Epoch 35, batch 1450, loss[loss=0.2077, ctc_loss=0.1359, cr_loss=0.3593, over 17278.00 frames. ], tot_loss[loss=0.1945, ctc_loss=0.126, cr_loss=0.3425, over 3358177.05 frames. ], batch size: 46, lr: 3.41e-03, grad_scale: 32.0 2024-09-25 00:43:04,780 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=624983.3333333334, ans=0.125 2024-09-25 00:43:13,536 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.50 vs. limit=6.0 2024-09-25 00:43:17,643 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=625030.0, ans=0.125 2024-09-25 00:43:28,065 INFO [scaling.py:1024] (3/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=7.92 vs. limit=8.0 2024-09-25 00:43:35,378 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=625076.6666666666, ans=0.2 2024-09-25 00:44:02,343 INFO [train.py:1198] (3/4) Epoch 35, batch 1500, loss[loss=0.1798, ctc_loss=0.1165, cr_loss=0.3166, over 17256.00 frames. ], tot_loss[loss=0.1942, ctc_loss=0.1258, cr_loss=0.3421, over 3367779.93 frames. ], batch size: 44, lr: 3.40e-03, grad_scale: 32.0 2024-09-25 00:44:21,982 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=625216.6666666666, ans=0.125 2024-09-25 00:44:39,192 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.069e+02 1.249e+02 1.364e+02 1.434e+02 2.230e+02, threshold=2.728e+02, percent-clipped=0.0 2024-09-25 00:44:52,736 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.97 vs. 
limit=15.0 2024-09-25 00:45:08,838 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=625356.6666666666, ans=0.125 2024-09-25 00:45:23,012 INFO [train.py:1198] (3/4) Epoch 35, batch 1550, loss[loss=0.1917, ctc_loss=0.1234, cr_loss=0.3414, over 17048.00 frames. ], tot_loss[loss=0.1945, ctc_loss=0.126, cr_loss=0.3429, over 3369548.83 frames. ], batch size: 52, lr: 3.40e-03, grad_scale: 32.0 2024-09-25 00:45:29,685 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=625403.3333333334, ans=0.125 2024-09-25 00:45:40,772 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=625450.0, ans=0.1 2024-09-25 00:45:52,048 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=625450.0, ans=0.125 2024-09-25 00:46:07,133 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten.whitening_limit, batch_count=625496.6666666666, ans=15.0 2024-09-25 00:46:12,281 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2.whitening_limit, batch_count=625543.3333333334, ans=15.0 2024-09-25 00:46:16,608 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys.whitening_limit, batch_count=625543.3333333334, ans=6.0 2024-09-25 00:46:29,098 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=625590.0, ans=0.1 2024-09-25 00:46:38,104 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=625590.0, ans=10.0 2024-09-25 00:46:45,800 INFO [train.py:1198] (3/4) Epoch 35, batch 1600, loss[loss=0.1884, ctc_loss=0.1202, cr_loss=0.3413, over 17178.00 frames. ], tot_loss[loss=0.1934, ctc_loss=0.125, cr_loss=0.3416, over 3376575.40 frames. 
], batch size: 45, lr: 3.40e-03, grad_scale: 32.0 2024-09-25 00:46:52,458 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=625636.6666666666, ans=0.035 2024-09-25 00:47:03,595 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=625683.3333333334, ans=0.1 2024-09-25 00:47:03,731 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=625683.3333333334, ans=0.125 2024-09-25 00:47:08,644 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=625683.3333333334, ans=0.125 2024-09-25 00:47:25,308 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.121e+02 1.273e+02 1.348e+02 1.440e+02 1.761e+02, threshold=2.695e+02, percent-clipped=0.0 2024-09-25 00:47:25,662 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=625730.0, ans=0.1 2024-09-25 00:47:32,337 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=625730.0, ans=0.1 2024-09-25 00:47:43,742 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.min_positive, batch_count=625776.6666666666, ans=0.025 2024-09-25 00:48:13,212 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=625870.0, ans=0.0 2024-09-25 00:48:14,377 INFO [train.py:1198] (3/4) Epoch 35, batch 1650, loss[loss=0.1919, ctc_loss=0.1239, cr_loss=0.3403, over 17161.00 frames. ], tot_loss[loss=0.1928, ctc_loss=0.1246, cr_loss=0.3407, over 3373613.58 frames. ], batch size: 45, lr: 3.40e-03, grad_scale: 32.0 2024-09-25 00:48:17,947 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=625870.0, ans=0.125 2024-09-25 00:48:30,812 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=625916.6666666666, ans=0.125 2024-09-25 00:48:41,904 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=625916.6666666666, ans=0.0 2024-09-25 00:48:45,059 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=625963.3333333334, ans=0.125 2024-09-25 00:48:47,284 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten.whitening_limit, batch_count=625963.3333333334, ans=15.0 2024-09-25 00:49:34,599 INFO [train.py:1198] (3/4) Epoch 35, batch 1700, loss[loss=0.2273, ctc_loss=0.15, cr_loss=0.3863, over 16538.00 frames. ], tot_loss[loss=0.1926, ctc_loss=0.1246, cr_loss=0.3404, over 3379566.53 frames. ], batch size: 66, lr: 3.40e-03, grad_scale: 32.0 2024-09-25 00:49:44,831 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=4.76 vs. 
limit=15.0 2024-09-25 00:50:01,872 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=626150.0, ans=0.2 2024-09-25 00:50:10,954 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.095e+02 1.273e+02 1.359e+02 1.469e+02 2.264e+02, threshold=2.717e+02, percent-clipped=0.0 2024-09-25 00:50:11,368 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=626196.6666666666, ans=0.125 2024-09-25 00:50:22,342 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=626243.3333333334, ans=0.0 2024-09-25 00:50:27,230 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=626243.3333333334, ans=0.125 2024-09-25 00:50:27,325 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=626243.3333333334, ans=0.0 2024-09-25 00:50:54,472 INFO [train.py:1198] (3/4) Epoch 35, batch 1750, loss[loss=0.1737, ctc_loss=0.1107, cr_loss=0.3152, over 17167.00 frames. ], tot_loss[loss=0.1928, ctc_loss=0.1247, cr_loss=0.3403, over 3381047.09 frames. ], batch size: 45, lr: 3.40e-03, grad_scale: 32.0 2024-09-25 00:51:07,350 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=626336.6666666666, ans=0.0 2024-09-25 00:51:21,940 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=626383.3333333334, ans=0.125 2024-09-25 00:51:21,944 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=626383.3333333334, ans=0.1 2024-09-25 00:51:40,422 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=626430.0, ans=0.125 2024-09-25 00:51:56,611 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=9.47 vs. limit=22.5 2024-09-25 00:52:19,239 INFO [train.py:1198] (3/4) Epoch 35, batch 1800, loss[loss=0.1896, ctc_loss=0.1209, cr_loss=0.3436, over 17175.00 frames. ], tot_loss[loss=0.1944, ctc_loss=0.1259, cr_loss=0.3425, over 3378066.24 frames. ], batch size: 45, lr: 3.40e-03, grad_scale: 32.0 2024-09-25 00:52:40,596 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=8.54 vs. limit=12.0 2024-09-25 00:52:42,388 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.28 vs. 
limit=6.0 2024-09-25 00:52:48,305 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=626616.6666666666, ans=0.125 2024-09-25 00:53:00,986 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.081e+02 1.248e+02 1.342e+02 1.447e+02 1.901e+02, threshold=2.684e+02, percent-clipped=0.0 2024-09-25 00:53:02,813 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=626663.3333333334, ans=0.2 2024-09-25 00:53:17,133 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=626710.0, ans=0.0 2024-09-25 00:53:34,989 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=626756.6666666666, ans=0.125 2024-09-25 00:53:38,506 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten.whitening_limit, batch_count=626756.6666666666, ans=15.0 2024-09-25 00:53:44,286 INFO [train.py:1198] (3/4) Epoch 35, batch 1850, loss[loss=0.2182, ctc_loss=0.1434, cr_loss=0.3742, over 17135.00 frames. ], tot_loss[loss=0.1945, ctc_loss=0.126, cr_loss=0.3425, over 3373800.08 frames. ], batch size: 48, lr: 3.40e-03, grad_scale: 32.0 2024-09-25 00:53:52,723 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=626803.3333333334, ans=0.125 2024-09-25 00:53:57,330 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=626803.3333333334, ans=0.0 2024-09-25 00:54:08,799 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=3.79 vs. limit=10.0 2024-09-25 00:54:11,656 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=626850.0, ans=0.125 2024-09-25 00:55:01,262 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=626990.0, ans=0.0 2024-09-25 00:55:04,162 INFO [train.py:1198] (3/4) Epoch 35, batch 1900, loss[loss=0.2004, ctc_loss=0.1338, cr_loss=0.3333, over 16889.00 frames. ], tot_loss[loss=0.1956, ctc_loss=0.1269, cr_loss=0.3433, over 3363494.24 frames. ], batch size: 58, lr: 3.40e-03, grad_scale: 32.0 2024-09-25 00:55:17,216 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=627036.6666666666, ans=0.0 2024-09-25 00:55:39,636 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=627130.0, ans=0.0 2024-09-25 00:55:41,025 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.172e+02 1.333e+02 1.409e+02 1.508e+02 2.528e+02, threshold=2.819e+02, percent-clipped=0.0 2024-09-25 00:55:46,832 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=4.79 vs. 
limit=15.0 2024-09-25 00:55:47,702 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=627130.0, ans=0.1 2024-09-25 00:56:15,271 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=627223.3333333334, ans=0.125 2024-09-25 00:56:23,364 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.07 vs. limit=10.0 2024-09-25 00:56:24,335 INFO [train.py:1198] (3/4) Epoch 35, batch 1950, loss[loss=0.2288, ctc_loss=0.1494, cr_loss=0.3972, over 17041.00 frames. ], tot_loss[loss=0.1964, ctc_loss=0.1274, cr_loss=0.345, over 3372396.55 frames. ], batch size: 52, lr: 3.40e-03, grad_scale: 32.0 2024-09-25 00:56:41,723 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=627316.6666666666, ans=0.2 2024-09-25 00:57:00,603 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=627363.3333333334, ans=0.125 2024-09-25 00:57:19,181 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=627410.0, ans=0.1 2024-09-25 00:57:30,452 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=627410.0, ans=0.125 2024-09-25 00:57:36,676 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=627456.6666666666, ans=0.1 2024-09-25 00:57:38,522 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=627456.6666666666, ans=0.125 2024-09-25 00:57:45,550 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=4.92 vs. limit=10.0 2024-09-25 00:57:49,088 INFO [train.py:1198] (3/4) Epoch 35, batch 2000, loss[loss=0.1906, ctc_loss=0.1216, cr_loss=0.345, over 17078.00 frames. ], tot_loss[loss=0.195, ctc_loss=0.1264, cr_loss=0.343, over 3373874.56 frames. ], batch size: 46, lr: 3.40e-03, grad_scale: 32.0 2024-09-25 00:58:13,520 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=627550.0, ans=0.125 2024-09-25 00:58:30,772 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.112e+02 1.295e+02 1.390e+02 1.531e+02 2.683e+02, threshold=2.780e+02, percent-clipped=0.0 2024-09-25 00:58:32,650 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=627596.6666666666, ans=0.125 2024-09-25 00:59:13,925 INFO [train.py:1198] (3/4) Epoch 35, batch 2050, loss[loss=0.2309, ctc_loss=0.1492, cr_loss=0.4082, over 16760.00 frames. ], tot_loss[loss=0.1945, ctc_loss=0.126, cr_loss=0.3427, over 3373039.43 frames. ], batch size: 61, lr: 3.40e-03, grad_scale: 32.0 2024-09-25 00:59:14,289 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=627736.6666666666, ans=0.0 2024-09-25 00:59:14,787 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.97 vs. 
limit=15.0 2024-09-25 00:59:51,118 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=627830.0, ans=0.125 2024-09-25 01:00:33,308 INFO [train.py:1198] (3/4) Epoch 35, batch 2100, loss[loss=0.1753, ctc_loss=0.1101, cr_loss=0.3259, over 17158.00 frames. ], tot_loss[loss=0.1953, ctc_loss=0.1265, cr_loss=0.3439, over 3363032.58 frames. ], batch size: 45, lr: 3.40e-03, grad_scale: 32.0 2024-09-25 01:00:37,330 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=10.12 vs. limit=15.0 2024-09-25 01:00:40,101 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=627970.0, ans=0.09899494936611666 2024-09-25 01:00:50,124 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=628016.6666666666, ans=0.1 2024-09-25 01:01:00,146 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=10.50 vs. limit=15.0 2024-09-25 01:01:10,509 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.118e+02 1.271e+02 1.342e+02 1.455e+02 1.760e+02, threshold=2.684e+02, percent-clipped=0.0 2024-09-25 01:01:43,869 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=628156.6666666666, ans=0.1 2024-09-25 01:01:47,247 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=628156.6666666666, ans=0.0 2024-09-25 01:01:56,293 INFO [train.py:1198] (3/4) Epoch 35, batch 2150, loss[loss=0.1706, ctc_loss=0.1109, cr_loss=0.2983, over 17083.00 frames. ], tot_loss[loss=0.1949, ctc_loss=0.1263, cr_loss=0.3432, over 3354634.55 frames. ], batch size: 43, lr: 3.40e-03, grad_scale: 32.0 2024-09-25 01:02:01,959 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=4.84 vs. limit=10.0 2024-09-25 01:02:21,851 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=628250.0, ans=0.1 2024-09-25 01:02:42,765 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=628296.6666666666, ans=0.125 2024-09-25 01:02:43,462 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.20 vs. limit=10.0 2024-09-25 01:02:54,907 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=628343.3333333334, ans=0.2 2024-09-25 01:03:01,000 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=3.32 vs. limit=15.0 2024-09-25 01:03:24,296 INFO [train.py:1198] (3/4) Epoch 35, batch 2200, loss[loss=0.1915, ctc_loss=0.1223, cr_loss=0.3458, over 17033.00 frames. ], tot_loss[loss=0.1953, ctc_loss=0.1266, cr_loss=0.3438, over 3349894.51 frames. 
], batch size: 56, lr: 3.40e-03, grad_scale: 32.0 2024-09-25 01:03:32,888 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=628436.6666666666, ans=0.1 2024-09-25 01:03:52,386 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.02 vs. limit=15.0 2024-09-25 01:04:01,427 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.102e+02 1.291e+02 1.359e+02 1.454e+02 2.045e+02, threshold=2.717e+02, percent-clipped=0.0 2024-09-25 01:04:45,061 INFO [train.py:1198] (3/4) Epoch 35, batch 2250, loss[loss=0.213, ctc_loss=0.1355, cr_loss=0.3876, over 17287.00 frames. ], tot_loss[loss=0.1947, ctc_loss=0.1261, cr_loss=0.3426, over 3346549.39 frames. ], batch size: 46, lr: 3.40e-03, grad_scale: 32.0 2024-09-25 01:05:06,422 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=628716.6666666666, ans=0.0 2024-09-25 01:05:28,668 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=628763.3333333334, ans=0.07 2024-09-25 01:05:36,704 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=628810.0, ans=0.2 2024-09-25 01:06:05,298 INFO [train.py:1198] (3/4) Epoch 35, batch 2300, loss[loss=0.2017, ctc_loss=0.1315, cr_loss=0.3511, over 16752.00 frames. ], tot_loss[loss=0.1955, ctc_loss=0.1268, cr_loss=0.3436, over 3340280.87 frames. ], batch size: 61, lr: 3.39e-03, grad_scale: 32.0 2024-09-25 01:06:38,368 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.const_attention_rate, batch_count=628996.6666666666, ans=0.025 2024-09-25 01:06:39,935 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=628996.6666666666, ans=0.125 2024-09-25 01:06:44,417 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.148e+02 1.311e+02 1.390e+02 1.542e+02 2.527e+02, threshold=2.780e+02, percent-clipped=0.0 2024-09-25 01:06:51,069 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=628996.6666666666, ans=0.0 2024-09-25 01:06:52,775 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=628996.6666666666, ans=0.0 2024-09-25 01:06:54,400 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=629043.3333333334, ans=0.2 2024-09-25 01:07:30,084 INFO [train.py:1198] (3/4) Epoch 35, batch 2350, loss[loss=0.1908, ctc_loss=0.1211, cr_loss=0.3486, over 17026.00 frames. ], tot_loss[loss=0.1965, ctc_loss=0.1274, cr_loss=0.3452, over 3342377.52 frames. 
], batch size: 51, lr: 3.39e-03, grad_scale: 32.0 2024-09-25 01:07:41,503 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=629136.6666666666, ans=0.0 2024-09-25 01:07:50,429 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=629183.3333333334, ans=0.125 2024-09-25 01:08:30,126 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=629276.6666666666, ans=0.125 2024-09-25 01:08:34,885 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=629276.6666666666, ans=0.0 2024-09-25 01:08:50,157 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=5.16 vs. limit=10.0 2024-09-25 01:08:55,603 INFO [train.py:1198] (3/4) Epoch 35, batch 2400, loss[loss=0.2319, ctc_loss=0.153, cr_loss=0.3946, over 14963.00 frames. ], tot_loss[loss=0.1959, ctc_loss=0.127, cr_loss=0.3447, over 3344490.99 frames. ], batch size: 89, lr: 3.39e-03, grad_scale: 32.0 2024-09-25 01:08:55,852 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=629370.0, ans=0.0 2024-09-25 01:09:00,776 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer_na.min_abs, batch_count=629370.0, ans=0.02 2024-09-25 01:09:24,899 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=629416.6666666666, ans=0.0 2024-09-25 01:09:32,456 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.057e+02 1.256e+02 1.334e+02 1.415e+02 2.339e+02, threshold=2.669e+02, percent-clipped=0.0 2024-09-25 01:09:51,659 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=629510.0, ans=0.0 2024-09-25 01:10:15,373 INFO [train.py:1198] (3/4) Epoch 35, batch 2450, loss[loss=0.1796, ctc_loss=0.1123, cr_loss=0.3366, over 17181.00 frames. ], tot_loss[loss=0.1961, ctc_loss=0.1272, cr_loss=0.3449, over 3348876.94 frames. ], batch size: 41, lr: 3.39e-03, grad_scale: 32.0 2024-09-25 01:10:48,518 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.37 vs. limit=10.0 2024-09-25 01:10:49,578 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=629696.6666666666, ans=0.2 2024-09-25 01:10:52,503 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=629696.6666666666, ans=0.125 2024-09-25 01:11:10,239 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.01 vs. limit=15.0 2024-09-25 01:11:18,582 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=4.25 vs. 
limit=12.0 2024-09-25 01:11:36,513 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=629836.6666666666, ans=0.125 2024-09-25 01:11:36,843 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten.whitening_limit, batch_count=629836.6666666666, ans=15.0 2024-09-25 01:11:37,059 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=5.97 vs. limit=10.0 2024-09-25 01:11:37,757 INFO [train.py:1198] (3/4) Epoch 35, batch 2500, loss[loss=0.1863, ctc_loss=0.1212, cr_loss=0.3256, over 17298.00 frames. ], tot_loss[loss=0.1964, ctc_loss=0.1274, cr_loss=0.3446, over 3347079.66 frames. ], batch size: 49, lr: 3.39e-03, grad_scale: 32.0 2024-09-25 01:12:16,800 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.134e+02 1.245e+02 1.351e+02 1.450e+02 1.965e+02, threshold=2.702e+02, percent-clipped=0.0 2024-09-25 01:12:26,789 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=629976.6666666666, ans=0.2 2024-09-25 01:13:05,482 INFO [train.py:1198] (3/4) Epoch 35, batch 2550, loss[loss=0.2231, ctc_loss=0.1497, cr_loss=0.3668, over 17120.00 frames. ], tot_loss[loss=0.1958, ctc_loss=0.1271, cr_loss=0.3436, over 3327603.15 frames. ], batch size: 49, lr: 3.39e-03, grad_scale: 32.0 2024-09-25 01:13:26,254 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=630116.6666666666, ans=0.125 2024-09-25 01:13:51,844 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=630210.0, ans=0.125 2024-09-25 01:13:51,962 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=630210.0, ans=0.125 2024-09-25 01:14:13,300 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.95 vs. limit=6.0 2024-09-25 01:14:25,411 INFO [train.py:1198] (3/4) Epoch 35, batch 2600, loss[loss=0.1963, ctc_loss=0.1273, cr_loss=0.3446, over 17308.00 frames. ], tot_loss[loss=0.1954, ctc_loss=0.1268, cr_loss=0.343, over 3325849.70 frames. 
], batch size: 51, lr: 3.39e-03, grad_scale: 32.0 2024-09-25 01:14:25,751 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=630303.3333333334, ans=0.125 2024-09-25 01:14:25,888 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-25 01:14:44,947 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=630350.0, ans=0.125 2024-09-25 01:14:51,371 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=630350.0, ans=0.125 2024-09-25 01:15:02,358 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.098e+02 1.261e+02 1.318e+02 1.420e+02 1.911e+02, threshold=2.636e+02, percent-clipped=0.0 2024-09-25 01:15:13,710 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=630443.3333333334, ans=0.125 2024-09-25 01:15:19,111 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=10.18 vs. limit=15.0 2024-09-25 01:15:31,942 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=3.82 vs. limit=10.0 2024-09-25 01:15:45,580 INFO [train.py:1198] (3/4) Epoch 35, batch 2650, loss[loss=0.1982, ctc_loss=0.1279, cr_loss=0.3515, over 17215.00 frames. ], tot_loss[loss=0.194, ctc_loss=0.1257, cr_loss=0.3414, over 3326997.47 frames. ], batch size: 50, lr: 3.39e-03, grad_scale: 32.0 2024-09-25 01:15:49,193 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=630536.6666666666, ans=0.0 2024-09-25 01:15:55,944 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=9.53 vs. limit=15.0 2024-09-25 01:15:56,958 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=630536.6666666666, ans=0.1 2024-09-25 01:16:03,275 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=630583.3333333334, ans=0.125 2024-09-25 01:16:19,307 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=630630.0, ans=0.1 2024-09-25 01:16:37,154 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=4.82 vs. limit=10.0 2024-09-25 01:17:08,145 INFO [train.py:1198] (3/4) Epoch 35, batch 2700, loss[loss=0.1935, ctc_loss=0.1247, cr_loss=0.3441, over 17305.00 frames. ], tot_loss[loss=0.1938, ctc_loss=0.1255, cr_loss=0.3414, over 3339662.76 frames. ], batch size: 49, lr: 3.39e-03, grad_scale: 32.0 2024-09-25 01:17:18,872 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=630770.0, ans=0.125 2024-09-25 01:17:27,737 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=3.60 vs. 
limit=15.0 2024-09-25 01:17:42,606 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=630816.6666666666, ans=0.2 2024-09-25 01:17:43,957 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=630863.3333333334, ans=0.0 2024-09-25 01:17:52,694 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.085e+02 1.276e+02 1.350e+02 1.459e+02 1.855e+02, threshold=2.700e+02, percent-clipped=0.0 2024-09-25 01:18:07,485 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=630910.0, ans=0.1 2024-09-25 01:18:09,444 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=8.09 vs. limit=15.0 2024-09-25 01:18:27,025 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=630956.6666666666, ans=0.0 2024-09-25 01:18:36,128 INFO [train.py:1198] (3/4) Epoch 35, batch 2750, loss[loss=0.1993, ctc_loss=0.1295, cr_loss=0.3492, over 16985.00 frames. ], tot_loss[loss=0.1939, ctc_loss=0.1256, cr_loss=0.3416, over 3350179.55 frames. ], batch size: 51, lr: 3.39e-03, grad_scale: 32.0 2024-09-25 01:18:38,976 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=12.43 vs. limit=22.5 2024-09-25 01:18:44,609 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=631003.3333333334, ans=0.0 2024-09-25 01:18:48,030 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=631003.3333333334, ans=0.125 2024-09-25 01:19:39,175 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=631190.0, ans=0.125 2024-09-25 01:19:56,335 INFO [train.py:1198] (3/4) Epoch 35, batch 2800, loss[loss=0.1532, ctc_loss=0.09763, cr_loss=0.2778, over 16654.00 frames. ], tot_loss[loss=0.1935, ctc_loss=0.1252, cr_loss=0.3414, over 3356452.24 frames. 
], batch size: 37, lr: 3.39e-03, grad_scale: 32.0 2024-09-25 01:20:23,834 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=631283.3333333334, ans=0.2 2024-09-25 01:20:31,979 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=631330.0, ans=0.125 2024-09-25 01:20:33,122 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.108e+02 1.248e+02 1.324e+02 1.411e+02 2.015e+02, threshold=2.647e+02, percent-clipped=0.0 2024-09-25 01:20:42,817 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=631376.6666666666, ans=0.125 2024-09-25 01:20:53,911 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=631376.6666666666, ans=0.2 2024-09-25 01:20:55,614 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.const_attention_rate, batch_count=631376.6666666666, ans=0.025 2024-09-25 01:20:57,176 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=631376.6666666666, ans=0.1 2024-09-25 01:21:00,454 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=631423.3333333334, ans=0.04949747468305833 2024-09-25 01:21:08,590 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=631423.3333333334, ans=0.125 2024-09-25 01:21:13,475 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-25 01:21:16,335 INFO [train.py:1198] (3/4) Epoch 35, batch 2850, loss[loss=0.1953, ctc_loss=0.1258, cr_loss=0.3475, over 17169.00 frames. ], tot_loss[loss=0.1947, ctc_loss=0.1261, cr_loss=0.343, over 3354986.44 frames. 
], batch size: 55, lr: 3.39e-03, grad_scale: 64.0 2024-09-25 01:21:42,889 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=631516.6666666666, ans=0.125 2024-09-25 01:21:42,930 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=631516.6666666666, ans=0.125 2024-09-25 01:21:42,933 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.const_attention_rate, batch_count=631516.6666666666, ans=0.025 2024-09-25 01:21:54,172 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=631563.3333333334, ans=0.125 2024-09-25 01:21:55,697 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=631563.3333333334, ans=0.125 2024-09-25 01:22:31,492 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=631656.6666666666, ans=0.125 2024-09-25 01:22:39,321 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=631656.6666666666, ans=0.035 2024-09-25 01:22:41,091 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=631656.6666666666, ans=0.125 2024-09-25 01:22:46,463 INFO [train.py:1198] (3/4) Epoch 35, batch 2900, loss[loss=0.1995, ctc_loss=0.13, cr_loss=0.3474, over 17220.00 frames. ], tot_loss[loss=0.1952, ctc_loss=0.1263, cr_loss=0.3442, over 3358966.25 frames. ], batch size: 50, lr: 3.39e-03, grad_scale: 32.0 2024-09-25 01:22:54,733 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-25 01:23:21,124 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=5.97 vs. limit=15.0 2024-09-25 01:23:24,647 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.138e+02 1.283e+02 1.348e+02 1.424e+02 1.926e+02, threshold=2.696e+02, percent-clipped=0.0 2024-09-25 01:23:54,236 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=631890.0, ans=0.0 2024-09-25 01:24:06,674 INFO [train.py:1198] (3/4) Epoch 35, batch 2950, loss[loss=0.212, ctc_loss=0.1372, cr_loss=0.3735, over 17044.00 frames. ], tot_loss[loss=0.1945, ctc_loss=0.1258, cr_loss=0.3435, over 3370527.03 frames. ], batch size: 52, lr: 3.39e-03, grad_scale: 32.0 2024-09-25 01:24:20,117 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=631936.6666666666, ans=0.04949747468305833 2024-09-25 01:24:28,625 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=5.78 vs. limit=15.0 2024-09-25 01:24:29,616 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=631983.3333333334, ans=0.1 2024-09-25 01:25:06,404 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=632076.6666666666, ans=0.0 2024-09-25 01:25:26,967 INFO [train.py:1198] (3/4) Epoch 35, batch 3000, loss[loss=0.1639, ctc_loss=0.102, cr_loss=0.3092, over 17108.00 frames. 
], tot_loss[loss=0.1938, ctc_loss=0.1254, cr_loss=0.342, over 3366204.03 frames. ], batch size: 40, lr: 3.39e-03, grad_scale: 32.0 2024-09-25 01:25:26,967 INFO [train.py:1221] (3/4) Computing validation loss 2024-09-25 01:25:42,182 INFO [train.py:1230] (3/4) Epoch 35, validation: loss=0.03538, ctc_loss=0.03538, cr_loss=9.094e-15, over 944034.00 frames. 2024-09-25 01:25:42,183 INFO [train.py:1231] (3/4) Maximum memory allocated so far is 21238MB 2024-09-25 01:26:01,039 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=632216.6666666666, ans=0.125 2024-09-25 01:26:19,534 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.134e+02 1.295e+02 1.357e+02 1.448e+02 2.181e+02, threshold=2.715e+02, percent-clipped=0.0 2024-09-25 01:26:27,678 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=632310.0, ans=0.0 2024-09-25 01:26:52,223 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=632356.6666666666, ans=0.95 2024-09-25 01:27:02,792 INFO [train.py:1198] (3/4) Epoch 35, batch 3050, loss[loss=0.206, ctc_loss=0.1334, cr_loss=0.3627, over 17162.00 frames. ], tot_loss[loss=0.1935, ctc_loss=0.1251, cr_loss=0.3418, over 3363736.45 frames. ], batch size: 45, lr: 3.39e-03, grad_scale: 32.0 2024-09-25 01:27:03,129 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=632403.3333333334, ans=0.125 2024-09-25 01:27:26,849 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=7.80 vs. limit=12.0 2024-09-25 01:28:08,676 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.const_attention_rate, batch_count=632590.0, ans=0.025 2024-09-25 01:28:23,387 INFO [train.py:1198] (3/4) Epoch 35, batch 3100, loss[loss=0.1549, ctc_loss=0.09652, cr_loss=0.2919, over 16675.00 frames. ], tot_loss[loss=0.1932, ctc_loss=0.1249, cr_loss=0.3416, over 3359780.17 frames. ], batch size: 37, lr: 3.38e-03, grad_scale: 32.0 2024-09-25 01:28:53,882 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=4.38 vs. limit=12.0 2024-09-25 01:29:01,046 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.135e+02 1.287e+02 1.342e+02 1.471e+02 1.803e+02, threshold=2.684e+02, percent-clipped=0.0 2024-09-25 01:29:23,342 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=6.31 vs. limit=15.0 2024-09-25 01:29:24,177 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=632776.6666666666, ans=0.125 2024-09-25 01:29:24,177 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=632776.6666666666, ans=0.0 2024-09-25 01:29:33,904 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=5.20 vs. 
limit=15.0 2024-09-25 01:29:45,415 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=632870.0, ans=0.125 2024-09-25 01:29:46,586 INFO [train.py:1198] (3/4) Epoch 35, batch 3150, loss[loss=0.1967, ctc_loss=0.1273, cr_loss=0.3468, over 16878.00 frames. ], tot_loss[loss=0.194, ctc_loss=0.1256, cr_loss=0.342, over 3350844.67 frames. ], batch size: 58, lr: 3.38e-03, grad_scale: 32.0 2024-09-25 01:29:53,575 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=12.79 vs. limit=22.5 2024-09-25 01:30:24,989 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=13.53 vs. limit=22.5 2024-09-25 01:30:27,665 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=632963.3333333334, ans=0.2 2024-09-25 01:30:51,762 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2.whitening_limit, batch_count=633056.6666666666, ans=15.0 2024-09-25 01:30:54,751 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=4.59 vs. limit=12.0 2024-09-25 01:31:04,922 INFO [train.py:1198] (3/4) Epoch 35, batch 3200, loss[loss=0.2122, ctc_loss=0.1375, cr_loss=0.3735, over 17059.00 frames. ], tot_loss[loss=0.1946, ctc_loss=0.126, cr_loss=0.343, over 3353366.20 frames. ], batch size: 46, lr: 3.38e-03, grad_scale: 32.0 2024-09-25 01:31:08,266 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=633103.3333333334, ans=0.0 2024-09-25 01:31:31,774 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.const_attention_rate, batch_count=633150.0, ans=0.025 2024-09-25 01:31:42,383 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.148e+02 1.282e+02 1.354e+02 1.513e+02 2.603e+02, threshold=2.707e+02, percent-clipped=0.0 2024-09-25 01:32:23,313 INFO [train.py:1198] (3/4) Epoch 35, batch 3250, loss[loss=0.2405, ctc_loss=0.1591, cr_loss=0.4068, over 17035.00 frames. ], tot_loss[loss=0.1951, ctc_loss=0.1263, cr_loss=0.3436, over 3355284.17 frames. ], batch size: 52, lr: 3.38e-03, grad_scale: 32.0 2024-09-25 01:33:05,364 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=633430.0, ans=0.125 2024-09-25 01:33:07,059 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=633430.0, ans=0.125 2024-09-25 01:33:10,067 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=633476.6666666666, ans=0.2 2024-09-25 01:33:11,533 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=633476.6666666666, ans=0.125 2024-09-25 01:33:23,053 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=4.43 vs. 
limit=10.0 2024-09-25 01:33:28,966 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-25 01:33:40,961 INFO [train.py:1198] (3/4) Epoch 35, batch 3300, loss[loss=0.2503, ctc_loss=0.1649, cr_loss=0.4271, over 16570.00 frames. ], tot_loss[loss=0.1947, ctc_loss=0.126, cr_loss=0.3434, over 3355960.17 frames. ], batch size: 66, lr: 3.38e-03, grad_scale: 32.0 2024-09-25 01:33:49,409 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=633570.0, ans=0.0 2024-09-25 01:34:19,045 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.144e+02 1.276e+02 1.335e+02 1.450e+02 3.347e+02, threshold=2.669e+02, percent-clipped=1.0 2024-09-25 01:34:19,363 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=633663.3333333334, ans=0.125 2024-09-25 01:34:22,436 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=633663.3333333334, ans=0.0 2024-09-25 01:34:38,202 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=633710.0, ans=0.1 2024-09-25 01:34:38,310 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=633710.0, ans=0.0 2024-09-25 01:34:39,839 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=633710.0, ans=0.0 2024-09-25 01:35:00,136 INFO [train.py:1198] (3/4) Epoch 35, batch 3350, loss[loss=0.1965, ctc_loss=0.1268, cr_loss=0.3483, over 17026.00 frames. ], tot_loss[loss=0.1931, ctc_loss=0.1248, cr_loss=0.3411, over 3363876.87 frames. ], batch size: 56, lr: 3.38e-03, grad_scale: 32.0 2024-09-25 01:35:00,344 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=633803.3333333334, ans=0.1 2024-09-25 01:35:16,180 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=633850.0, ans=0.1 2024-09-25 01:35:19,460 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-25 01:35:36,470 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=633896.6666666666, ans=0.0 2024-09-25 01:36:04,650 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=633990.0, ans=0.2 2024-09-25 01:36:06,249 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=633990.0, ans=0.125 2024-09-25 01:36:18,435 INFO [train.py:1198] (3/4) Epoch 35, batch 3400, loss[loss=0.1665, ctc_loss=0.107, cr_loss=0.2974, over 17094.00 frames. ], tot_loss[loss=0.1945, ctc_loss=0.126, cr_loss=0.3426, over 3354910.28 frames. 
], batch size: 40, lr: 3.38e-03, grad_scale: 16.0 2024-09-25 01:36:53,008 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=634130.0, ans=0.125 2024-09-25 01:36:53,023 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=634130.0, ans=0.0 2024-09-25 01:36:57,299 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.137e+02 1.280e+02 1.352e+02 1.458e+02 2.510e+02, threshold=2.703e+02, percent-clipped=0.0 2024-09-25 01:37:02,653 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=4.36 vs. limit=15.0 2024-09-25 01:37:09,204 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=7.53 vs. limit=15.0 2024-09-25 01:37:38,590 INFO [train.py:1198] (3/4) Epoch 35, batch 3450, loss[loss=0.1751, ctc_loss=0.1122, cr_loss=0.3143, over 17117.00 frames. ], tot_loss[loss=0.195, ctc_loss=0.1264, cr_loss=0.3433, over 3346773.96 frames. ], batch size: 40, lr: 3.38e-03, grad_scale: 16.0 2024-09-25 01:37:52,658 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=634316.6666666666, ans=0.125 2024-09-25 01:37:54,284 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=634316.6666666666, ans=0.125 2024-09-25 01:38:05,502 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.79 vs. limit=15.0 2024-09-25 01:38:16,169 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=634363.3333333334, ans=0.0 2024-09-25 01:38:41,808 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=634456.6666666666, ans=0.125 2024-09-25 01:38:50,354 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=4.00 vs. limit=15.0 2024-09-25 01:38:58,786 INFO [train.py:1198] (3/4) Epoch 35, batch 3500, loss[loss=0.2012, ctc_loss=0.1315, cr_loss=0.3483, over 17190.00 frames. ], tot_loss[loss=0.1945, ctc_loss=0.126, cr_loss=0.3427, over 3348263.52 frames. ], batch size: 55, lr: 3.38e-03, grad_scale: 16.0 2024-09-25 01:39:00,527 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=634503.3333333334, ans=0.0 2024-09-25 01:39:34,071 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.const_attention_rate, batch_count=634596.6666666666, ans=0.025 2024-09-25 01:39:40,109 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.089e+02 1.254e+02 1.331e+02 1.459e+02 2.382e+02, threshold=2.663e+02, percent-clipped=0.0 2024-09-25 01:39:48,040 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=634643.3333333334, ans=0.125 2024-09-25 01:39:53,332 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.55 vs. 
limit=22.5 2024-09-25 01:39:56,316 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=634643.3333333334, ans=0.2 2024-09-25 01:40:00,488 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=7.66 vs. limit=15.0 2024-09-25 01:40:04,630 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=634643.3333333334, ans=0.0 2024-09-25 01:40:22,748 INFO [train.py:1198] (3/4) Epoch 35, batch 3550, loss[loss=0.1875, ctc_loss=0.1201, cr_loss=0.337, over 17202.00 frames. ], tot_loss[loss=0.1939, ctc_loss=0.1255, cr_loss=0.342, over 3344949.12 frames. ], batch size: 50, lr: 3.38e-03, grad_scale: 16.0 2024-09-25 01:40:26,113 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=634736.6666666666, ans=0.125 2024-09-25 01:40:38,835 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=17.36 vs. limit=22.5 2024-09-25 01:40:52,538 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-25 01:41:10,108 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=15.71 vs. limit=15.0 2024-09-25 01:41:11,258 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=634876.6666666666, ans=0.1 2024-09-25 01:41:20,587 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=634876.6666666666, ans=0.2 2024-09-25 01:41:40,837 INFO [train.py:1198] (3/4) Epoch 35, batch 3600, loss[loss=0.197, ctc_loss=0.1275, cr_loss=0.3476, over 17308.00 frames. ], tot_loss[loss=0.1945, ctc_loss=0.126, cr_loss=0.3425, over 3338923.08 frames. ], batch size: 49, lr: 3.38e-03, grad_scale: 32.0 2024-09-25 01:42:06,287 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=635016.6666666666, ans=0.0 2024-09-25 01:42:06,347 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=635016.6666666666, ans=0.0 2024-09-25 01:42:19,817 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.143e+02 1.292e+02 1.369e+02 1.496e+02 2.107e+02, threshold=2.739e+02, percent-clipped=0.0 2024-09-25 01:42:23,282 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=635063.3333333334, ans=0.125 2024-09-25 01:42:26,710 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.01 vs. limit=10.0 2024-09-25 01:42:58,847 INFO [train.py:1198] (3/4) Epoch 35, batch 3650, loss[loss=0.2256, ctc_loss=0.1478, cr_loss=0.3894, over 17006.00 frames. ], tot_loss[loss=0.1946, ctc_loss=0.1261, cr_loss=0.3424, over 3352155.20 frames. 
], batch size: 53, lr: 3.38e-03, grad_scale: 32.0 2024-09-25 01:43:08,470 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=635203.3333333334, ans=0.5 2024-09-25 01:43:13,079 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=635250.0, ans=0.125 2024-09-25 01:43:13,100 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=635250.0, ans=0.125 2024-09-25 01:43:25,366 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=635250.0, ans=0.125 2024-09-25 01:43:33,269 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=635296.6666666666, ans=0.0 2024-09-25 01:43:45,886 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.const_attention_rate, batch_count=635343.3333333334, ans=0.025 2024-09-25 01:43:47,332 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=635343.3333333334, ans=0.0 2024-09-25 01:43:50,540 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=635343.3333333334, ans=0.09899494936611666 2024-09-25 01:43:52,485 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.52 vs. limit=15.0 2024-09-25 01:44:17,734 INFO [train.py:1198] (3/4) Epoch 35, batch 3700, loss[loss=0.1832, ctc_loss=0.1159, cr_loss=0.3366, over 17288.00 frames. ], tot_loss[loss=0.1949, ctc_loss=0.1263, cr_loss=0.343, over 3356162.78 frames. ], batch size: 46, lr: 3.38e-03, grad_scale: 32.0 2024-09-25 01:44:30,492 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=635436.6666666666, ans=0.0 2024-09-25 01:44:35,214 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=635483.3333333334, ans=0.125 2024-09-25 01:44:38,390 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=635483.3333333334, ans=0.0 2024-09-25 01:44:56,877 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.125e+02 1.290e+02 1.367e+02 1.473e+02 2.318e+02, threshold=2.733e+02, percent-clipped=0.0 2024-09-25 01:45:30,578 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=635623.3333333334, ans=0.125 2024-09-25 01:45:36,674 INFO [train.py:1198] (3/4) Epoch 35, batch 3750, loss[loss=0.1604, ctc_loss=0.101, cr_loss=0.297, over 17180.00 frames. ], tot_loss[loss=0.1952, ctc_loss=0.1265, cr_loss=0.3431, over 3347397.72 frames. 
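[Note on the ScheduledFloat records: each pairs a parameter name with the current batch_count and its current value (ans), i.e. these hyperparameters are functions of training progress rather than constants. A hedged sketch of one way such a schedule can work, piecewise-linear interpolation over batch_count; the breakpoints below are illustrative, not the ones behind the parameters above.]

    def scheduled_float(batch_count: float, points: list) -> float:
        # points: (batch_count, value) pairs sorted by batch_count; values
        # are interpolated linearly in between and held constant outside.
        if batch_count <= points[0][0]:
            return points[0][1]
        if batch_count >= points[-1][0]:
            return points[-1][1]
        for (x0, y0), (x1, y1) in zip(points, points[1:]):
            if x0 <= batch_count <= x1:
                return y0 + (y1 - y0) * (batch_count - x0) / (x1 - x0)

    # A skip-rate decaying from 0.5 to 0.0 over the first 20k batches would
    # read ans=0.0 at the batch counts logged above:
    print(scheduled_float(635343.0, [(0.0, 0.5), (20000.0, 0.0)]))  # -> 0.0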
], batch size: 41, lr: 3.38e-03, grad_scale: 32.0 2024-09-25 01:45:36,984 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-25 01:45:40,082 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=635670.0, ans=0.125 2024-09-25 01:46:03,506 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.const_attention_rate, batch_count=635716.6666666666, ans=0.025 2024-09-25 01:46:20,954 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.92 vs. limit=15.0 2024-09-25 01:46:55,873 INFO [train.py:1198] (3/4) Epoch 35, batch 3800, loss[loss=0.2174, ctc_loss=0.1427, cr_loss=0.3736, over 16725.00 frames. ], tot_loss[loss=0.1958, ctc_loss=0.1271, cr_loss=0.3439, over 3343641.53 frames. ], batch size: 61, lr: 3.38e-03, grad_scale: 32.0 2024-09-25 01:47:34,413 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=635996.6666666666, ans=0.07 2024-09-25 01:47:35,580 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.129e+02 1.304e+02 1.388e+02 1.504e+02 2.196e+02, threshold=2.776e+02, percent-clipped=0.0 2024-09-25 01:47:43,872 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=636043.3333333334, ans=0.125 2024-09-25 01:47:55,850 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.74 vs. limit=15.0 2024-09-25 01:48:09,982 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=636090.0, ans=0.125 2024-09-25 01:48:16,222 INFO [train.py:1198] (3/4) Epoch 35, batch 3850, loss[loss=0.2111, ctc_loss=0.144, cr_loss=0.3358, over 12188.00 frames. ], tot_loss[loss=0.1956, ctc_loss=0.1272, cr_loss=0.3421, over 3277139.45 frames. ], batch size: 123, lr: 3.38e-03, grad_scale: 32.0 2024-09-25 01:48:35,598 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=12.91 vs. limit=22.5 2024-09-25 01:48:39,990 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=636183.3333333334, ans=0.0 2024-09-25 01:48:58,668 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.const_attention_rate, batch_count=636230.0, ans=0.025 2024-09-25 01:49:15,073 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=636276.6666666666, ans=0.125 2024-09-25 01:49:21,167 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=636323.3333333334, ans=0.125 2024-09-25 01:50:19,162 INFO [train.py:1198] (3/4) Epoch 36, batch 0, loss[loss=0.2111, ctc_loss=0.1377, cr_loss=0.367, over 16700.00 frames. ], tot_loss[loss=0.2111, ctc_loss=0.1377, cr_loss=0.367, over 16700.00 frames. 
], batch size: 61, lr: 3.33e-03, grad_scale: 32.0 2024-09-25 01:50:19,162 INFO [train.py:1221] (3/4) Computing validation loss 2024-09-25 01:50:34,827 INFO [train.py:1230] (3/4) Epoch 36, validation: loss=0.0356, ctc_loss=0.0356, cr_loss=9.615e-15, over 944034.00 frames. 2024-09-25 01:50:34,828 INFO [train.py:1231] (3/4) Maximum memory allocated so far is 21238MB 2024-09-25 01:50:35,214 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=636351.3333333334, ans=0.125 2024-09-25 01:50:40,006 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=636351.3333333334, ans=0.1 2024-09-25 01:50:41,520 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=636351.3333333334, ans=0.0 2024-09-25 01:50:49,694 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-25 01:51:21,246 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.070e+02 1.459e+02 1.589e+02 1.794e+02 2.988e+02, threshold=3.179e+02, percent-clipped=1.0 2024-09-25 01:51:51,013 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=636538.0, ans=0.0 2024-09-25 01:51:55,584 INFO [train.py:1198] (3/4) Epoch 36, batch 50, loss[loss=0.2041, ctc_loss=0.1322, cr_loss=0.3595, over 17215.00 frames. ], tot_loss[loss=0.198, ctc_loss=0.1283, cr_loss=0.3483, over 754495.21 frames. ], batch size: 47, lr: 3.33e-03, grad_scale: 32.0 2024-09-25 01:51:57,460 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=636584.6666666666, ans=0.0 2024-09-25 01:52:07,120 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=636584.6666666666, ans=0.0 2024-09-25 01:52:38,829 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.const_attention_rate, batch_count=636678.0, ans=0.025 2024-09-25 01:52:58,191 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=10.69 vs. limit=15.0 2024-09-25 01:53:06,666 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.whiten.whitening_limit, batch_count=636771.3333333334, ans=12.0 2024-09-25 01:53:10,825 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=636771.3333333334, ans=0.125 2024-09-25 01:53:21,607 INFO [train.py:1198] (3/4) Epoch 36, batch 100, loss[loss=0.1882, ctc_loss=0.1201, cr_loss=0.3405, over 17284.00 frames. ], tot_loss[loss=0.1935, ctc_loss=0.1252, cr_loss=0.3416, over 1332276.13 frames. 
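[Note on the validation record above: cr_loss collapses to numerical noise (9.615e-15) while ctc_loss stays finite. That is what one would expect from a consistency term comparing two forward passes that see identical inputs once augmentation is disabled at validation. A toy version of such a term as a symmetric KL between the CTC posteriors of two views; this sketches the idea only and is not the repo's exact implementation.]

    import torch
    import torch.nn.functional as F

    def consistency_loss(logp_a: torch.Tensor, logp_b: torch.Tensor) -> torch.Tensor:
        # Symmetric KL between the per-frame posteriors of two views.
        return 0.5 * (F.kl_div(logp_a, logp_b.exp(), reduction="batchmean")
                      + F.kl_div(logp_b, logp_a.exp(), reduction="batchmean"))

    logp = F.log_softmax(torch.randn(8, 100, 500), dim=-1)  # (batch, frames, vocab)
    print(consistency_loss(logp, logp))             # identical views -> 0, as at validation
    print(consistency_loss(logp, logp.roll(1, 0)))  # differing views -> positive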
], batch size: 44, lr: 3.33e-03, grad_scale: 32.0 2024-09-25 01:53:29,948 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=636818.0, ans=0.0 2024-09-25 01:53:44,351 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=636864.6666666666, ans=0.125 2024-09-25 01:54:11,062 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.080e+02 1.276e+02 1.354e+02 1.441e+02 1.843e+02, threshold=2.708e+02, percent-clipped=0.0 2024-09-25 01:54:14,627 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=636958.0, ans=0.2 2024-09-25 01:54:30,836 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=637004.6666666666, ans=0.04949747468305833 2024-09-25 01:54:32,356 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=637004.6666666666, ans=0.125 2024-09-25 01:54:46,364 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=637051.3333333334, ans=0.0 2024-09-25 01:54:47,638 INFO [train.py:1198] (3/4) Epoch 36, batch 150, loss[loss=0.1848, ctc_loss=0.1184, cr_loss=0.3322, over 17222.00 frames. ], tot_loss[loss=0.1907, ctc_loss=0.123, cr_loss=0.3385, over 1787015.65 frames. ], batch size: 47, lr: 3.32e-03, grad_scale: 16.0 2024-09-25 01:55:06,988 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.const_attention_rate, batch_count=637098.0, ans=0.025 2024-09-25 01:55:50,934 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=4.80 vs. limit=15.0 2024-09-25 01:56:02,786 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=637238.0, ans=0.1 2024-09-25 01:56:07,386 INFO [train.py:1198] (3/4) Epoch 36, batch 200, loss[loss=0.2101, ctc_loss=0.1372, cr_loss=0.3647, over 17143.00 frames. ], tot_loss[loss=0.192, ctc_loss=0.124, cr_loss=0.3401, over 2133258.23 frames. ], batch size: 48, lr: 3.32e-03, grad_scale: 16.0 2024-09-25 01:56:18,800 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=637284.6666666666, ans=0.1 2024-09-25 01:56:55,270 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.143e+02 1.279e+02 1.373e+02 1.478e+02 2.081e+02, threshold=2.747e+02, percent-clipped=0.0 2024-09-25 01:57:29,960 INFO [train.py:1198] (3/4) Epoch 36, batch 250, loss[loss=0.2025, ctc_loss=0.1298, cr_loss=0.364, over 17308.00 frames. ], tot_loss[loss=0.1923, ctc_loss=0.1242, cr_loss=0.3405, over 2398837.18 frames. ], batch size: 49, lr: 3.32e-03, grad_scale: 16.0 2024-09-25 01:57:32,243 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=7.27 vs. limit=15.0 2024-09-25 01:57:45,008 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.44 vs. 
limit=15.0 2024-09-25 01:57:59,206 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=637564.6666666666, ans=0.125 2024-09-25 01:58:14,884 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=637611.3333333334, ans=0.1 2024-09-25 01:58:28,495 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=5.02 vs. limit=15.0 2024-09-25 01:58:33,951 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=637658.0, ans=0.1 2024-09-25 01:58:52,446 INFO [train.py:1198] (3/4) Epoch 36, batch 300, loss[loss=0.151, ctc_loss=0.09659, cr_loss=0.2721, over 17011.00 frames. ], tot_loss[loss=0.1941, ctc_loss=0.1256, cr_loss=0.3428, over 2599661.13 frames. ], batch size: 44, lr: 3.32e-03, grad_scale: 16.0 2024-09-25 01:59:05,880 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.85 vs. limit=15.0 2024-09-25 01:59:26,196 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=637844.6666666666, ans=0.0 2024-09-25 01:59:35,552 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=637844.6666666666, ans=0.0 2024-09-25 01:59:46,603 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.037e+02 1.272e+02 1.363e+02 1.432e+02 1.912e+02, threshold=2.726e+02, percent-clipped=0.0 2024-09-25 01:59:48,555 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=637891.3333333334, ans=0.125 2024-09-25 01:59:49,992 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=637891.3333333334, ans=0.125 2024-09-25 02:00:06,229 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=637938.0, ans=0.125 2024-09-25 02:00:10,712 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=637938.0, ans=0.0 2024-09-25 02:00:18,411 INFO [train.py:1198] (3/4) Epoch 36, batch 350, loss[loss=0.2516, ctc_loss=0.1692, cr_loss=0.4123, over 12077.00 frames. ], tot_loss[loss=0.1946, ctc_loss=0.1259, cr_loss=0.3434, over 2764003.28 frames. ], batch size: 123, lr: 3.32e-03, grad_scale: 16.0 2024-09-25 02:00:26,736 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=637984.6666666666, ans=0.0 2024-09-25 02:01:31,120 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=638171.3333333334, ans=0.09899494936611666 2024-09-25 02:01:31,502 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.77 vs. limit=15.0 2024-09-25 02:01:37,866 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=11.33 vs. limit=15.0 2024-09-25 02:01:38,835 INFO [train.py:1198] (3/4) Epoch 36, batch 400, loss[loss=0.231, ctc_loss=0.1551, cr_loss=0.3793, over 15058.00 frames. 
], tot_loss[loss=0.1946, ctc_loss=0.126, cr_loss=0.3429, over 2889074.75 frames. ], batch size: 89, lr: 3.32e-03, grad_scale: 32.0 2024-09-25 02:01:56,642 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=638264.6666666666, ans=0.0 2024-09-25 02:02:07,739 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=638264.6666666666, ans=0.125 2024-09-25 02:02:16,680 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.max_positive, batch_count=638311.3333333334, ans=0.95 2024-09-25 02:02:29,165 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.130e+02 1.293e+02 1.367e+02 1.475e+02 2.656e+02, threshold=2.733e+02, percent-clipped=0.0 2024-09-25 02:02:38,175 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.41 vs. limit=10.0 2024-09-25 02:02:50,712 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=10.27 vs. limit=22.5 2024-09-25 02:03:01,278 INFO [train.py:1198] (3/4) Epoch 36, batch 450, loss[loss=0.1997, ctc_loss=0.1294, cr_loss=0.3514, over 16514.00 frames. ], tot_loss[loss=0.1947, ctc_loss=0.126, cr_loss=0.3433, over 2994435.41 frames. ], batch size: 66, lr: 3.32e-03, grad_scale: 32.0 2024-09-25 02:03:27,506 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=18.53 vs. limit=22.5 2024-09-25 02:03:28,769 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=638498.0, ans=0.2 2024-09-25 02:03:51,452 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=638591.3333333334, ans=0.5 2024-09-25 02:04:06,936 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=638591.3333333334, ans=0.1 2024-09-25 02:04:16,415 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer_na.min_abs, batch_count=638638.0, ans=0.02 2024-09-25 02:04:27,074 INFO [train.py:1198] (3/4) Epoch 36, batch 500, loss[loss=0.1928, ctc_loss=0.1256, cr_loss=0.3361, over 17007.00 frames. ], tot_loss[loss=0.1947, ctc_loss=0.1261, cr_loss=0.3432, over 3065163.93 frames. ], batch size: 44, lr: 3.32e-03, grad_scale: 32.0 2024-09-25 02:04:41,262 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=638684.6666666666, ans=0.0 2024-09-25 02:04:47,727 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=638731.3333333334, ans=0.0 2024-09-25 02:05:17,142 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=5.49 vs. 
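[Note on the varying batch sizes: they swing between roughly 36 and 123 while per-batch frame totals stay comparable, the signature of duration-aware bucketing, where a batch is packed up to a frame budget rather than a fixed utterance count. A simplified packer along those lines; max_frames and the cut tuples are illustrative, not the sampler's real interface.]

    def pack_by_duration(cuts, max_frames: float = 4000.0):
        # cuts: iterable of (num_frames, cut_id). Sorting keeps each batch
        # length-homogeneous; packing stops at the frame budget, so batches
        # of long utterances contain fewer cuts.
        batch, total = [], 0.0
        for frames, cut_id in sorted(cuts):
            if batch and total + frames > max_frames:
                yield batch
                batch, total = [], 0.0
            batch.append(cut_id)
            total += frames
        if batch:
            yield batch

    cuts = [(400.0, i) for i in range(40)] + [(100.0, 100 + i) for i in range(40)]
    for b in pack_by_duration(cuts):
        print(len(b))  # the short-utterance batch comes out larger (40 vs 10)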
limit=15.0 2024-09-25 02:05:18,118 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=638824.6666666666, ans=0.1 2024-09-25 02:05:19,276 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.049e+02 1.244e+02 1.315e+02 1.459e+02 2.015e+02, threshold=2.630e+02, percent-clipped=0.0 2024-09-25 02:05:24,333 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=638824.6666666666, ans=0.2 2024-09-25 02:05:29,462 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=638824.6666666666, ans=0.2 2024-09-25 02:05:34,287 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=638871.3333333334, ans=0.0 2024-09-25 02:05:49,779 INFO [train.py:1198] (3/4) Epoch 36, batch 550, loss[loss=0.2146, ctc_loss=0.1439, cr_loss=0.3538, over 15054.00 frames. ], tot_loss[loss=0.1949, ctc_loss=0.1261, cr_loss=0.344, over 3128970.88 frames. ], batch size: 89, lr: 3.32e-03, grad_scale: 16.0 2024-09-25 02:05:54,730 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=638918.0, ans=0.1 2024-09-25 02:06:09,060 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=638964.6666666666, ans=0.035 2024-09-25 02:06:17,048 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=638964.6666666666, ans=0.125 2024-09-25 02:06:21,902 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=639011.3333333334, ans=0.0 2024-09-25 02:06:26,736 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=639011.3333333334, ans=0.125 2024-09-25 02:07:09,442 INFO [train.py:1198] (3/4) Epoch 36, batch 600, loss[loss=0.1959, ctc_loss=0.1249, cr_loss=0.3552, over 17325.00 frames. ], tot_loss[loss=0.1951, ctc_loss=0.1263, cr_loss=0.344, over 3181562.80 frames. ], batch size: 51, lr: 3.32e-03, grad_scale: 16.0 2024-09-25 02:07:42,880 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=639244.6666666666, ans=0.1 2024-09-25 02:07:45,157 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=15.46 vs. limit=22.5 2024-09-25 02:07:49,812 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=3.45 vs. limit=15.0 2024-09-25 02:07:56,073 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=16.83 vs. 
limit=22.5 2024-09-25 02:07:57,236 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=639244.6666666666, ans=0.125 2024-09-25 02:08:01,748 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.146e+02 1.285e+02 1.361e+02 1.464e+02 2.442e+02, threshold=2.722e+02, percent-clipped=0.0 2024-09-25 02:08:07,880 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=639291.3333333334, ans=0.1 2024-09-25 02:08:07,988 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=639291.3333333334, ans=0.125 2024-09-25 02:08:25,560 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=639338.0, ans=0.2 2024-09-25 02:08:34,771 INFO [train.py:1198] (3/4) Epoch 36, batch 650, loss[loss=0.2267, ctc_loss=0.1493, cr_loss=0.3872, over 17043.00 frames. ], tot_loss[loss=0.1949, ctc_loss=0.126, cr_loss=0.3441, over 3214873.36 frames. ], batch size: 52, lr: 3.32e-03, grad_scale: 16.0 2024-09-25 02:08:53,688 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=639431.3333333334, ans=0.125 2024-09-25 02:09:03,112 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=639431.3333333334, ans=0.1 2024-09-25 02:09:04,723 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=639431.3333333334, ans=0.125 2024-09-25 02:09:09,729 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=639478.0, ans=0.125 2024-09-25 02:09:22,452 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=11.57 vs. limit=15.0 2024-09-25 02:09:36,148 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=639524.6666666666, ans=0.0 2024-09-25 02:09:36,920 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=10.81 vs. limit=15.0 2024-09-25 02:09:39,371 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=639524.6666666666, ans=0.125 2024-09-25 02:09:41,266 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.22 vs. limit=15.0 2024-09-25 02:09:59,840 INFO [train.py:1198] (3/4) Epoch 36, batch 700, loss[loss=0.1941, ctc_loss=0.1257, cr_loss=0.342, over 17070.00 frames. ], tot_loss[loss=0.1945, ctc_loss=0.1259, cr_loss=0.3431, over 3243232.83 frames. 
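[Note on the Whitening lines: each compares a per-module metric against a limit (e.g. metric=11.57 vs. limit=15.0 above). One natural metric of this kind measures how far the feature covariance is from isotropic: it is 1.0 for perfectly "white" activations and grows with channel correlation or uneven scaling. The exact formula here is an assumption for illustration, not a copy of scaling.py.]

    import torch

    def whiteness(x: torch.Tensor) -> torch.Tensor:
        # x: (num_frames, num_channels). The ratio below equals 1.0 when the
        # covariance is a multiple of the identity and grows with eigenvalue
        # spread (equivalently, mean(eig**2) / mean(eig)**2).
        x = x - x.mean(dim=0)
        cov = (x.t() @ x) / x.shape[0]
        mean_diag = cov.diag().mean()
        return (cov ** 2).sum() / (cov.shape[0] * mean_diag ** 2)

    white = torch.randn(4000, 256)
    mixed = white[:, :16] @ torch.randn(16, 256)  # rank-16 features in 256 channels
    print(whiteness(white))   # close to 1
    print(whiteness(mixed))   # far above 1, the kind of value that trips a limit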
], batch size: 46, lr: 3.32e-03, grad_scale: 16.0 2024-09-25 02:10:11,481 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=639618.0, ans=0.5 2024-09-25 02:10:13,082 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=639618.0, ans=0.1 2024-09-25 02:10:14,698 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=639664.6666666666, ans=0.125 2024-09-25 02:10:18,217 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.69 vs. limit=15.0 2024-09-25 02:10:30,741 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=639711.3333333334, ans=0.1 2024-09-25 02:10:37,034 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=639711.3333333334, ans=0.0 2024-09-25 02:10:49,613 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.133e+02 1.257e+02 1.341e+02 1.454e+02 1.758e+02, threshold=2.682e+02, percent-clipped=0.0 2024-09-25 02:11:04,598 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=8.49 vs. limit=15.0 2024-09-25 02:11:19,990 INFO [train.py:1198] (3/4) Epoch 36, batch 750, loss[loss=0.1855, ctc_loss=0.1178, cr_loss=0.3388, over 17288.00 frames. ], tot_loss[loss=0.1941, ctc_loss=0.1255, cr_loss=0.3431, over 3265683.89 frames. ], batch size: 46, lr: 3.32e-03, grad_scale: 16.0 2024-09-25 02:11:41,298 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=639898.0, ans=0.125 2024-09-25 02:11:42,871 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=639898.0, ans=0.0 2024-09-25 02:11:46,238 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.max_positive, batch_count=639898.0, ans=0.95 2024-09-25 02:11:58,740 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=639944.6666666666, ans=0.0 2024-09-25 02:12:42,690 INFO [train.py:1198] (3/4) Epoch 36, batch 800, loss[loss=0.2085, ctc_loss=0.1372, cr_loss=0.3563, over 16909.00 frames. ], tot_loss[loss=0.1947, ctc_loss=0.126, cr_loss=0.3434, over 3281259.69 frames. 
], batch size: 58, lr: 3.32e-03, grad_scale: 32.0 2024-09-25 02:12:54,356 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=640084.6666666666, ans=0.2 2024-09-25 02:13:16,231 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=640178.0, ans=10.0 2024-09-25 02:13:16,353 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=640178.0, ans=0.125 2024-09-25 02:13:17,932 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=640178.0, ans=0.0 2024-09-25 02:13:24,291 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=640178.0, ans=0.125 2024-09-25 02:13:34,060 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.const_attention_rate, batch_count=640224.6666666666, ans=0.025 2024-09-25 02:13:35,135 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.147e+02 1.278e+02 1.351e+02 1.457e+02 1.942e+02, threshold=2.702e+02, percent-clipped=0.0 2024-09-25 02:13:41,997 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.const_attention_rate, batch_count=640224.6666666666, ans=0.025 2024-09-25 02:13:54,254 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=640271.3333333334, ans=0.125 2024-09-25 02:14:02,997 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=4.76 vs. limit=15.0 2024-09-25 02:14:04,863 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=5.74 vs. limit=15.0 2024-09-25 02:14:08,467 INFO [train.py:1198] (3/4) Epoch 36, batch 850, loss[loss=0.1885, ctc_loss=0.1231, cr_loss=0.327, over 17032.00 frames. ], tot_loss[loss=0.1949, ctc_loss=0.1262, cr_loss=0.3439, over 3301608.92 frames. ], batch size: 44, lr: 3.32e-03, grad_scale: 32.0 2024-09-25 02:14:36,638 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=640364.6666666666, ans=0.0 2024-09-25 02:15:09,086 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=6.02 vs. limit=15.0 2024-09-25 02:15:18,247 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=640504.6666666666, ans=0.125 2024-09-25 02:15:30,657 INFO [train.py:1198] (3/4) Epoch 36, batch 900, loss[loss=0.1968, ctc_loss=0.1285, cr_loss=0.3411, over 16969.00 frames. ], tot_loss[loss=0.1961, ctc_loss=0.127, cr_loss=0.3451, over 3302453.27 frames. ], batch size: 56, lr: 3.32e-03, grad_scale: 32.0 2024-09-25 02:16:16,001 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=640644.6666666666, ans=0.125 2024-09-25 02:16:22,098 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.143e+02 1.267e+02 1.323e+02 1.416e+02 1.789e+02, threshold=2.647e+02, percent-clipped=0.0 2024-09-25 02:16:51,067 INFO [train.py:1198] (3/4) Epoch 36, batch 950, loss[loss=0.213, ctc_loss=0.1398, cr_loss=0.366, over 16811.00 frames. 
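[Note on the balancer records: they track per-channel constraints such as the probability of applying a correction (prob=0.125) and bounds like max_abs=10.0 or min_abs=0.5; conceptually a balancer keeps each channel's activations in a target regime, e.g. a bounded fraction of positive values and a bounded magnitude. The sketch below states that constraint as a penalty only; it is not the gradient-space mechanism the repo actually uses.]

    import torch

    def balance_violation(x: torch.Tensor,
                          min_positive: float = 0.05, max_positive: float = 0.95,
                          max_abs: float = 10.0) -> torch.Tensor:
        # x: (num_frames, num_channels). Zero when every channel keeps its
        # positive fraction inside [min_positive, max_positive] and its mean
        # magnitude below max_abs.
        frac_pos = (x > 0).float().mean(dim=0)
        mean_abs = x.abs().mean(dim=0)
        low = (min_positive - frac_pos).clamp(min=0.0)
        high = (frac_pos - max_positive).clamp(min=0.0)
        big = (mean_abs - max_abs).clamp(min=0.0)
        return (low + high + big).sum()

    x = torch.randn(1000, 256)
    print(balance_violation(x))        # ~0 for well-behaved activations
    print(balance_violation(x + 5.0))  # positive: nearly every entry is > 0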
], tot_loss[loss=0.1963, ctc_loss=0.1271, cr_loss=0.346, over 3325754.72 frames. ], batch size: 61, lr: 3.32e-03, grad_scale: 16.0 2024-09-25 02:17:01,247 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-25 02:17:21,251 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=640831.3333333334, ans=0.125 2024-09-25 02:17:24,348 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=640878.0, ans=0.125 2024-09-25 02:17:42,048 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=640924.6666666666, ans=0.1 2024-09-25 02:17:45,527 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=9.99 vs. limit=22.5 2024-09-25 02:17:48,294 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=640924.6666666666, ans=0.125 2024-09-25 02:18:14,977 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=641018.0, ans=0.0 2024-09-25 02:18:16,110 INFO [train.py:1198] (3/4) Epoch 36, batch 1000, loss[loss=0.1934, ctc_loss=0.128, cr_loss=0.3266, over 16025.00 frames. ], tot_loss[loss=0.1965, ctc_loss=0.1273, cr_loss=0.346, over 3330850.12 frames. ], batch size: 74, lr: 3.31e-03, grad_scale: 16.0 2024-09-25 02:18:26,168 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=641018.0, ans=0.1 2024-09-25 02:18:46,909 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.13 vs. limit=15.0 2024-09-25 02:19:04,329 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=12.92 vs. limit=22.5 2024-09-25 02:19:10,045 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.104e+02 1.268e+02 1.360e+02 1.441e+02 1.989e+02, threshold=2.720e+02, percent-clipped=0.0 2024-09-25 02:19:17,008 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=641158.0, ans=0.0 2024-09-25 02:19:33,267 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=4.63 vs. limit=12.0 2024-09-25 02:19:41,707 INFO [train.py:1198] (3/4) Epoch 36, batch 1050, loss[loss=0.1738, ctc_loss=0.1134, cr_loss=0.3015, over 17178.00 frames. ], tot_loss[loss=0.1949, ctc_loss=0.1262, cr_loss=0.3436, over 3334167.66 frames. 
], batch size: 41, lr: 3.31e-03, grad_scale: 16.0 2024-09-25 02:20:04,324 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=641298.0, ans=0.125 2024-09-25 02:20:13,929 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=641344.6666666666, ans=0.2 2024-09-25 02:20:23,753 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=641344.6666666666, ans=0.1 2024-09-25 02:20:41,177 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=641391.3333333334, ans=0.2 2024-09-25 02:21:01,839 INFO [train.py:1198] (3/4) Epoch 36, batch 1100, loss[loss=0.2053, ctc_loss=0.1332, cr_loss=0.3606, over 16717.00 frames. ], tot_loss[loss=0.194, ctc_loss=0.1255, cr_loss=0.3427, over 3340650.14 frames. ], batch size: 61, lr: 3.31e-03, grad_scale: 16.0 2024-09-25 02:21:25,795 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=641531.3333333334, ans=0.0 2024-09-25 02:21:30,545 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.const_attention_rate, batch_count=641531.3333333334, ans=0.025 2024-09-25 02:21:30,861 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=6.77 vs. limit=15.0 2024-09-25 02:21:34,233 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten.whitening_limit, batch_count=641578.0, ans=15.0 2024-09-25 02:21:43,449 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=641578.0, ans=0.125 2024-09-25 02:21:52,920 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.099e+02 1.290e+02 1.394e+02 1.498e+02 2.022e+02, threshold=2.788e+02, percent-clipped=0.0 2024-09-25 02:21:53,323 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=641624.6666666666, ans=0.125 2024-09-25 02:22:01,321 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=641624.6666666666, ans=0.125 2024-09-25 02:22:24,256 INFO [train.py:1198] (3/4) Epoch 36, batch 1150, loss[loss=0.2063, ctc_loss=0.1358, cr_loss=0.353, over 16043.00 frames. ], tot_loss[loss=0.1949, ctc_loss=0.1262, cr_loss=0.3436, over 3335193.10 frames. ], batch size: 74, lr: 3.31e-03, grad_scale: 16.0 2024-09-25 02:22:40,682 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=641764.6666666666, ans=0.125 2024-09-25 02:23:28,393 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=641858.0, ans=0.1 2024-09-25 02:23:31,510 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=641904.6666666666, ans=0.0 2024-09-25 02:23:49,552 INFO [train.py:1198] (3/4) Epoch 36, batch 1200, loss[loss=0.1735, ctc_loss=0.1097, cr_loss=0.3186, over 16311.00 frames. ], tot_loss[loss=0.1955, ctc_loss=0.1267, cr_loss=0.3442, over 3325709.51 frames. 
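[Note on grad_scale: the value in the loss records steps between powers of two such as 16.0 and 32.0. Under fp16 AMP this is dynamic loss scaling: the scale is halved when an inf/nan gradient is detected and doubled again after a run of clean steps. A minimal version of that control loop; growth_interval mirrors the common default, and the class itself is illustrative rather than the trainer's scaler.]

    class DynamicLossScale:
        def __init__(self, scale: float = 32.0, growth_interval: int = 2000):
            self.scale = scale
            self.growth_interval = growth_interval
            self._good_steps = 0

        def update(self, found_inf: bool) -> None:
            # Halve on overflow, double after growth_interval clean steps.
            if found_inf:
                self.scale /= 2.0
                self._good_steps = 0
            else:
                self._good_steps += 1
                if self._good_steps == self.growth_interval:
                    self.scale *= 2.0
                    self._good_steps = 0

    s = DynamicLossScale()
    s.update(found_inf=True)       # overflow: 32.0 -> 16.0
    for _ in range(2000):
        s.update(found_inf=False)  # a clean stretch: 16.0 -> 32.0
    print(s.scale)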
], batch size: 36, lr: 3.31e-03, grad_scale: 32.0 2024-09-25 02:24:25,878 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=642044.6666666666, ans=0.0 2024-09-25 02:24:43,286 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.116e+02 1.264e+02 1.352e+02 1.439e+02 3.839e+02, threshold=2.704e+02, percent-clipped=1.0 2024-09-25 02:25:12,609 INFO [train.py:1198] (3/4) Epoch 36, batch 1250, loss[loss=0.2016, ctc_loss=0.1311, cr_loss=0.3525, over 17099.00 frames. ], tot_loss[loss=0.1955, ctc_loss=0.1266, cr_loss=0.3444, over 3339147.54 frames. ], batch size: 49, lr: 3.31e-03, grad_scale: 32.0 2024-09-25 02:25:17,655 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=642184.6666666666, ans=0.2 2024-09-25 02:25:35,331 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=642231.3333333334, ans=0.1 2024-09-25 02:25:55,646 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=642278.0, ans=0.1 2024-09-25 02:25:59,316 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.04 vs. limit=6.0 2024-09-25 02:26:01,291 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.43 vs. limit=15.0 2024-09-25 02:26:03,661 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=642324.6666666666, ans=0.125 2024-09-25 02:26:32,184 INFO [train.py:1198] (3/4) Epoch 36, batch 1300, loss[loss=0.1662, ctc_loss=0.1067, cr_loss=0.2977, over 17082.00 frames. ], tot_loss[loss=0.1952, ctc_loss=0.1264, cr_loss=0.344, over 3338438.26 frames. ], batch size: 40, lr: 3.31e-03, grad_scale: 32.0 2024-09-25 02:26:40,486 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=642418.0, ans=0.2 2024-09-25 02:27:03,960 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=642464.6666666666, ans=0.125 2024-09-25 02:27:08,726 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=642511.3333333334, ans=0.125 2024-09-25 02:27:18,165 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=642511.3333333334, ans=0.125 2024-09-25 02:27:18,693 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=6.46 vs. 
limit=15.0 2024-09-25 02:27:19,739 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=642511.3333333334, ans=0.125 2024-09-25 02:27:22,988 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=642558.0, ans=0.0 2024-09-25 02:27:25,952 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.156e+02 1.281e+02 1.359e+02 1.510e+02 2.229e+02, threshold=2.717e+02, percent-clipped=0.0 2024-09-25 02:27:35,780 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=642558.0, ans=0.0 2024-09-25 02:27:40,587 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=642604.6666666666, ans=0.2 2024-09-25 02:27:56,031 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=642651.3333333334, ans=0.1 2024-09-25 02:27:57,337 INFO [train.py:1198] (3/4) Epoch 36, batch 1350, loss[loss=0.1871, ctc_loss=0.1224, cr_loss=0.3237, over 17278.00 frames. ], tot_loss[loss=0.1949, ctc_loss=0.1262, cr_loss=0.3437, over 3343336.80 frames. ], batch size: 42, lr: 3.31e-03, grad_scale: 16.0 2024-09-25 02:28:34,248 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=14.79 vs. limit=22.5 2024-09-25 02:28:40,054 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=642744.6666666666, ans=0.1 2024-09-25 02:28:54,357 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=642791.3333333334, ans=0.2 2024-09-25 02:29:21,997 INFO [train.py:1198] (3/4) Epoch 36, batch 1400, loss[loss=0.2133, ctc_loss=0.1404, cr_loss=0.3642, over 16702.00 frames. ], tot_loss[loss=0.1955, ctc_loss=0.1265, cr_loss=0.3446, over 3338893.24 frames. ], batch size: 61, lr: 3.31e-03, grad_scale: 16.0 2024-09-25 02:29:59,169 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=642978.0, ans=0.125 2024-09-25 02:30:08,749 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=643024.6666666666, ans=0.1 2024-09-25 02:30:14,813 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.188e+02 1.323e+02 1.435e+02 1.580e+02 6.083e+02, threshold=2.870e+02, percent-clipped=1.0 2024-09-25 02:30:15,074 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=643024.6666666666, ans=0.125 2024-09-25 02:30:23,227 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=643024.6666666666, ans=0.0 2024-09-25 02:30:25,037 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=643071.3333333334, ans=0.2 2024-09-25 02:30:37,937 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=643071.3333333334, ans=0.5 2024-09-25 02:30:42,402 INFO [train.py:1198] (3/4) Epoch 36, batch 1450, loss[loss=0.2029, ctc_loss=0.1325, cr_loss=0.3523, over 15957.00 frames. ], tot_loss[loss=0.1956, ctc_loss=0.1266, cr_loss=0.3447, over 3335114.94 frames. 
], batch size: 74, lr: 3.31e-03, grad_scale: 16.0 2024-09-25 02:30:44,357 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=643118.0, ans=0.125 2024-09-25 02:30:54,073 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=643118.0, ans=0.2 2024-09-25 02:31:08,269 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=643164.6666666666, ans=0.2 2024-09-25 02:31:35,351 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=643258.0, ans=0.125 2024-09-25 02:31:37,040 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-25 02:31:49,905 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=643304.6666666666, ans=0.125 2024-09-25 02:32:04,904 INFO [train.py:1198] (3/4) Epoch 36, batch 1500, loss[loss=0.1732, ctc_loss=0.11, cr_loss=0.316, over 17166.00 frames. ], tot_loss[loss=0.1948, ctc_loss=0.126, cr_loss=0.3438, over 3348617.99 frames. ], batch size: 41, lr: 3.31e-03, grad_scale: 16.0 2024-09-25 02:32:13,494 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=643351.3333333334, ans=0.125 2024-09-25 02:33:00,746 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.133e+02 1.291e+02 1.396e+02 1.518e+02 2.099e+02, threshold=2.792e+02, percent-clipped=0.0 2024-09-25 02:33:16,197 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=6.87 vs. limit=15.0 2024-09-25 02:33:23,499 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=643538.0, ans=0.2 2024-09-25 02:33:30,539 INFO [train.py:1198] (3/4) Epoch 36, batch 1550, loss[loss=0.19, ctc_loss=0.1216, cr_loss=0.3422, over 17023.00 frames. ], tot_loss[loss=0.1937, ctc_loss=0.1253, cr_loss=0.342, over 3347514.69 frames. ], batch size: 44, lr: 3.31e-03, grad_scale: 16.0 2024-09-25 02:33:31,152 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.02 vs. limit=10.0 2024-09-25 02:33:34,043 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=643584.6666666666, ans=0.125 2024-09-25 02:33:43,831 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=643584.6666666666, ans=0.125 2024-09-25 02:33:46,775 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=643631.3333333334, ans=0.125 2024-09-25 02:34:09,021 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.89 vs. 
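[Note on the tot_loss bookkeeping: the statistics accumulate "over N frames" where N drifts slowly (3335114.94, 3348617.99, ...) and is not an integer, which suggests a decayed, frame-weighted running sum rather than a plain window. A small sketch of that bookkeeping; the decay constant is an assumption chosen only to illustrate the shape of the numbers.]

    class FrameWeightedLoss:
        def __init__(self, decay: float = 0.999):
            self.decay = decay      # assumed smoothing constant
            self.loss_sum = 0.0     # decayed sum of loss * frames
            self.frames = 0.0       # decayed sum of frames

        def update(self, loss: float, num_frames: float) -> None:
            self.loss_sum = self.decay * self.loss_sum + loss * num_frames
            self.frames = self.decay * self.frames + num_frames

        @property
        def value(self) -> float:
            return self.loss_sum / max(self.frames, 1.0)

    avg = FrameWeightedLoss()
    for step in range(5000):
        avg.update(loss=0.195, num_frames=17000.0)
    print(avg.frames)  # settles near 17000 / (1 - 0.999): large and fractional
    print(avg.value)   # ~0.195, a frame-weighted running loss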
limit=15.0 2024-09-25 02:34:15,644 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten.whitening_limit, batch_count=643678.0, ans=15.0 2024-09-25 02:34:28,569 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=7.33 vs. limit=15.0 2024-09-25 02:34:39,098 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=643771.3333333334, ans=0.125 2024-09-25 02:34:42,386 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=643771.3333333334, ans=0.2 2024-09-25 02:34:42,457 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=643771.3333333334, ans=0.1 2024-09-25 02:34:53,200 INFO [train.py:1198] (3/4) Epoch 36, batch 1600, loss[loss=0.1538, ctc_loss=0.09733, cr_loss=0.2822, over 17030.00 frames. ], tot_loss[loss=0.1927, ctc_loss=0.1245, cr_loss=0.3407, over 3357369.18 frames. ], batch size: 39, lr: 3.31e-03, grad_scale: 32.0 2024-09-25 02:35:19,467 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=643864.6666666666, ans=0.125 2024-09-25 02:35:46,526 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.122e+02 1.268e+02 1.352e+02 1.455e+02 2.374e+02, threshold=2.704e+02, percent-clipped=0.0 2024-09-25 02:35:57,238 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=3.54 vs. limit=15.0 2024-09-25 02:36:14,436 INFO [train.py:1198] (3/4) Epoch 36, batch 1650, loss[loss=0.1784, ctc_loss=0.1135, cr_loss=0.3245, over 17009.00 frames. ], tot_loss[loss=0.1922, ctc_loss=0.1242, cr_loss=0.3402, over 3371223.57 frames. ], batch size: 44, lr: 3.31e-03, grad_scale: 32.0 2024-09-25 02:36:19,332 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=644051.3333333334, ans=0.0 2024-09-25 02:36:20,920 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=644051.3333333334, ans=0.2 2024-09-25 02:36:28,941 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=644098.0, ans=0.0 2024-09-25 02:36:32,710 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=14.13 vs. limit=22.5 2024-09-25 02:36:38,680 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=644098.0, ans=0.0 2024-09-25 02:36:48,247 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=644144.6666666666, ans=0.0 2024-09-25 02:36:57,029 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=644144.6666666666, ans=0.0 2024-09-25 02:37:37,128 INFO [train.py:1198] (3/4) Epoch 36, batch 1700, loss[loss=0.1952, ctc_loss=0.1266, cr_loss=0.3431, over 17308.00 frames. ], tot_loss[loss=0.1917, ctc_loss=0.1238, cr_loss=0.3391, over 3372854.70 frames. 
], batch size: 46, lr: 3.31e-03, grad_scale: 16.0 2024-09-25 02:37:37,546 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=644284.6666666666, ans=0.2 2024-09-25 02:37:41,597 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=644284.6666666666, ans=0.2 2024-09-25 02:37:47,840 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=644284.6666666666, ans=0.2 2024-09-25 02:37:49,493 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=644284.6666666666, ans=0.125 2024-09-25 02:37:51,236 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=644284.6666666666, ans=0.07 2024-09-25 02:38:15,338 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=644378.0, ans=0.125 2024-09-25 02:38:37,027 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.129e+02 1.256e+02 1.337e+02 1.426e+02 1.735e+02, threshold=2.674e+02, percent-clipped=0.0 2024-09-25 02:38:38,960 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=644424.6666666666, ans=0.0 2024-09-25 02:38:42,867 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.98 vs. limit=22.5 2024-09-25 02:39:02,786 INFO [train.py:1198] (3/4) Epoch 36, batch 1750, loss[loss=0.1964, ctc_loss=0.1285, cr_loss=0.3394, over 17286.00 frames. ], tot_loss[loss=0.1923, ctc_loss=0.1242, cr_loss=0.3403, over 3377138.19 frames. ], batch size: 46, lr: 3.31e-03, grad_scale: 16.0 2024-09-25 02:39:12,438 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=14.42 vs. limit=15.0 2024-09-25 02:39:21,914 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=644564.6666666666, ans=0.0 2024-09-25 02:39:53,766 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=644658.0, ans=0.125 2024-09-25 02:40:09,635 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=644704.6666666666, ans=0.1 2024-09-25 02:40:09,685 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=644704.6666666666, ans=0.1 2024-09-25 02:40:25,301 INFO [train.py:1198] (3/4) Epoch 36, batch 1800, loss[loss=0.1826, ctc_loss=0.1161, cr_loss=0.3323, over 17217.00 frames. ], tot_loss[loss=0.1932, ctc_loss=0.1249, cr_loss=0.3415, over 3371627.71 frames. ], batch size: 47, lr: 3.30e-03, grad_scale: 16.0 2024-09-25 02:40:41,826 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=5.18 vs. 
limit=15.0 2024-09-25 02:40:54,124 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=644798.0, ans=0.125 2024-09-25 02:41:02,301 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=644844.6666666666, ans=0.1 2024-09-25 02:41:07,572 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.81 vs. limit=15.0 2024-09-25 02:41:15,431 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=644891.3333333334, ans=0.125 2024-09-25 02:41:19,905 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.129e+02 1.269e+02 1.345e+02 1.435e+02 2.144e+02, threshold=2.691e+02, percent-clipped=0.0 2024-09-25 02:41:45,754 INFO [train.py:1198] (3/4) Epoch 36, batch 1850, loss[loss=0.2176, ctc_loss=0.1429, cr_loss=0.3734, over 17021.00 frames. ], tot_loss[loss=0.194, ctc_loss=0.1255, cr_loss=0.3427, over 3374182.69 frames. ], batch size: 56, lr: 3.30e-03, grad_scale: 16.0 2024-09-25 02:42:15,295 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=645031.3333333334, ans=0.0 2024-09-25 02:42:43,711 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.max_abs, batch_count=645124.6666666666, ans=10.0 2024-09-25 02:43:10,549 INFO [train.py:1198] (3/4) Epoch 36, batch 1900, loss[loss=0.1878, ctc_loss=0.1201, cr_loss=0.3385, over 17346.00 frames. ], tot_loss[loss=0.194, ctc_loss=0.1253, cr_loss=0.3432, over 3371944.10 frames. ], batch size: 48, lr: 3.30e-03, grad_scale: 16.0 2024-09-25 02:44:10,239 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.111e+02 1.279e+02 1.358e+02 1.455e+02 1.986e+02, threshold=2.715e+02, percent-clipped=0.0 2024-09-25 02:44:12,701 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=4.54 vs. limit=10.0 2024-09-25 02:44:28,475 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=645404.6666666666, ans=0.0 2024-09-25 02:44:30,376 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=8.47 vs. limit=15.0 2024-09-25 02:44:36,300 INFO [train.py:1198] (3/4) Epoch 36, batch 1950, loss[loss=0.2103, ctc_loss=0.1372, cr_loss=0.3655, over 16938.00 frames. ], tot_loss[loss=0.1946, ctc_loss=0.1257, cr_loss=0.3444, over 3374660.15 frames. 
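[annotation] The optim.py warnings report the (min, 25%, median, 75%, max) of recent gradient norms. In every instance above the threshold sits at Clipping_scale times the median, e.g. 2.0 * 1.345e+02 = 2.690e+02 against the reported threshold=2.691e+02, so the sketch below reconstructs the report under that assumption rather than quoting optim.py:

import torch

def clipping_report(grad_norms: torch.Tensor,
                    clipping_scale: float = 2.0) -> None:
    # min, 25%, median, 75%, max of the recent gradient norms, as logged
    q = torch.quantile(grad_norms, torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0]))
    threshold = clipping_scale * float(q[2])   # assumed: scale * median
    pct = 100.0 * float((grad_norms > threshold).float().mean())
    quartiles = " ".join(f"{v:.3e}" for v in q.tolist())
    print(f"Clipping_scale={clipping_scale}, grad-norm quartiles {quartiles}, "
          f"threshold={threshold:.3e}, percent-clipped={pct:.1f}")

clipping_report(torch.empty(200).normal_(mean=135.0, std=15.0))
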
], batch size: 58, lr: 3.30e-03, grad_scale: 16.0 2024-09-25 02:44:41,527 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=645451.3333333334, ans=0.125 2024-09-25 02:44:42,924 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=645451.3333333334, ans=0.125 2024-09-25 02:44:52,497 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=645498.0, ans=0.0 2024-09-25 02:45:03,867 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-25 02:45:21,249 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=645544.6666666666, ans=0.125 2024-09-25 02:45:21,255 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=645544.6666666666, ans=0.0 2024-09-25 02:45:33,905 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=645591.3333333334, ans=0.0 2024-09-25 02:45:44,186 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=11.86 vs. limit=22.5 2024-09-25 02:45:48,415 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=645638.0, ans=0.125 2024-09-25 02:45:56,175 INFO [train.py:1198] (3/4) Epoch 36, batch 2000, loss[loss=0.1767, ctc_loss=0.1134, cr_loss=0.3161, over 17356.00 frames. ], tot_loss[loss=0.1945, ctc_loss=0.1257, cr_loss=0.3441, over 3367548.41 frames. ], batch size: 48, lr: 3.30e-03, grad_scale: 32.0 2024-09-25 02:45:58,147 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=645684.6666666666, ans=0.0 2024-09-25 02:46:07,584 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=645684.6666666666, ans=0.1 2024-09-25 02:46:11,054 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=9.98 vs. limit=22.5 2024-09-25 02:46:18,809 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=645731.3333333334, ans=0.0 2024-09-25 02:46:28,376 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=645778.0, ans=0.09899494936611666 2024-09-25 02:46:47,324 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=645824.6666666666, ans=0.0 2024-09-25 02:46:50,160 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.154e+02 1.284e+02 1.382e+02 1.479e+02 3.122e+02, threshold=2.765e+02, percent-clipped=1.0 2024-09-25 02:47:18,718 INFO [train.py:1198] (3/4) Epoch 36, batch 2050, loss[loss=0.2016, ctc_loss=0.129, cr_loss=0.3631, over 17316.00 frames. ], tot_loss[loss=0.1951, ctc_loss=0.1261, cr_loss=0.3448, over 3357609.08 frames. ], batch size: 51, lr: 3.30e-03, grad_scale: 32.0 2024-09-25 02:47:40,046 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=16.54 vs. 
limit=22.5 2024-09-25 02:48:41,032 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=17.95 vs. limit=22.5 2024-09-25 02:48:43,713 INFO [train.py:1198] (3/4) Epoch 36, batch 2100, loss[loss=0.2088, ctc_loss=0.1355, cr_loss=0.366, over 17028.00 frames. ], tot_loss[loss=0.1958, ctc_loss=0.1268, cr_loss=0.3453, over 3353423.04 frames. ], batch size: 53, lr: 3.30e-03, grad_scale: 32.0 2024-09-25 02:48:47,553 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=6.60 vs. limit=15.0 2024-09-25 02:48:55,366 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=646151.3333333334, ans=0.2 2024-09-25 02:49:02,505 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=646198.0, ans=0.125 2024-09-25 02:49:02,754 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=646198.0, ans=0.125 2024-09-25 02:49:40,606 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.163e+02 1.268e+02 1.352e+02 1.474e+02 3.330e+02, threshold=2.705e+02, percent-clipped=1.0 2024-09-25 02:49:47,404 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=646291.3333333334, ans=0.0 2024-09-25 02:49:47,786 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=8.50 vs. limit=15.0 2024-09-25 02:49:52,398 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=8.34 vs. limit=15.0 2024-09-25 02:49:56,985 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=646338.0, ans=0.125 2024-09-25 02:50:04,728 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=646384.6666666666, ans=0.1 2024-09-25 02:50:05,448 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.38 vs. limit=22.5 2024-09-25 02:50:06,070 INFO [train.py:1198] (3/4) Epoch 36, batch 2150, loss[loss=0.1604, ctc_loss=0.1009, cr_loss=0.2977, over 17205.00 frames. ], tot_loss[loss=0.1952, ctc_loss=0.1263, cr_loss=0.3442, over 3353194.17 frames. ], batch size: 41, lr: 3.30e-03, grad_scale: 32.0 2024-09-25 02:50:09,603 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=646384.6666666666, ans=0.125 2024-09-25 02:50:14,341 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=646384.6666666666, ans=0.125 2024-09-25 02:50:19,151 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=646384.6666666666, ans=0.0 2024-09-25 02:50:22,175 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=646431.3333333334, ans=0.0 2024-09-25 02:50:43,232 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=6.23 vs. 
limit=15.0 2024-09-25 02:51:02,502 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=11.28 vs. limit=22.5 2024-09-25 02:51:06,053 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=6.29 vs. limit=15.0 2024-09-25 02:51:25,843 INFO [train.py:1198] (3/4) Epoch 36, batch 2200, loss[loss=0.2234, ctc_loss=0.1462, cr_loss=0.3863, over 17014.00 frames. ], tot_loss[loss=0.1944, ctc_loss=0.1258, cr_loss=0.3431, over 3360013.90 frames. ], batch size: 53, lr: 3.30e-03, grad_scale: 32.0 2024-09-25 02:51:43,933 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=646664.6666666666, ans=0.125 2024-09-25 02:52:00,891 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=646711.3333333334, ans=0.0 2024-09-25 02:52:13,548 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=646711.3333333334, ans=0.125 2024-09-25 02:52:13,574 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=646711.3333333334, ans=0.125 2024-09-25 02:52:21,535 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=646758.0, ans=0.0 2024-09-25 02:52:22,867 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.103e+02 1.282e+02 1.343e+02 1.455e+02 1.826e+02, threshold=2.686e+02, percent-clipped=0.0 2024-09-25 02:52:38,741 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=646804.6666666666, ans=0.04949747468305833 2024-09-25 02:52:51,064 INFO [train.py:1198] (3/4) Epoch 36, batch 2250, loss[loss=0.2154, ctc_loss=0.1399, cr_loss=0.3778, over 17354.00 frames. ], tot_loss[loss=0.1946, ctc_loss=0.126, cr_loss=0.3432, over 3355614.77 frames. ], batch size: 48, lr: 3.30e-03, grad_scale: 32.0 2024-09-25 02:53:05,934 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=646898.0, ans=0.125 2024-09-25 02:53:19,586 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=646898.0, ans=0.125 2024-09-25 02:53:29,098 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=646944.6666666666, ans=0.125 2024-09-25 02:53:32,246 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=646944.6666666666, ans=0.2 2024-09-25 02:53:45,114 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=646991.3333333334, ans=0.2 2024-09-25 02:53:51,540 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-25 02:54:16,048 INFO [train.py:1198] (3/4) Epoch 36, batch 2300, loss[loss=0.1888, ctc_loss=0.1198, cr_loss=0.3452, over 17178.00 frames. ], tot_loss[loss=0.1945, ctc_loss=0.1259, cr_loss=0.3431, over 3348948.73 frames. 
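[annotation] Each ScheduledFloat line prints a value ("ans") as a function of batch_count: dropout rates, skip rates, and constraint bounds are annealed as training progresses, and by this point in training most have reached their flat final values. A minimal piecewise-linear reading of such a schedule (an illustration of the idea, not scaling.py itself):

from bisect import bisect_right

def scheduled_float(schedule, batch_count):
    """schedule: sorted (batch_count, value) breakpoints; returns the
    piecewise-linear interpolation, clamped at both ends."""
    xs = [x for x, _ in schedule]
    i = bisect_right(xs, batch_count)
    if i == 0:
        return schedule[0][1]
    if i == len(schedule):
        return schedule[-1][1]
    (x0, y0), (x1, y1) = schedule[i - 1], schedule[i]
    return y0 + (batch_count - x0) * (y1 - y0) / (x1 - x0)

# e.g. a skip rate decayed to zero over the first 20k batches, and a
# flat schedule yielding a constant like ans=0.125
print(scheduled_float([(0.0, 0.3), (20000.0, 0.0)], 646431.0))  # 0.0
print(scheduled_float([(0.0, 0.125)], 646431.0))                # 0.125
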
], batch size: 45, lr: 3.30e-03, grad_scale: 8.0 2024-09-25 02:54:39,007 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=647131.3333333334, ans=0.0 2024-09-25 02:54:57,510 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=5.05 vs. limit=15.0 2024-09-25 02:55:14,011 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.134e+02 1.282e+02 1.356e+02 1.478e+02 2.427e+02, threshold=2.712e+02, percent-clipped=0.0 2024-09-25 02:55:20,640 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=647271.3333333334, ans=0.0 2024-09-25 02:55:36,334 INFO [train.py:1198] (3/4) Epoch 36, batch 2350, loss[loss=0.2337, ctc_loss=0.1573, cr_loss=0.3822, over 16608.00 frames. ], tot_loss[loss=0.1948, ctc_loss=0.1262, cr_loss=0.3432, over 3346552.17 frames. ], batch size: 66, lr: 3.30e-03, grad_scale: 8.0 2024-09-25 02:56:02,118 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=647364.6666666666, ans=0.0 2024-09-25 02:56:02,154 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=647364.6666666666, ans=0.0 2024-09-25 02:56:04,600 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=13.51 vs. limit=22.5 2024-09-25 02:56:42,347 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.78 vs. limit=6.0 2024-09-25 02:56:45,313 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=647504.6666666666, ans=0.2 2024-09-25 02:56:58,846 INFO [train.py:1198] (3/4) Epoch 36, batch 2400, loss[loss=0.2007, ctc_loss=0.1317, cr_loss=0.345, over 17215.00 frames. ], tot_loss[loss=0.1959, ctc_loss=0.1271, cr_loss=0.3444, over 3335498.73 frames. ], batch size: 50, lr: 3.30e-03, grad_scale: 16.0 2024-09-25 02:57:12,100 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=647551.3333333334, ans=0.125 2024-09-25 02:57:15,094 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=647598.0, ans=0.0 2024-09-25 02:57:27,237 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=647598.0, ans=0.0 2024-09-25 02:57:28,964 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=647598.0, ans=0.125 2024-09-25 02:57:58,956 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.148e+02 1.280e+02 1.384e+02 1.461e+02 3.295e+02, threshold=2.768e+02, percent-clipped=1.0 2024-09-25 02:58:14,801 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=647738.0, ans=0.0 2024-09-25 02:58:24,118 INFO [train.py:1198] (3/4) Epoch 36, batch 2450, loss[loss=0.1833, ctc_loss=0.1168, cr_loss=0.3325, over 17158.00 frames. ], tot_loss[loss=0.1952, ctc_loss=0.1264, cr_loss=0.3438, over 3347141.42 frames. 
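[annotation] The grad_scale column drops from 32.0 to 8.0 and then climbs back through 16.0 across the batches above, the signature of dynamic loss scaling under float16 AMP: halve after an overflowing step, double again after a run of clean steps. A generic PyTorch loop with those semantics, using the stock torch.cuda.amp API on a CUDA machine (illustrative, not the project's training loop):

import torch

model = torch.nn.Linear(80, 500).cuda()
optimizer = torch.optim.SGD(model.parameters(), lr=0.045)
scaler = torch.cuda.amp.GradScaler(init_scale=32.0, growth_factor=2.0,
                                   backoff_factor=0.5, growth_interval=2000)

for _ in range(3):
    optimizer.zero_grad()
    x = torch.randn(8, 80, device="cuda")
    with torch.cuda.amp.autocast(dtype=torch.float16):
        loss = model(x).square().mean()
    scaler.scale(loss).backward()   # backward on the scaled loss
    scaler.step(optimizer)          # skipped if scaled grads hit inf/nan
    scaler.update()                 # adjusts the "grad_scale" value
    print(scaler.get_scale())
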
], batch size: 48, lr: 3.30e-03, grad_scale: 16.0 2024-09-25 02:58:45,304 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=647831.3333333334, ans=0.125 2024-09-25 02:59:01,258 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=4.20 vs. limit=15.0 2024-09-25 02:59:03,945 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=647878.0, ans=0.125 2024-09-25 02:59:45,461 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=648018.0, ans=0.125 2024-09-25 02:59:46,720 INFO [train.py:1198] (3/4) Epoch 36, batch 2500, loss[loss=0.2149, ctc_loss=0.1409, cr_loss=0.37, over 17007.00 frames. ], tot_loss[loss=0.195, ctc_loss=0.1262, cr_loss=0.344, over 3356275.22 frames. ], batch size: 53, lr: 3.30e-03, grad_scale: 16.0 2024-09-25 02:59:50,805 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=5.46 vs. limit=15.0 2024-09-25 02:59:53,425 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=648018.0, ans=0.05 2024-09-25 03:00:22,510 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.min_positive, batch_count=648111.3333333334, ans=0.025 2024-09-25 03:00:44,519 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.168e+02 1.286e+02 1.353e+02 1.494e+02 3.015e+02, threshold=2.705e+02, percent-clipped=1.0 2024-09-25 03:01:02,291 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=648204.6666666666, ans=0.125 2024-09-25 03:01:05,399 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=648251.3333333334, ans=0.2 2024-09-25 03:01:06,745 INFO [train.py:1198] (3/4) Epoch 36, batch 2550, loss[loss=0.2246, ctc_loss=0.1489, cr_loss=0.3783, over 15796.00 frames. ], tot_loss[loss=0.1958, ctc_loss=0.1267, cr_loss=0.3456, over 3357454.53 frames. ], batch size: 74, lr: 3.30e-03, grad_scale: 16.0 2024-09-25 03:01:15,199 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=648251.3333333334, ans=0.125 2024-09-25 03:01:27,807 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=648298.0, ans=0.1 2024-09-25 03:02:25,749 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-09-25 03:02:31,702 INFO [train.py:1198] (3/4) Epoch 36, batch 2600, loss[loss=0.1808, ctc_loss=0.1156, cr_loss=0.3264, over 17305.00 frames. ], tot_loss[loss=0.196, ctc_loss=0.1268, cr_loss=0.346, over 3356790.52 frames. ], batch size: 49, lr: 3.30e-03, grad_scale: 16.0 2024-09-25 03:02:34,259 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=9.28 vs. 
limit=15.0 2024-09-25 03:02:39,793 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=648484.6666666666, ans=0.125 2024-09-25 03:03:26,917 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=648624.6666666666, ans=0.1 2024-09-25 03:03:31,217 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.121e+02 1.288e+02 1.366e+02 1.471e+02 2.094e+02, threshold=2.731e+02, percent-clipped=0.0 2024-09-25 03:03:56,676 INFO [train.py:1198] (3/4) Epoch 36, batch 2650, loss[loss=0.1529, ctc_loss=0.09694, cr_loss=0.2799, over 16232.00 frames. ], tot_loss[loss=0.1964, ctc_loss=0.1272, cr_loss=0.3463, over 3363730.78 frames. ], batch size: 36, lr: 3.29e-03, grad_scale: 16.0 2024-09-25 03:04:16,007 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=648764.6666666666, ans=0.2 2024-09-25 03:04:18,141 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=4.54 vs. limit=15.0 2024-09-25 03:04:40,378 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.29 vs. limit=15.0 2024-09-25 03:04:48,088 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=648858.0, ans=0.0 2024-09-25 03:04:59,283 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=648904.6666666666, ans=0.125 2024-09-25 03:05:16,418 INFO [train.py:1198] (3/4) Epoch 36, batch 2700, loss[loss=0.199, ctc_loss=0.1288, cr_loss=0.3512, over 17015.00 frames. ], tot_loss[loss=0.1955, ctc_loss=0.1265, cr_loss=0.3452, over 3368728.65 frames. ], batch size: 44, lr: 3.29e-03, grad_scale: 8.0 2024-09-25 03:05:40,295 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=648998.0, ans=0.035 2024-09-25 03:05:41,997 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=648998.0, ans=0.125 2024-09-25 03:06:10,780 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=649091.3333333334, ans=0.1 2024-09-25 03:06:13,915 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=649091.3333333334, ans=0.1 2024-09-25 03:06:15,209 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.114e+02 1.266e+02 1.332e+02 1.436e+02 2.982e+02, threshold=2.664e+02, percent-clipped=1.0 2024-09-25 03:06:35,963 INFO [train.py:1198] (3/4) Epoch 36, batch 2750, loss[loss=0.1803, ctc_loss=0.1143, cr_loss=0.3302, over 17203.00 frames. ], tot_loss[loss=0.1955, ctc_loss=0.1265, cr_loss=0.3449, over 3357880.90 frames. 
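[annotation] The Whitening lines compare a per-module statistic against a limit (e.g. metric=12.29 vs. limit=15.0 above), flagging how far a module's output covariance is from white. One simple metric with that character is the mean of squared covariance eigenvalues over the squared mean eigenvalue: exactly 1.0 for perfectly white features, growing as variance concentrates in few directions. This is an assumed stand-in, not the formula in scaling.py:

import torch

def whitening_metric(x: torch.Tensor) -> float:
    # x: (num_frames, num_channels). For perfectly "white" features the
    # covariance eigenvalues are all equal and the metric is 1.0.
    x = x - x.mean(dim=0, keepdim=True)
    cov = (x.t() @ x) / x.shape[0]
    eigs = torch.linalg.eigvalsh(cov)
    return float((eigs ** 2).mean() / eigs.mean() ** 2)

white = torch.randn(2000, 384)
skewed = white * torch.linspace(0.01, 1.0, 384)  # anisotropic channels
print(f"metric={whitening_metric(white):.2f} vs. limit=15.0")   # ~1.0
print(f"metric={whitening_metric(skewed):.2f} vs. limit=15.0")  # larger
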
], batch size: 41, lr: 3.29e-03, grad_scale: 8.0 2024-09-25 03:06:42,772 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=649184.6666666666, ans=0.04949747468305833 2024-09-25 03:06:54,872 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=649231.3333333334, ans=0.0 2024-09-25 03:08:01,229 INFO [train.py:1198] (3/4) Epoch 36, batch 2800, loss[loss=0.2089, ctc_loss=0.134, cr_loss=0.3745, over 17055.00 frames. ], tot_loss[loss=0.1968, ctc_loss=0.1275, cr_loss=0.3466, over 3362436.81 frames. ], batch size: 46, lr: 3.29e-03, grad_scale: 16.0 2024-09-25 03:08:13,563 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=649418.0, ans=0.2 2024-09-25 03:08:33,044 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=649464.6666666666, ans=0.125 2024-09-25 03:08:39,596 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-25 03:08:42,470 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=649511.3333333334, ans=0.125 2024-09-25 03:09:05,754 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.121e+02 1.257e+02 1.332e+02 1.427e+02 2.151e+02, threshold=2.663e+02, percent-clipped=0.0 2024-09-25 03:09:14,502 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer_ff2.min_abs, batch_count=649604.6666666666, ans=0.1 2024-09-25 03:09:21,095 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=649604.6666666666, ans=0.125 2024-09-25 03:09:26,385 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.86 vs. limit=15.0 2024-09-25 03:09:27,095 INFO [train.py:1198] (3/4) Epoch 36, batch 2850, loss[loss=0.1962, ctc_loss=0.1301, cr_loss=0.3307, over 17310.00 frames. ], tot_loss[loss=0.1964, ctc_loss=0.1272, cr_loss=0.3461, over 3359692.46 frames. ], batch size: 49, lr: 3.29e-03, grad_scale: 16.0 2024-09-25 03:09:32,208 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=649651.3333333334, ans=0.125 2024-09-25 03:10:10,999 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=4.50 vs. limit=10.0 2024-09-25 03:10:46,602 INFO [train.py:1198] (3/4) Epoch 36, batch 2900, loss[loss=0.2356, ctc_loss=0.1566, cr_loss=0.3951, over 14951.00 frames. ], tot_loss[loss=0.1959, ctc_loss=0.1268, cr_loss=0.3451, over 3361153.45 frames. ], batch size: 89, lr: 3.29e-03, grad_scale: 16.0 2024-09-25 03:10:51,617 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=649884.6666666666, ans=0.0 2024-09-25 03:11:17,616 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.21 vs. 
limit=22.5 2024-09-25 03:11:28,604 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=649978.0, ans=0.125 2024-09-25 03:11:33,768 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=3.83 vs. limit=10.0 2024-09-25 03:11:45,704 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.147e+02 1.278e+02 1.363e+02 1.442e+02 1.816e+02, threshold=2.725e+02, percent-clipped=0.0 2024-09-25 03:12:08,900 INFO [train.py:1198] (3/4) Epoch 36, batch 2950, loss[loss=0.2334, ctc_loss=0.1514, cr_loss=0.4097, over 16505.00 frames. ], tot_loss[loss=0.1945, ctc_loss=0.1259, cr_loss=0.3432, over 3367532.57 frames. ], batch size: 66, lr: 3.29e-03, grad_scale: 16.0 2024-09-25 03:12:20,236 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.17 vs. limit=15.0 2024-09-25 03:13:05,656 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=650258.0, ans=0.125 2024-09-25 03:13:15,152 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=650304.6666666666, ans=0.125 2024-09-25 03:13:28,459 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=650304.6666666666, ans=0.2 2024-09-25 03:13:32,871 INFO [train.py:1198] (3/4) Epoch 36, batch 3000, loss[loss=0.1802, ctc_loss=0.1164, cr_loss=0.3191, over 17005.00 frames. ], tot_loss[loss=0.1953, ctc_loss=0.1264, cr_loss=0.3442, over 3364535.88 frames. ], batch size: 51, lr: 3.29e-03, grad_scale: 16.0 2024-09-25 03:13:32,872 INFO [train.py:1221] (3/4) Computing validation loss 2024-09-25 03:13:48,649 INFO [train.py:1230] (3/4) Epoch 36, validation: loss=0.03616, ctc_loss=0.03616, cr_loss=9.264e-15, over 944034.00 frames. 2024-09-25 03:13:48,650 INFO [train.py:1231] (3/4) Maximum memory allocated so far is 21238MB 2024-09-25 03:13:53,590 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=650351.3333333334, ans=0.125 2024-09-25 03:14:08,267 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=9.16 vs. limit=12.0 2024-09-25 03:14:21,940 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=650444.6666666666, ans=0.2 2024-09-25 03:14:23,333 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=650444.6666666666, ans=0.0 2024-09-25 03:14:47,648 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=650491.3333333334, ans=0.125 2024-09-25 03:14:48,919 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.136e+02 1.273e+02 1.377e+02 1.498e+02 1.977e+02, threshold=2.754e+02, percent-clipped=0.0 2024-09-25 03:15:09,706 INFO [train.py:1198] (3/4) Epoch 36, batch 3050, loss[loss=0.1679, ctc_loss=0.1082, cr_loss=0.299, over 17176.00 frames. ], tot_loss[loss=0.1957, ctc_loss=0.1266, cr_loss=0.3451, over 3352045.65 frames. 
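[annotation] The validation records above ("Computing validation loss" -> "validation: loss=0.03616 ... over 944034.00 frames" -> the memory high-water mark) fit a frame-weighted average over the whole dev set followed by a torch.cuda.max_memory_allocated() readout; the near-zero cr_loss (9.264e-15) is consistent with the consistency term vanishing when no masking is applied at evaluation. A hedged sketch of that pattern, with a hypothetical (features, targets, num_frames) batch layout:

import torch

@torch.no_grad()
def compute_validation_loss(model, valid_loader, device="cuda"):
    tot_loss, tot_frames = 0.0, 0.0
    for features, targets, num_frames in valid_loader:  # hypothetical layout
        # assume the model returns a mean per-frame loss for the batch
        loss = model(features.to(device), targets.to(device))
        tot_loss += float(loss) * num_frames
        tot_frames += num_frames
    print(f"validation: loss={tot_loss / tot_frames:.4g}, "
          f"over {tot_frames:.2f} frames.")
    print(f"Maximum memory allocated so far is "
          f"{torch.cuda.max_memory_allocated() // (1024 * 1024)}MB")
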
], batch size: 41, lr: 3.29e-03, grad_scale: 16.0 2024-09-25 03:15:46,047 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=650678.0, ans=0.0 2024-09-25 03:15:52,464 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=650678.0, ans=0.0 2024-09-25 03:16:00,226 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=650724.6666666666, ans=0.125 2024-09-25 03:16:11,444 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=650771.3333333334, ans=0.125 2024-09-25 03:16:14,502 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=650771.3333333334, ans=0.1 2024-09-25 03:16:20,863 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.min_positive, batch_count=650771.3333333334, ans=0.05 2024-09-25 03:16:22,581 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=650771.3333333334, ans=0.2 2024-09-25 03:16:28,399 INFO [train.py:1198] (3/4) Epoch 36, batch 3100, loss[loss=0.147, ctc_loss=0.09315, cr_loss=0.2691, over 16924.00 frames. ], tot_loss[loss=0.1941, ctc_loss=0.1255, cr_loss=0.3428, over 3358654.95 frames. ], batch size: 42, lr: 3.29e-03, grad_scale: 16.0 2024-09-25 03:16:31,855 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=650818.0, ans=0.0 2024-09-25 03:16:36,577 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=650818.0, ans=0.125 2024-09-25 03:16:41,723 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=3.81 vs. limit=10.0 2024-09-25 03:16:49,062 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=650864.6666666666, ans=0.2 2024-09-25 03:16:55,206 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=650864.6666666666, ans=0.0 2024-09-25 03:17:06,330 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=650911.3333333334, ans=0.125 2024-09-25 03:17:26,172 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.086e+02 1.257e+02 1.357e+02 1.451e+02 2.020e+02, threshold=2.713e+02, percent-clipped=0.0 2024-09-25 03:17:39,150 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=651004.6666666666, ans=0.2 2024-09-25 03:17:39,276 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=651004.6666666666, ans=0.125 2024-09-25 03:17:46,714 INFO [train.py:1198] (3/4) Epoch 36, batch 3150, loss[loss=0.1936, ctc_loss=0.1238, cr_loss=0.3492, over 17355.00 frames. ], tot_loss[loss=0.1935, ctc_loss=0.125, cr_loss=0.3422, over 3364468.88 frames. 
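[annotation] tot_loss is reported "over ~3.36e6 frames", roughly two hundred batches of ~17k frames each, so it reads as a frame-weighted average pooled over a window of recent batches rather than a single-batch value. A small frame-weighted accumulator of that kind (assumed behaviour, for illustration):

class FrameWeightedAverage:
    """Pool (loss, num_frames) pairs; resetting periodically would make
    the value track a recent window, as the logged frame totals suggest."""
    def __init__(self):
        self.loss_sum = 0.0
        self.frames = 0.0

    def update(self, batch_loss: float, num_frames: float) -> None:
        self.loss_sum += batch_loss * num_frames
        self.frames += num_frames

    @property
    def value(self) -> float:
        return self.loss_sum / self.frames

avg = FrameWeightedAverage()
avg.update(0.1936, 17355.0)   # batch 3150 figures from the lines above
avg.update(0.1846, 17329.0)   # batch 3200 figures
print(f"tot_loss[loss={avg.value:.4f}, over {avg.frames:.2f} frames.]")
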
], batch size: 48, lr: 3.29e-03, grad_scale: 16.0 2024-09-25 03:17:59,330 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=651051.3333333334, ans=0.125 2024-09-25 03:18:56,460 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.08 vs. limit=15.0 2024-09-25 03:19:05,099 INFO [train.py:1198] (3/4) Epoch 36, batch 3200, loss[loss=0.1846, ctc_loss=0.1176, cr_loss=0.3346, over 17329.00 frames. ], tot_loss[loss=0.1941, ctc_loss=0.1256, cr_loss=0.3424, over 3363644.81 frames. ], batch size: 52, lr: 3.29e-03, grad_scale: 32.0 2024-09-25 03:19:35,153 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.const_attention_rate, batch_count=651378.0, ans=0.025 2024-09-25 03:19:35,596 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=5.07 vs. limit=12.0 2024-09-25 03:20:00,602 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=5.89 vs. limit=15.0 2024-09-25 03:20:02,961 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.112e+02 1.272e+02 1.367e+02 1.451e+02 1.708e+02, threshold=2.733e+02, percent-clipped=0.0 2024-09-25 03:20:12,848 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=651471.3333333334, ans=0.125 2024-09-25 03:20:21,297 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.58 vs. limit=6.0 2024-09-25 03:20:23,376 INFO [train.py:1198] (3/4) Epoch 36, batch 3250, loss[loss=0.1879, ctc_loss=0.1216, cr_loss=0.3314, over 17230.00 frames. ], tot_loss[loss=0.1947, ctc_loss=0.126, cr_loss=0.3437, over 3368813.67 frames. ], batch size: 47, lr: 3.29e-03, grad_scale: 32.0 2024-09-25 03:20:28,195 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=651518.0, ans=0.125 2024-09-25 03:20:28,316 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=651518.0, ans=0.1 2024-09-25 03:20:29,830 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=651518.0, ans=0.125 2024-09-25 03:20:47,364 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.65 vs. limit=15.0 2024-09-25 03:21:09,770 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-25 03:21:17,532 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=651658.0, ans=0.1 2024-09-25 03:21:17,574 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=651658.0, ans=0.0 2024-09-25 03:21:40,986 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=651704.6666666666, ans=0.0 2024-09-25 03:21:43,852 INFO [train.py:1198] (3/4) Epoch 36, batch 3300, loss[loss=0.2308, ctc_loss=0.1572, cr_loss=0.3676, over 11856.00 frames. 
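[annotation] The balancer entries above (prob ans=0.125, min_positive ans=0.05, max_abs ans=10.0, and similar) name per-channel activation constraints: bounds on the fraction of positive values and on magnitudes, enforced softly during the backward pass. The sketch below illustrates one way such a gradient nudge could look; the semantics are assumed for illustration and are not scaling.py verbatim:

import torch

class BalancerGrad(torch.autograd.Function):
    # identity in forward; backward nudges channels whose statistics
    # violate the assumed bounds
    @staticmethod
    def forward(ctx, x, min_positive, max_abs, strength):
        ctx.save_for_backward(x)
        ctx.cfg = (min_positive, max_abs, strength)
        return x

    @staticmethod
    def backward(ctx, grad_out):
        (x,) = ctx.saved_tensors
        min_positive, max_abs, strength = ctx.cfg
        frac_pos = (x > 0).float().mean(dim=0)        # per channel
        under_pos = (frac_pos < min_positive).float()
        over_abs = (x.abs().mean(dim=0) > max_abs).float()
        # push under-positive channels upward, oversized ones toward zero
        nudge = strength * (over_abs * x.sign() - under_pos)
        return grad_out + nudge, None, None, None

x = (20.0 * torch.randn(100, 8)).requires_grad_()
y = BalancerGrad.apply(x, 0.05, 10.0, 1e-4)
y.sum().backward()
print(f"{float(x.grad.abs().mean()):.3e}")
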
], tot_loss[loss=0.1942, ctc_loss=0.1256, cr_loss=0.3427, over 3358905.98 frames. ], batch size: 123, lr: 3.29e-03, grad_scale: 32.0 2024-09-25 03:21:49,320 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=651751.3333333334, ans=0.125 2024-09-25 03:22:09,621 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=651798.0, ans=0.125 2024-09-25 03:22:11,395 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.01 vs. limit=15.0 2024-09-25 03:22:12,566 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=651798.0, ans=0.125 2024-09-25 03:22:17,887 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=5.55 vs. limit=15.0 2024-09-25 03:22:26,627 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=651844.6666666666, ans=0.125 2024-09-25 03:22:44,635 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.051e+02 1.278e+02 1.344e+02 1.481e+02 2.113e+02, threshold=2.688e+02, percent-clipped=0.0 2024-09-25 03:22:48,506 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=7.73 vs. limit=15.0 2024-09-25 03:22:49,833 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=8.94 vs. limit=15.0 2024-09-25 03:23:03,252 INFO [train.py:1198] (3/4) Epoch 36, batch 3350, loss[loss=0.2158, ctc_loss=0.1406, cr_loss=0.3762, over 16901.00 frames. ], tot_loss[loss=0.1939, ctc_loss=0.1255, cr_loss=0.3419, over 3341515.65 frames. ], batch size: 58, lr: 3.29e-03, grad_scale: 16.0 2024-09-25 03:23:17,570 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=652031.3333333334, ans=0.1 2024-09-25 03:23:27,165 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=652031.3333333334, ans=0.2 2024-09-25 03:23:35,831 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.02 vs. limit=15.0 2024-09-25 03:23:57,611 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=6.10 vs. limit=10.0 2024-09-25 03:24:01,562 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=652124.6666666666, ans=0.1 2024-09-25 03:24:19,236 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=652171.3333333334, ans=0.125 2024-09-25 03:24:23,600 INFO [train.py:1198] (3/4) Epoch 36, batch 3400, loss[loss=0.1842, ctc_loss=0.1164, cr_loss=0.3391, over 17294.00 frames. ], tot_loss[loss=0.1946, ctc_loss=0.126, cr_loss=0.3432, over 3350109.73 frames. 
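[annotation] The cr_loss column is the consistency-regularization term of CR-CTC: the same utterance is forwarded under two different maskings and the two output distributions are pulled together. A generic symmetric-KL formulation of such a term follows; this is one common choice, hedged, not necessarily the exact loss in this recipe:

import torch
import torch.nn.functional as F

def consistency_loss(logits_a: torch.Tensor,
                     logits_b: torch.Tensor) -> torch.Tensor:
    # logits_*: (batch, time, vocab) from two differently masked views
    log_pa = F.log_softmax(logits_a, dim=-1)
    log_pb = F.log_softmax(logits_b, dim=-1)
    # symmetric KL; each direction treats the other view, detached
    # (stop-gradient), as its target
    kl_ab = F.kl_div(log_pa, log_pb.detach(), reduction="batchmean",
                     log_target=True)
    kl_ba = F.kl_div(log_pb, log_pa.detach(), reduction="batchmean",
                     log_target=True)
    return 0.5 * (kl_ab + kl_ba)

print(consistency_loss(torch.randn(4, 50, 500), torch.randn(4, 50, 500)))
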
], batch size: 46, lr: 3.29e-03, grad_scale: 16.0 2024-09-25 03:24:24,463 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.24 vs. limit=6.0 2024-09-25 03:24:49,144 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=11.73 vs. limit=15.0 2024-09-25 03:24:50,397 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=652264.6666666666, ans=0.0 2024-09-25 03:24:50,523 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=652264.6666666666, ans=0.0 2024-09-25 03:24:59,837 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=652311.3333333334, ans=0.1 2024-09-25 03:25:03,299 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.66 vs. limit=15.0 2024-09-25 03:25:22,901 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.133e+02 1.287e+02 1.358e+02 1.469e+02 2.050e+02, threshold=2.715e+02, percent-clipped=0.0 2024-09-25 03:25:44,231 INFO [train.py:1198] (3/4) Epoch 36, batch 3450, loss[loss=0.1927, ctc_loss=0.1224, cr_loss=0.3511, over 17260.00 frames. ], tot_loss[loss=0.1933, ctc_loss=0.125, cr_loss=0.3414, over 3359206.71 frames. ], batch size: 44, lr: 3.29e-03, grad_scale: 16.0 2024-09-25 03:25:46,067 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=652451.3333333334, ans=0.1 2024-09-25 03:25:52,262 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=652451.3333333334, ans=0.0 2024-09-25 03:25:55,274 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=652451.3333333334, ans=0.035 2024-09-25 03:26:11,773 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=5.05 vs. limit=15.0 2024-09-25 03:26:41,822 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=652591.3333333334, ans=0.125 2024-09-25 03:26:45,815 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.31 vs. limit=22.5 2024-09-25 03:26:51,412 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=652638.0, ans=0.0 2024-09-25 03:27:02,028 INFO [train.py:1198] (3/4) Epoch 36, batch 3500, loss[loss=0.2205, ctc_loss=0.1435, cr_loss=0.3849, over 17016.00 frames. ], tot_loss[loss=0.1931, ctc_loss=0.125, cr_loss=0.3405, over 3363632.54 frames. ], batch size: 56, lr: 3.28e-03, grad_scale: 16.0 2024-09-25 03:27:33,219 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=652778.0, ans=0.125 2024-09-25 03:27:33,872 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=11.53 vs. 
limit=22.5 2024-09-25 03:27:38,387 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.87 vs. limit=15.0 2024-09-25 03:27:49,232 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=652824.6666666666, ans=0.2 2024-09-25 03:28:02,939 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.150e+02 1.297e+02 1.416e+02 1.575e+02 3.387e+02, threshold=2.833e+02, percent-clipped=1.0 2024-09-25 03:28:20,556 INFO [train.py:1198] (3/4) Epoch 36, batch 3550, loss[loss=0.197, ctc_loss=0.1289, cr_loss=0.3406, over 15935.00 frames. ], tot_loss[loss=0.192, ctc_loss=0.1241, cr_loss=0.3393, over 3364071.22 frames. ], batch size: 74, lr: 3.28e-03, grad_scale: 8.0 2024-09-25 03:28:24,540 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=4.40 vs. limit=10.0 2024-09-25 03:28:40,329 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=11.36 vs. limit=22.5 2024-09-25 03:28:42,791 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=652964.6666666666, ans=0.125 2024-09-25 03:28:44,314 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=652964.6666666666, ans=0.2 2024-09-25 03:29:09,305 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=653058.0, ans=0.125 2024-09-25 03:29:11,388 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=5.37 vs. limit=10.0 2024-09-25 03:29:17,533 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=9.64 vs. limit=22.5 2024-09-25 03:29:19,249 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.80 vs. limit=6.0 2024-09-25 03:29:35,915 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=653104.6666666666, ans=0.0 2024-09-25 03:29:37,411 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=653151.3333333334, ans=0.125 2024-09-25 03:29:38,745 INFO [train.py:1198] (3/4) Epoch 36, batch 3600, loss[loss=0.1958, ctc_loss=0.1264, cr_loss=0.3469, over 16812.00 frames. ], tot_loss[loss=0.1927, ctc_loss=0.1246, cr_loss=0.3404, over 3353470.64 frames. ], batch size: 61, lr: 3.28e-03, grad_scale: 16.0 2024-09-25 03:29:45,590 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=5.28 vs. limit=15.0 2024-09-25 03:30:21,615 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.42 vs. 
limit=15.0 2024-09-25 03:30:43,994 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.131e+02 1.277e+02 1.347e+02 1.445e+02 1.925e+02, threshold=2.694e+02, percent-clipped=0.0 2024-09-25 03:30:54,939 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=653338.0, ans=0.2 2024-09-25 03:31:00,964 INFO [train.py:1198] (3/4) Epoch 36, batch 3650, loss[loss=0.1826, ctc_loss=0.1201, cr_loss=0.3126, over 17028.00 frames. ], tot_loss[loss=0.192, ctc_loss=0.124, cr_loss=0.3398, over 3362763.38 frames. ], batch size: 44, lr: 3.28e-03, grad_scale: 16.0 2024-09-25 03:32:11,556 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=653571.3333333334, ans=0.07 2024-09-25 03:32:22,171 INFO [train.py:1198] (3/4) Epoch 36, batch 3700, loss[loss=0.1615, ctc_loss=0.1023, cr_loss=0.2961, over 15782.00 frames. ], tot_loss[loss=0.1929, ctc_loss=0.1247, cr_loss=0.341, over 3351656.78 frames. ], batch size: 35, lr: 3.28e-03, grad_scale: 16.0 2024-09-25 03:32:47,482 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=653664.6666666666, ans=0.125 2024-09-25 03:33:20,313 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=6.55 vs. limit=12.0 2024-09-25 03:33:24,004 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.071e+02 1.267e+02 1.341e+02 1.465e+02 1.805e+02, threshold=2.682e+02, percent-clipped=0.0 2024-09-25 03:33:32,277 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=653804.6666666666, ans=0.125 2024-09-25 03:33:38,701 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=653804.6666666666, ans=0.0 2024-09-25 03:33:41,506 INFO [train.py:1198] (3/4) Epoch 36, batch 3750, loss[loss=0.1771, ctc_loss=0.1137, cr_loss=0.3174, over 17050.00 frames. ], tot_loss[loss=0.1932, ctc_loss=0.1251, cr_loss=0.3407, over 3339476.68 frames. ], batch size: 39, lr: 3.28e-03, grad_scale: 16.0 2024-09-25 03:33:49,604 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-09-25 03:35:01,335 INFO [train.py:1198] (3/4) Epoch 36, batch 3800, loss[loss=0.154, ctc_loss=0.09781, cr_loss=0.2809, over 16341.00 frames. ], tot_loss[loss=0.1934, ctc_loss=0.1252, cr_loss=0.341, over 3339518.81 frames. ], batch size: 36, lr: 3.28e-03, grad_scale: 16.0 2024-09-25 03:35:17,230 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=654131.3333333334, ans=0.125 2024-09-25 03:35:33,638 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=15.89 vs. 
limit=22.5 2024-09-25 03:35:54,688 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=654224.6666666666, ans=0.0 2024-09-25 03:35:56,370 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=654224.6666666666, ans=0.125 2024-09-25 03:36:02,510 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.134e+02 1.295e+02 1.360e+02 1.464e+02 2.754e+02, threshold=2.720e+02, percent-clipped=1.0 2024-09-25 03:36:19,642 INFO [train.py:1198] (3/4) Epoch 36, batch 3850, loss[loss=0.1385, ctc_loss=0.08649, cr_loss=0.2602, over 17052.00 frames. ], tot_loss[loss=0.1931, ctc_loss=0.125, cr_loss=0.3404, over 3316362.05 frames. ], batch size: 39, lr: 3.28e-03, grad_scale: 16.0 2024-09-25 03:36:23,176 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.02 vs. limit=10.0 2024-09-25 03:36:47,095 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=654364.6666666666, ans=0.125 2024-09-25 03:36:52,219 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=5.36 vs. limit=15.0 2024-09-25 03:37:16,279 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=5.71 vs. limit=15.0 2024-09-25 03:38:18,970 INFO [train.py:1198] (3/4) Epoch 37, batch 0, loss[loss=0.1835, ctc_loss=0.1173, cr_loss=0.3311, over 17262.00 frames. ], tot_loss[loss=0.1835, ctc_loss=0.1173, cr_loss=0.3311, over 17262.00 frames. ], batch size: 42, lr: 3.23e-03, grad_scale: 16.0 2024-09-25 03:38:18,971 INFO [train.py:1221] (3/4) Computing validation loss 2024-09-25 03:38:34,302 INFO [train.py:1230] (3/4) Epoch 37, validation: loss=0.03489, ctc_loss=0.03489, cr_loss=9.463e-15, over 944034.00 frames. 2024-09-25 03:38:34,303 INFO [train.py:1231] (3/4) Maximum memory allocated so far is 21238MB 2024-09-25 03:38:37,762 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=654532.6666666666, ans=0.125 2024-09-25 03:38:37,790 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=654532.6666666666, ans=0.0 2024-09-25 03:38:39,517 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=654532.6666666666, ans=0.1 2024-09-25 03:38:41,078 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=654532.6666666666, ans=0.1 2024-09-25 03:38:43,154 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=11.41 vs. 
limit=22.5 2024-09-25 03:38:50,594 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=654579.3333333334, ans=0.0 2024-09-25 03:38:53,790 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=654579.3333333334, ans=0.125 2024-09-25 03:39:00,000 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.const_attention_rate, batch_count=654579.3333333334, ans=0.025 2024-09-25 03:39:30,190 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=654672.6666666666, ans=0.125 2024-09-25 03:39:39,839 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=654719.3333333334, ans=10.0 2024-09-25 03:39:41,463 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer_ff3.min_abs, batch_count=654719.3333333334, ans=0.2 2024-09-25 03:39:47,467 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.132e+02 1.358e+02 1.528e+02 1.726e+02 8.114e+02, threshold=3.057e+02, percent-clipped=1.0 2024-09-25 03:39:57,024 INFO [train.py:1198] (3/4) Epoch 37, batch 50, loss[loss=0.2146, ctc_loss=0.1391, cr_loss=0.3776, over 17203.00 frames. ], tot_loss[loss=0.1974, ctc_loss=0.1275, cr_loss=0.3491, over 766744.85 frames. ], batch size: 50, lr: 3.23e-03, grad_scale: 16.0 2024-09-25 03:40:00,452 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=654766.0, ans=0.0 2024-09-25 03:40:09,372 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=7.44 vs. limit=15.0 2024-09-25 03:40:17,228 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=8.48 vs. limit=22.5 2024-09-25 03:40:38,363 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=654859.3333333334, ans=0.125 2024-09-25 03:41:08,126 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=654952.6666666666, ans=0.0 2024-09-25 03:41:17,683 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=654999.3333333334, ans=0.0 2024-09-25 03:41:19,003 INFO [train.py:1198] (3/4) Epoch 37, batch 100, loss[loss=0.2114, ctc_loss=0.1373, cr_loss=0.3708, over 17315.00 frames. ], tot_loss[loss=0.1945, ctc_loss=0.1256, cr_loss=0.3445, over 1343427.98 frames. ], batch size: 51, lr: 3.23e-03, grad_scale: 16.0 2024-09-25 03:41:24,444 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.17 vs. limit=6.0 2024-09-25 03:41:29,157 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=5.08 vs. 
limit=15.0 2024-09-25 03:42:29,429 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.170e+02 1.259e+02 1.323e+02 1.391e+02 2.896e+02, threshold=2.646e+02, percent-clipped=0.0 2024-09-25 03:42:37,750 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=655232.6666666666, ans=0.125 2024-09-25 03:42:39,025 INFO [train.py:1198] (3/4) Epoch 37, batch 150, loss[loss=0.2397, ctc_loss=0.1648, cr_loss=0.3743, over 11522.00 frames. ], tot_loss[loss=0.1933, ctc_loss=0.1248, cr_loss=0.3424, over 1788996.27 frames. ], batch size: 123, lr: 3.23e-03, grad_scale: 16.0 2024-09-25 03:42:55,287 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=655279.3333333334, ans=0.2 2024-09-25 03:43:58,331 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=655419.3333333334, ans=0.2 2024-09-25 03:44:07,484 INFO [train.py:1198] (3/4) Epoch 37, batch 200, loss[loss=0.1701, ctc_loss=0.1083, cr_loss=0.3092, over 17259.00 frames. ], tot_loss[loss=0.1929, ctc_loss=0.1245, cr_loss=0.342, over 2143711.87 frames. ], batch size: 44, lr: 3.23e-03, grad_scale: 16.0 2024-09-25 03:44:29,851 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=655512.6666666666, ans=0.0 2024-09-25 03:44:37,806 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.const_attention_rate, batch_count=655559.3333333334, ans=0.025 2024-09-25 03:44:37,858 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=655559.3333333334, ans=0.05 2024-09-25 03:45:09,288 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=655652.6666666666, ans=0.0 2024-09-25 03:45:19,826 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.097e+02 1.270e+02 1.344e+02 1.450e+02 1.913e+02, threshold=2.687e+02, percent-clipped=0.0 2024-09-25 03:45:20,120 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=655652.6666666666, ans=0.125 2024-09-25 03:45:29,188 INFO [train.py:1198] (3/4) Epoch 37, batch 250, loss[loss=0.1971, ctc_loss=0.1278, cr_loss=0.3469, over 17041.00 frames. ], tot_loss[loss=0.1922, ctc_loss=0.124, cr_loss=0.3407, over 2415704.56 frames. 
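[annotation] The per-batch frame counts hovering around 17k (e.g. "over 17203.00 frames" at batch 50 above) are consistent with a duration-budgeted sampler: assuming roughly 700 s of audio per batch, 100 fbank frames per second, and the encoder's 4x subsampling (all three constants are assumptions from the run setup), the loss is accumulated over about 17500 output frames. The arithmetic, spelled out:

max_duration_s = 700   # assumed per-batch audio budget
frames_per_s = 100     # assumed 10 ms fbank frame shift
subsampling = 4        # assumed encoder subsampling factor

encoder_frames = max_duration_s * frames_per_s / subsampling
print(encoder_frames)  # 17500.0, close to the ~17.2k logged per batch
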
], batch size: 52, lr: 3.23e-03, grad_scale: 16.0 2024-09-25 03:45:58,898 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=655746.0, ans=0.05 2024-09-25 03:45:59,002 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=655746.0, ans=0.125 2024-09-25 03:46:07,145 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=655792.6666666666, ans=0.125 2024-09-25 03:46:24,602 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=655839.3333333334, ans=0.0 2024-09-25 03:46:34,459 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=655886.0, ans=0.05 2024-09-25 03:46:50,163 INFO [train.py:1198] (3/4) Epoch 37, batch 300, loss[loss=0.2034, ctc_loss=0.1349, cr_loss=0.3421, over 17137.00 frames. ], tot_loss[loss=0.1906, ctc_loss=0.123, cr_loss=0.3379, over 2632849.15 frames. ], batch size: 48, lr: 3.23e-03, grad_scale: 16.0 2024-09-25 03:46:53,751 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=655932.6666666666, ans=0.1 2024-09-25 03:47:05,255 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.35 vs. limit=15.0 2024-09-25 03:47:21,895 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=5.13 vs. limit=15.0 2024-09-25 03:47:33,868 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=656026.0, ans=0.0 2024-09-25 03:47:48,190 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-09-25 03:47:59,479 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=656119.3333333334, ans=0.125 2024-09-25 03:48:00,707 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.029e+02 1.285e+02 1.370e+02 1.458e+02 2.873e+02, threshold=2.740e+02, percent-clipped=1.0 2024-09-25 03:48:10,580 INFO [train.py:1198] (3/4) Epoch 37, batch 350, loss[loss=0.1677, ctc_loss=0.1067, cr_loss=0.3054, over 17315.00 frames. ], tot_loss[loss=0.1921, ctc_loss=0.124, cr_loss=0.3404, over 2784451.65 frames. ], batch size: 46, lr: 3.23e-03, grad_scale: 16.0 2024-09-25 03:48:29,316 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.const_attention_rate, batch_count=656212.6666666666, ans=0.025 2024-09-25 03:49:05,351 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=656306.0, ans=0.025 2024-09-25 03:49:07,588 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.55 vs. limit=6.0 2024-09-25 03:49:20,130 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.whiten.whitening_limit, batch_count=656306.0, ans=12.0 2024-09-25 03:49:38,695 INFO [train.py:1198] (3/4) Epoch 37, batch 400, loss[loss=0.1658, ctc_loss=0.105, cr_loss=0.304, over 17221.00 frames. 
], tot_loss[loss=0.1924, ctc_loss=0.1241, cr_loss=0.3412, over 2915473.90 frames. ], batch size: 47, lr: 3.23e-03, grad_scale: 32.0 2024-09-25 03:49:48,504 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=656399.3333333334, ans=0.125 2024-09-25 03:50:09,215 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=656492.6666666666, ans=0.05 2024-09-25 03:50:27,505 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=656539.3333333334, ans=0.125 2024-09-25 03:50:45,318 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.const_attention_rate, batch_count=656586.0, ans=0.025 2024-09-25 03:50:51,380 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.097e+02 1.258e+02 1.358e+02 1.447e+02 2.750e+02, threshold=2.715e+02, percent-clipped=1.0 2024-09-25 03:50:51,824 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=656586.0, ans=0.04949747468305833 2024-09-25 03:51:01,057 INFO [train.py:1198] (3/4) Epoch 37, batch 450, loss[loss=0.1626, ctc_loss=0.1031, cr_loss=0.2974, over 17355.00 frames. ], tot_loss[loss=0.1927, ctc_loss=0.1244, cr_loss=0.3414, over 2998520.45 frames. ], batch size: 48, lr: 3.23e-03, grad_scale: 32.0 2024-09-25 03:51:28,713 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=656679.3333333334, ans=0.1 2024-09-25 03:52:21,377 INFO [train.py:1198] (3/4) Epoch 37, batch 500, loss[loss=0.1694, ctc_loss=0.1101, cr_loss=0.2966, over 17085.00 frames. ], tot_loss[loss=0.1941, ctc_loss=0.1256, cr_loss=0.3426, over 3068009.71 frames. ], batch size: 43, lr: 3.23e-03, grad_scale: 32.0 2024-09-25 03:52:29,684 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=656866.0, ans=10.0 2024-09-25 03:52:47,204 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=656912.6666666666, ans=0.125 2024-09-25 03:53:14,356 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.92 vs. limit=15.0 2024-09-25 03:53:18,884 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=657006.0, ans=0.125 2024-09-25 03:53:29,180 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.min_abs, batch_count=657052.6666666666, ans=0.5 2024-09-25 03:53:37,042 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.149e+02 1.284e+02 1.360e+02 1.514e+02 2.432e+02, threshold=2.721e+02, percent-clipped=0.0 2024-09-25 03:53:46,709 INFO [train.py:1198] (3/4) Epoch 37, batch 550, loss[loss=0.1686, ctc_loss=0.105, cr_loss=0.318, over 17161.00 frames. ], tot_loss[loss=0.1941, ctc_loss=0.1256, cr_loss=0.3425, over 3128495.21 frames. 
], batch size: 41, lr: 3.23e-03, grad_scale: 32.0 2024-09-25 03:54:28,163 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=657192.6666666666, ans=0.07 2024-09-25 03:55:12,195 INFO [train.py:1198] (3/4) Epoch 37, batch 600, loss[loss=0.2087, ctc_loss=0.1379, cr_loss=0.3542, over 16405.00 frames. ], tot_loss[loss=0.1943, ctc_loss=0.1257, cr_loss=0.3429, over 3182408.61 frames. ], batch size: 66, lr: 3.23e-03, grad_scale: 32.0 2024-09-25 03:55:20,486 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=657332.6666666666, ans=0.04949747468305833 2024-09-25 03:55:31,686 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=657379.3333333334, ans=0.0 2024-09-25 03:56:03,623 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=657472.6666666666, ans=0.0 2024-09-25 03:56:16,489 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=657519.3333333334, ans=0.125 2024-09-25 03:56:22,510 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.149e+02 1.287e+02 1.356e+02 1.479e+02 3.442e+02, threshold=2.712e+02, percent-clipped=1.0 2024-09-25 03:56:32,242 INFO [train.py:1198] (3/4) Epoch 37, batch 650, loss[loss=0.1729, ctc_loss=0.1101, cr_loss=0.314, over 17027.00 frames. ], tot_loss[loss=0.1947, ctc_loss=0.1261, cr_loss=0.3432, over 3230761.25 frames. ], batch size: 44, lr: 3.23e-03, grad_scale: 32.0 2024-09-25 03:56:38,979 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=657566.0, ans=0.0 2024-09-25 03:56:39,611 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.28 vs. limit=15.0 2024-09-25 03:57:06,256 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer_ff2.min_abs, batch_count=657659.3333333334, ans=0.1 2024-09-25 03:57:48,028 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=657752.6666666666, ans=0.125 2024-09-25 03:57:51,370 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=657799.3333333334, ans=0.125 2024-09-25 03:57:51,397 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=657799.3333333334, ans=0.1 2024-09-25 03:57:52,591 INFO [train.py:1198] (3/4) Epoch 37, batch 700, loss[loss=0.17, ctc_loss=0.1077, cr_loss=0.3117, over 16962.00 frames. ], tot_loss[loss=0.1941, ctc_loss=0.1256, cr_loss=0.3429, over 3265260.08 frames. ], batch size: 42, lr: 3.23e-03, grad_scale: 16.0 2024-09-25 03:58:08,490 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=657799.3333333334, ans=0.0 2024-09-25 03:59:02,495 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=5.37 vs. 
limit=15.0 2024-09-25 03:59:13,008 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.135e+02 1.266e+02 1.364e+02 1.473e+02 1.883e+02, threshold=2.728e+02, percent-clipped=0.0 2024-09-25 03:59:13,338 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=657986.0, ans=0.1 2024-09-25 03:59:16,654 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=657986.0, ans=0.2 2024-09-25 03:59:20,407 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.26 vs. limit=15.0 2024-09-25 03:59:21,271 INFO [train.py:1198] (3/4) Epoch 37, batch 750, loss[loss=0.2482, ctc_loss=0.166, cr_loss=0.4106, over 12223.00 frames. ], tot_loss[loss=0.1933, ctc_loss=0.1249, cr_loss=0.3422, over 3285017.10 frames. ], batch size: 123, lr: 3.23e-03, grad_scale: 16.0 2024-09-25 03:59:31,195 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=658032.6666666666, ans=0.0 2024-09-25 04:00:15,237 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=658172.6666666666, ans=0.2 2024-09-25 04:00:43,399 INFO [train.py:1198] (3/4) Epoch 37, batch 800, loss[loss=0.2163, ctc_loss=0.1366, cr_loss=0.3984, over 17101.00 frames. ], tot_loss[loss=0.1932, ctc_loss=0.1247, cr_loss=0.3423, over 3301823.98 frames. ], batch size: 49, lr: 3.23e-03, grad_scale: 32.0 2024-09-25 04:01:15,405 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=658359.3333333334, ans=0.1 2024-09-25 04:01:31,788 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.94 vs. limit=15.0 2024-09-25 04:01:53,402 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.const_attention_rate, batch_count=658452.6666666666, ans=0.025 2024-09-25 04:01:54,704 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.144e+02 1.298e+02 1.352e+02 1.475e+02 2.154e+02, threshold=2.705e+02, percent-clipped=0.0 2024-09-25 04:02:02,797 INFO [train.py:1198] (3/4) Epoch 37, batch 850, loss[loss=0.1478, ctc_loss=0.09167, cr_loss=0.2805, over 17061.00 frames. ], tot_loss[loss=0.1943, ctc_loss=0.1257, cr_loss=0.3432, over 3303059.52 frames. ], batch size: 39, lr: 3.22e-03, grad_scale: 32.0 2024-09-25 04:02:57,272 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=658639.3333333334, ans=0.0 2024-09-25 04:03:00,625 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=658639.3333333334, ans=0.125 2024-09-25 04:03:12,131 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=4.35 vs. limit=12.0 2024-09-25 04:03:28,264 INFO [train.py:1198] (3/4) Epoch 37, batch 900, loss[loss=0.1979, ctc_loss=0.1277, cr_loss=0.3512, over 17306.00 frames. ], tot_loss[loss=0.1948, ctc_loss=0.1261, cr_loss=0.3437, over 3320642.67 frames. 
], batch size: 49, lr: 3.22e-03, grad_scale: 32.0 2024-09-25 04:03:32,308 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=6.37 vs. limit=10.0 2024-09-25 04:03:49,107 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten.whitening_limit, batch_count=658779.3333333334, ans=15.0 2024-09-25 04:04:08,459 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.92 vs. limit=15.0 2024-09-25 04:04:30,347 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=658872.6666666666, ans=0.125 2024-09-25 04:04:43,135 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.104e+02 1.276e+02 1.332e+02 1.387e+02 1.934e+02, threshold=2.664e+02, percent-clipped=0.0 2024-09-25 04:04:43,974 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.94 vs. limit=15.0 2024-09-25 04:04:51,042 INFO [train.py:1198] (3/4) Epoch 37, batch 950, loss[loss=0.1861, ctc_loss=0.1194, cr_loss=0.3335, over 17094.00 frames. ], tot_loss[loss=0.1945, ctc_loss=0.1259, cr_loss=0.3433, over 3327513.81 frames. ], batch size: 49, lr: 3.22e-03, grad_scale: 32.0 2024-09-25 04:05:05,145 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=658966.0, ans=0.125 2024-09-25 04:05:09,831 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=659012.6666666666, ans=0.0 2024-09-25 04:06:13,087 INFO [train.py:1198] (3/4) Epoch 37, batch 1000, loss[loss=0.2173, ctc_loss=0.1417, cr_loss=0.3777, over 17227.00 frames. ], tot_loss[loss=0.1945, ctc_loss=0.1258, cr_loss=0.3437, over 3335531.32 frames. ], batch size: 50, lr: 3.22e-03, grad_scale: 32.0 2024-09-25 04:06:22,775 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=659199.3333333334, ans=0.1 2024-09-25 04:06:38,811 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=659246.0, ans=0.0 2024-09-25 04:06:40,427 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=659246.0, ans=0.125 2024-09-25 04:07:10,077 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=12.88 vs. limit=22.5 2024-09-25 04:07:25,169 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.121e+02 1.283e+02 1.388e+02 1.495e+02 1.963e+02, threshold=2.776e+02, percent-clipped=0.0 2024-09-25 04:07:30,315 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-25 04:07:33,141 INFO [train.py:1198] (3/4) Epoch 37, batch 1050, loss[loss=0.2217, ctc_loss=0.1414, cr_loss=0.4017, over 17143.00 frames. ], tot_loss[loss=0.1947, ctc_loss=0.126, cr_loss=0.3436, over 3339262.30 frames. 
], batch size: 48, lr: 3.22e-03, grad_scale: 32.0 2024-09-25 04:07:39,614 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=659432.6666666666, ans=0.125 2024-09-25 04:08:01,657 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-25 04:08:28,411 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=5.07 vs. limit=12.0 2024-09-25 04:08:49,404 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer_ff3.min_abs, batch_count=659619.3333333334, ans=0.2 2024-09-25 04:09:00,286 INFO [train.py:1198] (3/4) Epoch 37, batch 1100, loss[loss=0.1803, ctc_loss=0.117, cr_loss=0.3169, over 17314.00 frames. ], tot_loss[loss=0.1945, ctc_loss=0.1258, cr_loss=0.3435, over 3351009.03 frames. ], batch size: 49, lr: 3.22e-03, grad_scale: 32.0 2024-09-25 04:09:38,745 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=659759.3333333334, ans=0.125 2024-09-25 04:09:41,884 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=659759.3333333334, ans=0.125 2024-09-25 04:10:14,846 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.153e+02 1.287e+02 1.378e+02 1.545e+02 2.459e+02, threshold=2.756e+02, percent-clipped=0.0 2024-09-25 04:10:16,819 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=659852.6666666666, ans=0.125 2024-09-25 04:10:20,706 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=6.99 vs. limit=15.0 2024-09-25 04:10:22,845 INFO [train.py:1198] (3/4) Epoch 37, batch 1150, loss[loss=0.1625, ctc_loss=0.104, cr_loss=0.2927, over 17036.00 frames. ], tot_loss[loss=0.1939, ctc_loss=0.1254, cr_loss=0.3424, over 3358874.84 frames. 
], batch size: 39, lr: 3.22e-03, grad_scale: 32.0 2024-09-25 04:10:39,283 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=659946.0, ans=0.0 2024-09-25 04:10:40,907 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=659946.0, ans=0.125 2024-09-25 04:10:48,977 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=659946.0, ans=0.125 2024-09-25 04:10:53,685 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=659992.6666666666, ans=0.1 2024-09-25 04:11:00,145 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=659992.6666666666, ans=0.125 2024-09-25 04:11:29,192 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=660086.0, ans=0.0 2024-09-25 04:11:30,805 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=660086.0, ans=0.2 2024-09-25 04:11:37,212 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=660086.0, ans=0.125 2024-09-25 04:11:38,666 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=660086.0, ans=0.0 2024-09-25 04:11:43,096 INFO [train.py:1198] (3/4) Epoch 37, batch 1200, loss[loss=0.1841, ctc_loss=0.1176, cr_loss=0.332, over 16736.00 frames. ], tot_loss[loss=0.1943, ctc_loss=0.1256, cr_loss=0.3436, over 3357559.33 frames. ], batch size: 61, lr: 3.22e-03, grad_scale: 32.0 2024-09-25 04:11:54,521 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=660132.6666666666, ans=0.125 2024-09-25 04:12:42,653 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=660272.6666666666, ans=0.2 2024-09-25 04:12:57,285 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.131e+02 1.254e+02 1.340e+02 1.432e+02 2.120e+02, threshold=2.680e+02, percent-clipped=0.0 2024-09-25 04:13:05,189 INFO [train.py:1198] (3/4) Epoch 37, batch 1250, loss[loss=0.1878, ctc_loss=0.121, cr_loss=0.3341, over 17299.00 frames. ], tot_loss[loss=0.1922, ctc_loss=0.1241, cr_loss=0.3407, over 3371943.11 frames. ], batch size: 51, lr: 3.22e-03, grad_scale: 32.0 2024-09-25 04:13:14,991 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.const_attention_rate, batch_count=660366.0, ans=0.025 2024-09-25 04:13:22,837 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=660412.6666666666, ans=0.125 2024-09-25 04:13:23,642 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=13.00 vs. limit=22.5 2024-09-25 04:13:48,760 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=660459.3333333334, ans=0.2 2024-09-25 04:13:55,745 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=6.07 vs. 
limit=15.0 2024-09-25 04:14:01,552 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=660506.0, ans=0.2 2024-09-25 04:14:14,387 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=660552.6666666666, ans=0.1 2024-09-25 04:14:14,619 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.const_attention_rate, batch_count=660552.6666666666, ans=0.025 2024-09-25 04:14:30,161 INFO [train.py:1198] (3/4) Epoch 37, batch 1300, loss[loss=0.1892, ctc_loss=0.1222, cr_loss=0.3352, over 17008.00 frames. ], tot_loss[loss=0.1932, ctc_loss=0.1248, cr_loss=0.3419, over 3367980.74 frames. ], batch size: 51, lr: 3.22e-03, grad_scale: 32.0 2024-09-25 04:14:30,521 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=660599.3333333334, ans=0.125 2024-09-25 04:14:55,944 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=660646.0, ans=0.125 2024-09-25 04:15:38,820 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=4.24 vs. limit=12.0 2024-09-25 04:15:44,612 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.056e+02 1.301e+02 1.386e+02 1.515e+02 1.831e+02, threshold=2.772e+02, percent-clipped=0.0 2024-09-25 04:15:46,600 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=660786.0, ans=0.125 2024-09-25 04:15:47,076 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=8.83 vs. limit=15.0 2024-09-25 04:15:52,778 INFO [train.py:1198] (3/4) Epoch 37, batch 1350, loss[loss=0.2013, ctc_loss=0.1239, cr_loss=0.3873, over 17294.00 frames. ], tot_loss[loss=0.1934, ctc_loss=0.125, cr_loss=0.3421, over 3368095.98 frames. ], batch size: 49, lr: 3.22e-03, grad_scale: 32.0 2024-09-25 04:16:20,363 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=660879.3333333334, ans=0.1 2024-09-25 04:16:26,871 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=660926.0, ans=0.0 2024-09-25 04:16:29,149 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=12.73 vs. limit=22.5 2024-09-25 04:16:59,018 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=661019.3333333334, ans=0.125 2024-09-25 04:17:03,780 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=661019.3333333334, ans=0.125 2024-09-25 04:17:13,043 INFO [train.py:1198] (3/4) Epoch 37, batch 1400, loss[loss=0.1974, ctc_loss=0.1274, cr_loss=0.3499, over 17036.00 frames. ], tot_loss[loss=0.1936, ctc_loss=0.1253, cr_loss=0.3416, over 3356782.99 frames. ], batch size: 56, lr: 3.22e-03, grad_scale: 16.0 2024-09-25 04:17:16,430 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=661066.0, ans=0.0 2024-09-25 04:17:20,178 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=9.83 vs. 
limit=22.5 2024-09-25 04:17:34,064 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=661112.6666666666, ans=0.125 2024-09-25 04:17:39,394 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=4.83 vs. limit=12.0 2024-09-25 04:18:05,068 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.const_attention_rate, batch_count=661206.0, ans=0.025 2024-09-25 04:18:31,386 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.099e+02 1.311e+02 1.398e+02 1.519e+02 2.404e+02, threshold=2.797e+02, percent-clipped=0.0 2024-09-25 04:18:40,190 INFO [train.py:1198] (3/4) Epoch 37, batch 1450, loss[loss=0.1982, ctc_loss=0.1303, cr_loss=0.3394, over 16922.00 frames. ], tot_loss[loss=0.1935, ctc_loss=0.1252, cr_loss=0.3417, over 3357816.24 frames. ], batch size: 58, lr: 3.22e-03, grad_scale: 16.0 2024-09-25 04:18:43,594 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=661299.3333333334, ans=0.125 2024-09-25 04:18:43,796 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=661299.3333333334, ans=0.2 2024-09-25 04:18:48,429 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=661299.3333333334, ans=0.125 2024-09-25 04:19:08,405 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=14.06 vs. limit=22.5 2024-09-25 04:19:32,562 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1.whitening_limit, batch_count=661439.3333333334, ans=10.0 2024-09-25 04:19:36,815 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=661439.3333333334, ans=0.0 2024-09-25 04:19:46,337 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=661486.0, ans=0.2 2024-09-25 04:19:46,421 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=661486.0, ans=0.125 2024-09-25 04:19:47,929 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=661486.0, ans=0.125 2024-09-25 04:20:02,951 INFO [train.py:1198] (3/4) Epoch 37, batch 1500, loss[loss=0.1661, ctc_loss=0.1043, cr_loss=0.3091, over 17069.00 frames. ], tot_loss[loss=0.1937, ctc_loss=0.1253, cr_loss=0.3423, over 3359216.85 frames. ], batch size: 43, lr: 3.22e-03, grad_scale: 16.0 2024-09-25 04:20:12,721 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=661532.6666666666, ans=0.0 2024-09-25 04:20:17,238 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=661579.3333333334, ans=0.125 2024-09-25 04:20:34,734 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=661626.0, ans=0.125 2024-09-25 04:21:07,374 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=6.00 vs. 
limit=15.0 2024-09-25 04:21:08,386 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=661719.3333333334, ans=0.125 2024-09-25 04:21:13,215 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=661719.3333333334, ans=0.2 2024-09-25 04:21:16,231 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.086e+02 1.237e+02 1.312e+02 1.425e+02 1.916e+02, threshold=2.625e+02, percent-clipped=0.0 2024-09-25 04:21:22,596 INFO [train.py:1198] (3/4) Epoch 37, batch 1550, loss[loss=0.1987, ctc_loss=0.1271, cr_loss=0.3578, over 16953.00 frames. ], tot_loss[loss=0.1946, ctc_loss=0.1259, cr_loss=0.3437, over 3355054.46 frames. ], batch size: 42, lr: 3.22e-03, grad_scale: 16.0 2024-09-25 04:21:24,553 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=661766.0, ans=0.125 2024-09-25 04:21:43,750 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=661812.6666666666, ans=0.125 2024-09-25 04:22:25,972 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=661952.6666666666, ans=0.125 2024-09-25 04:22:42,772 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=661952.6666666666, ans=0.125 2024-09-25 04:22:45,509 INFO [train.py:1198] (3/4) Epoch 37, batch 1600, loss[loss=0.2156, ctc_loss=0.1416, cr_loss=0.37, over 17232.00 frames. ], tot_loss[loss=0.1934, ctc_loss=0.125, cr_loss=0.3421, over 3360869.67 frames. ], batch size: 55, lr: 3.22e-03, grad_scale: 32.0 2024-09-25 04:23:14,544 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=662046.0, ans=0.125 2024-09-25 04:23:50,036 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=662139.3333333334, ans=0.125 2024-09-25 04:24:01,794 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=10.66 vs. limit=15.0 2024-09-25 04:24:03,236 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=5.76 vs. limit=15.0 2024-09-25 04:24:04,105 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.112e+02 1.273e+02 1.358e+02 1.458e+02 2.530e+02, threshold=2.716e+02, percent-clipped=0.0 2024-09-25 04:24:10,513 INFO [train.py:1198] (3/4) Epoch 37, batch 1650, loss[loss=0.1991, ctc_loss=0.1287, cr_loss=0.352, over 17297.00 frames. ], tot_loss[loss=0.1951, ctc_loss=0.1261, cr_loss=0.3447, over 3359005.29 frames. ], batch size: 46, lr: 3.22e-03, grad_scale: 32.0 2024-09-25 04:24:31,591 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer_ff2.min_abs, batch_count=662279.3333333334, ans=0.1 2024-09-25 04:24:35,029 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=8.52 vs. 
limit=15.0 2024-09-25 04:24:37,990 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=662279.3333333334, ans=0.1 2024-09-25 04:25:04,384 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.const_attention_rate, batch_count=662372.6666666666, ans=0.025 2024-09-25 04:25:11,610 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.67 vs. limit=22.5 2024-09-25 04:25:20,446 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=662419.3333333334, ans=0.1 2024-09-25 04:25:32,851 INFO [train.py:1198] (3/4) Epoch 37, batch 1700, loss[loss=0.1697, ctc_loss=0.1103, cr_loss=0.2971, over 16940.00 frames. ], tot_loss[loss=0.1946, ctc_loss=0.1258, cr_loss=0.3439, over 3365735.63 frames. ], batch size: 42, lr: 3.22e-03, grad_scale: 32.0 2024-09-25 04:25:53,594 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=662512.6666666666, ans=0.0 2024-09-25 04:26:07,936 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.const_attention_rate, batch_count=662559.3333333334, ans=0.025 2024-09-25 04:26:10,942 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=662559.3333333334, ans=0.0 2024-09-25 04:26:12,579 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=662559.3333333334, ans=0.1 2024-09-25 04:26:18,955 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=662606.0, ans=0.125 2024-09-25 04:26:20,606 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=662606.0, ans=0.125 2024-09-25 04:26:31,540 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=662606.0, ans=0.125 2024-09-25 04:26:33,888 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=5.42 vs. limit=15.0 2024-09-25 04:26:41,409 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=662652.6666666666, ans=0.2 2024-09-25 04:26:45,874 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.100e+02 1.286e+02 1.377e+02 1.479e+02 2.270e+02, threshold=2.754e+02, percent-clipped=0.0 2024-09-25 04:26:52,405 INFO [train.py:1198] (3/4) Epoch 37, batch 1750, loss[loss=0.2032, ctc_loss=0.1339, cr_loss=0.3464, over 17096.00 frames. ], tot_loss[loss=0.194, ctc_loss=0.1255, cr_loss=0.3427, over 3354471.93 frames. 
], batch size: 49, lr: 3.21e-03, grad_scale: 32.0 2024-09-25 04:26:52,825 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=662699.3333333334, ans=0.1 2024-09-25 04:26:55,958 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.const_attention_rate, batch_count=662699.3333333334, ans=0.025 2024-09-25 04:27:17,287 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=8.88 vs. limit=15.0 2024-09-25 04:27:23,275 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=662792.6666666666, ans=0.0 2024-09-25 04:27:24,738 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=662792.6666666666, ans=0.1 2024-09-25 04:27:27,894 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.const_attention_rate, batch_count=662792.6666666666, ans=0.025 2024-09-25 04:27:58,925 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-25 04:28:14,221 INFO [train.py:1198] (3/4) Epoch 37, batch 1800, loss[loss=0.1629, ctc_loss=0.1045, cr_loss=0.2915, over 17298.00 frames. ], tot_loss[loss=0.1935, ctc_loss=0.1252, cr_loss=0.3419, over 3346602.11 frames. ], batch size: 46, lr: 3.21e-03, grad_scale: 32.0 2024-09-25 04:28:34,617 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=662979.3333333334, ans=0.125 2024-09-25 04:28:40,892 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=662979.3333333334, ans=0.1 2024-09-25 04:29:30,050 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.117e+02 1.280e+02 1.352e+02 1.451e+02 1.795e+02, threshold=2.704e+02, percent-clipped=0.0 2024-09-25 04:29:39,048 INFO [train.py:1198] (3/4) Epoch 37, batch 1850, loss[loss=0.1777, ctc_loss=0.111, cr_loss=0.3337, over 17273.00 frames. ], tot_loss[loss=0.1928, ctc_loss=0.1246, cr_loss=0.3412, over 3347657.09 frames. ], batch size: 44, lr: 3.21e-03, grad_scale: 32.0 2024-09-25 04:30:02,394 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=12.30 vs. limit=22.5 2024-09-25 04:30:14,099 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=663259.3333333334, ans=0.125 2024-09-25 04:30:38,088 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=663306.0, ans=0.125 2024-09-25 04:30:40,553 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.77 vs. 
limit=15.0 2024-09-25 04:30:49,530 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=663352.6666666666, ans=0.125 2024-09-25 04:30:55,932 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=663352.6666666666, ans=0.125 2024-09-25 04:30:58,768 INFO [train.py:1198] (3/4) Epoch 37, batch 1900, loss[loss=0.2063, ctc_loss=0.1352, cr_loss=0.3556, over 16865.00 frames. ], tot_loss[loss=0.1932, ctc_loss=0.1249, cr_loss=0.3416, over 3354022.69 frames. ], batch size: 58, lr: 3.21e-03, grad_scale: 32.0 2024-09-25 04:31:05,449 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=663399.3333333334, ans=0.0 2024-09-25 04:31:12,657 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=12.73 vs. limit=22.5 2024-09-25 04:31:16,671 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=663446.0, ans=0.0 2024-09-25 04:31:16,673 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=663446.0, ans=0.125 2024-09-25 04:31:26,280 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.min_positive, batch_count=663446.0, ans=0.05 2024-09-25 04:31:44,380 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=6.54 vs. limit=15.0 2024-09-25 04:32:12,967 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.076e+02 1.273e+02 1.351e+02 1.492e+02 1.881e+02, threshold=2.703e+02, percent-clipped=0.0 2024-09-25 04:32:19,250 INFO [train.py:1198] (3/4) Epoch 37, batch 1950, loss[loss=0.1802, ctc_loss=0.1156, cr_loss=0.3228, over 17148.00 frames. ], tot_loss[loss=0.1933, ctc_loss=0.125, cr_loss=0.3416, over 3355004.87 frames. ], batch size: 45, lr: 3.21e-03, grad_scale: 32.0 2024-09-25 04:32:26,807 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=663632.6666666666, ans=0.125 2024-09-25 04:33:24,296 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=663772.6666666666, ans=0.0 2024-09-25 04:33:24,564 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-25 04:33:29,190 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=663819.3333333334, ans=0.0 2024-09-25 04:33:32,232 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=663819.3333333334, ans=0.0 2024-09-25 04:33:46,765 INFO [train.py:1198] (3/4) Epoch 37, batch 2000, loss[loss=0.2104, ctc_loss=0.1363, cr_loss=0.3704, over 16195.00 frames. ], tot_loss[loss=0.1935, ctc_loss=0.1251, cr_loss=0.3418, over 3352793.17 frames. 
], batch size: 74, lr: 3.21e-03, grad_scale: 32.0 2024-09-25 04:33:55,148 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=663866.0, ans=0.125 2024-09-25 04:34:01,690 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=663912.6666666666, ans=0.04949747468305833 2024-09-25 04:34:24,185 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-25 04:34:33,604 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.min_abs, batch_count=664006.0, ans=0.5 2024-09-25 04:35:02,488 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.153e+02 1.268e+02 1.366e+02 1.468e+02 1.745e+02, threshold=2.732e+02, percent-clipped=0.0 2024-09-25 04:35:08,887 INFO [train.py:1198] (3/4) Epoch 37, batch 2050, loss[loss=0.2257, ctc_loss=0.1497, cr_loss=0.3802, over 16871.00 frames. ], tot_loss[loss=0.1942, ctc_loss=0.1256, cr_loss=0.3431, over 3346173.06 frames. ], batch size: 58, lr: 3.21e-03, grad_scale: 32.0 2024-09-25 04:35:12,868 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=2.51 vs. limit=6.0 2024-09-25 04:35:20,457 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=664099.3333333334, ans=0.1 2024-09-25 04:35:33,476 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=3.03 vs. limit=15.0 2024-09-25 04:35:36,135 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=664146.0, ans=0.125 2024-09-25 04:36:16,094 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=664286.0, ans=0.125 2024-09-25 04:36:28,589 INFO [train.py:1198] (3/4) Epoch 37, batch 2100, loss[loss=0.1552, ctc_loss=0.09926, cr_loss=0.2797, over 17182.00 frames. ], tot_loss[loss=0.1931, ctc_loss=0.1248, cr_loss=0.3413, over 3354602.40 frames. ], batch size: 41, lr: 3.21e-03, grad_scale: 16.0 2024-09-25 04:36:33,463 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=664332.6666666666, ans=0.1 2024-09-25 04:36:57,143 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=664379.3333333334, ans=0.2 2024-09-25 04:37:05,227 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=664426.0, ans=0.125 2024-09-25 04:37:16,503 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=664472.6666666666, ans=0.09899494936611666 2024-09-25 04:37:45,219 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.const_attention_rate, batch_count=664519.3333333334, ans=0.025 2024-09-25 04:37:46,466 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.153e+02 1.254e+02 1.343e+02 1.449e+02 1.904e+02, threshold=2.685e+02, percent-clipped=0.0 2024-09-25 04:37:51,357 INFO [train.py:1198] (3/4) Epoch 37, batch 2150, loss[loss=0.1832, ctc_loss=0.1182, cr_loss=0.325, over 16961.00 frames. 
], tot_loss[loss=0.1928, ctc_loss=0.1246, cr_loss=0.341, over 3362101.80 frames. ], batch size: 42, lr: 3.21e-03, grad_scale: 16.0 2024-09-25 04:38:08,465 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=664612.6666666666, ans=0.125 2024-09-25 04:38:08,918 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=8.04 vs. limit=15.0 2024-09-25 04:38:18,988 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=664612.6666666666, ans=0.09899494936611666 2024-09-25 04:38:31,773 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=664659.3333333334, ans=0.0 2024-09-25 04:38:46,148 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=664706.0, ans=0.0 2024-09-25 04:38:49,205 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=664706.0, ans=0.0 2024-09-25 04:39:11,441 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=664752.6666666666, ans=0.05 2024-09-25 04:39:15,890 INFO [train.py:1198] (3/4) Epoch 37, batch 2200, loss[loss=0.1822, ctc_loss=0.1157, cr_loss=0.3328, over 16994.00 frames. ], tot_loss[loss=0.194, ctc_loss=0.1254, cr_loss=0.3428, over 3360175.63 frames. ], batch size: 53, lr: 3.21e-03, grad_scale: 16.0 2024-09-25 04:40:26,334 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=664986.0, ans=0.125 2024-09-25 04:40:34,122 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.115e+02 1.296e+02 1.362e+02 1.438e+02 1.964e+02, threshold=2.724e+02, percent-clipped=0.0 2024-09-25 04:40:38,877 INFO [train.py:1198] (3/4) Epoch 37, batch 2250, loss[loss=0.2159, ctc_loss=0.1412, cr_loss=0.3733, over 16388.00 frames. ], tot_loss[loss=0.1954, ctc_loss=0.1265, cr_loss=0.3444, over 3352085.00 frames. ], batch size: 66, lr: 3.21e-03, grad_scale: 16.0 2024-09-25 04:40:40,791 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=665032.6666666666, ans=0.0 2024-09-25 04:40:56,991 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=665079.3333333334, ans=0.125 2024-09-25 04:41:05,697 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=6.40 vs. limit=15.0 2024-09-25 04:41:23,891 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=665126.0, ans=0.125 2024-09-25 04:41:55,065 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=5.44 vs. limit=15.0 2024-09-25 04:41:58,861 INFO [train.py:1198] (3/4) Epoch 37, batch 2300, loss[loss=0.2038, ctc_loss=0.1289, cr_loss=0.3748, over 17301.00 frames. ], tot_loss[loss=0.195, ctc_loss=0.1262, cr_loss=0.3441, over 3362488.26 frames. 
], batch size: 46, lr: 3.21e-03, grad_scale: 16.0 2024-09-25 04:42:02,481 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten.whitening_limit, batch_count=665266.0, ans=15.0 2024-09-25 04:42:07,414 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=9.85 vs. limit=15.0 2024-09-25 04:42:28,929 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=665312.6666666666, ans=0.2 2024-09-25 04:43:21,780 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.164e+02 1.297e+02 1.364e+02 1.435e+02 2.216e+02, threshold=2.727e+02, percent-clipped=0.0 2024-09-25 04:43:25,351 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=665499.3333333334, ans=0.125 2024-09-25 04:43:26,540 INFO [train.py:1198] (3/4) Epoch 37, batch 2350, loss[loss=0.2028, ctc_loss=0.1296, cr_loss=0.3659, over 17158.00 frames. ], tot_loss[loss=0.1948, ctc_loss=0.126, cr_loss=0.3438, over 3356415.17 frames. ], batch size: 45, lr: 3.21e-03, grad_scale: 16.0 2024-09-25 04:44:04,357 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=5.71 vs. limit=12.0 2024-09-25 04:44:11,808 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=665592.6666666666, ans=0.1 2024-09-25 04:44:19,734 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=665639.3333333334, ans=0.0 2024-09-25 04:44:20,030 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=10.82 vs. limit=12.0 2024-09-25 04:44:42,911 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=665686.0, ans=0.125 2024-09-25 04:44:46,149 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=665686.0, ans=0.0 2024-09-25 04:44:48,924 INFO [train.py:1198] (3/4) Epoch 37, batch 2400, loss[loss=0.2087, ctc_loss=0.1368, cr_loss=0.3595, over 17309.00 frames. ], tot_loss[loss=0.1963, ctc_loss=0.1271, cr_loss=0.346, over 3345029.78 frames. ], batch size: 51, lr: 3.21e-03, grad_scale: 32.0 2024-09-25 04:44:49,177 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=665732.6666666666, ans=0.0 2024-09-25 04:44:52,308 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=665732.6666666666, ans=0.1 2024-09-25 04:44:58,845 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=665732.6666666666, ans=0.1 2024-09-25 04:45:03,475 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=665779.3333333334, ans=0.2 2024-09-25 04:46:03,760 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.110e+02 1.277e+02 1.353e+02 1.451e+02 1.749e+02, threshold=2.707e+02, percent-clipped=0.0 2024-09-25 04:46:08,612 INFO [train.py:1198] (3/4) Epoch 37, batch 2450, loss[loss=0.2264, ctc_loss=0.1489, cr_loss=0.3876, over 17094.00 frames. 
], tot_loss[loss=0.1965, ctc_loss=0.1271, cr_loss=0.3465, over 3345285.83 frames. ], batch size: 49, lr: 3.21e-03, grad_scale: 32.0 2024-09-25 04:46:13,859 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=665966.0, ans=0.0 2024-09-25 04:46:28,209 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=666012.6666666666, ans=0.0 2024-09-25 04:46:45,647 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=666059.3333333334, ans=0.125 2024-09-25 04:46:53,645 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=666059.3333333334, ans=0.0 2024-09-25 04:46:53,673 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=666059.3333333334, ans=0.1 2024-09-25 04:46:56,840 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=666106.0, ans=0.125 2024-09-25 04:47:03,213 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=666106.0, ans=0.125 2024-09-25 04:47:13,572 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=666152.6666666666, ans=0.0 2024-09-25 04:47:18,833 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=4.34 vs. limit=12.0 2024-09-25 04:47:27,960 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=666152.6666666666, ans=0.0 2024-09-25 04:47:30,971 INFO [train.py:1198] (3/4) Epoch 37, batch 2500, loss[loss=0.1913, ctc_loss=0.1229, cr_loss=0.3417, over 17232.00 frames. ], tot_loss[loss=0.1966, ctc_loss=0.1273, cr_loss=0.3466, over 3342054.13 frames. ], batch size: 50, lr: 3.21e-03, grad_scale: 32.0 2024-09-25 04:47:38,134 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=10.90 vs. limit=15.0 2024-09-25 04:47:38,263 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=3.79 vs. 
limit=10.0 2024-09-25 04:47:41,149 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=666199.3333333334, ans=0.0 2024-09-25 04:48:03,056 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=666246.0, ans=0.125 2024-09-25 04:48:04,537 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=666292.6666666666, ans=0.125 2024-09-25 04:48:15,170 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=666292.6666666666, ans=0.125 2024-09-25 04:48:26,199 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=666339.3333333334, ans=0.125 2024-09-25 04:48:26,345 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=666339.3333333334, ans=0.0 2024-09-25 04:48:26,688 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=9.22 vs. limit=15.0 2024-09-25 04:48:45,886 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=666386.0, ans=0.125 2024-09-25 04:48:53,437 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.167e+02 1.305e+02 1.391e+02 1.498e+02 2.183e+02, threshold=2.782e+02, percent-clipped=0.0 2024-09-25 04:48:56,605 INFO [train.py:1198] (3/4) Epoch 37, batch 2550, loss[loss=0.1889, ctc_loss=0.1224, cr_loss=0.3323, over 17310.00 frames. ], tot_loss[loss=0.1958, ctc_loss=0.1268, cr_loss=0.345, over 3350095.70 frames. ], batch size: 51, lr: 3.21e-03, grad_scale: 16.0 2024-09-25 04:49:16,119 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer_ff2.min_abs, batch_count=666479.3333333334, ans=0.1 2024-09-25 04:49:28,414 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=666479.3333333334, ans=0.0 2024-09-25 04:49:49,288 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=666572.6666666666, ans=0.0 2024-09-25 04:50:07,008 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=666619.3333333334, ans=0.125 2024-09-25 04:50:08,601 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=666619.3333333334, ans=0.2 2024-09-25 04:50:14,952 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=666619.3333333334, ans=0.0 2024-09-25 04:50:19,601 INFO [train.py:1198] (3/4) Epoch 37, batch 2600, loss[loss=0.2541, ctc_loss=0.1707, cr_loss=0.4169, over 15306.00 frames. ], tot_loss[loss=0.1959, ctc_loss=0.1269, cr_loss=0.3448, over 3334699.76 frames. ], batch size: 89, lr: 3.21e-03, grad_scale: 16.0 2024-09-25 04:50:42,328 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=666712.6666666666, ans=0.125 2024-09-25 04:50:50,736 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=9.51 vs. 
limit=15.0 2024-09-25 04:51:06,091 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=666806.0, ans=0.125 2024-09-25 04:51:10,869 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=666806.0, ans=0.1 2024-09-25 04:51:26,761 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=666852.6666666666, ans=0.2 2024-09-25 04:51:30,128 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=666852.6666666666, ans=0.125 2024-09-25 04:51:36,208 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.093e+02 1.272e+02 1.334e+02 1.506e+02 4.167e+02, threshold=2.667e+02, percent-clipped=1.0 2024-09-25 04:51:38,635 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=14.95 vs. limit=22.5 2024-09-25 04:51:39,401 INFO [train.py:1198] (3/4) Epoch 37, batch 2650, loss[loss=0.2031, ctc_loss=0.1313, cr_loss=0.3589, over 17147.00 frames. ], tot_loss[loss=0.1955, ctc_loss=0.1266, cr_loss=0.3445, over 3351967.69 frames. ], batch size: 48, lr: 3.20e-03, grad_scale: 16.0 2024-09-25 04:51:41,673 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=4.28 vs. limit=15.0 2024-09-25 04:52:09,549 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=666946.0, ans=0.125 2024-09-25 04:52:19,475 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=666992.6666666666, ans=0.0 2024-09-25 04:52:24,195 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=666992.6666666666, ans=0.125 2024-09-25 04:53:07,641 INFO [train.py:1198] (3/4) Epoch 37, batch 2700, loss[loss=0.1964, ctc_loss=0.1261, cr_loss=0.3518, over 17140.00 frames. ], tot_loss[loss=0.1953, ctc_loss=0.1265, cr_loss=0.3443, over 3346525.81 frames. ], batch size: 48, lr: 3.20e-03, grad_scale: 16.0 2024-09-25 04:53:15,988 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=667132.6666666666, ans=0.1 2024-09-25 04:53:23,747 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=667179.3333333334, ans=0.0 2024-09-25 04:53:41,240 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=667226.0, ans=0.0 2024-09-25 04:53:41,241 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=667226.0, ans=0.125 2024-09-25 04:53:54,264 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-25 04:53:59,581 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=10.11 vs. limit=15.0 2024-09-25 04:54:15,298 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=12.45 vs. 
limit=22.5 2024-09-25 04:54:27,063 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.119e+02 1.290e+02 1.345e+02 1.477e+02 2.693e+02, threshold=2.691e+02, percent-clipped=1.0 2024-09-25 04:54:30,304 INFO [train.py:1198] (3/4) Epoch 37, batch 2750, loss[loss=0.2146, ctc_loss=0.1416, cr_loss=0.3648, over 15238.00 frames. ], tot_loss[loss=0.1953, ctc_loss=0.1264, cr_loss=0.3447, over 3346384.10 frames. ], batch size: 89, lr: 3.20e-03, grad_scale: 16.0 2024-09-25 04:54:33,751 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-25 04:54:36,860 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=667366.0, ans=0.125 2024-09-25 04:55:44,342 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=667552.6666666666, ans=0.0 2024-09-25 04:55:50,339 INFO [train.py:1198] (3/4) Epoch 37, batch 2800, loss[loss=0.1847, ctc_loss=0.1187, cr_loss=0.33, over 16963.00 frames. ], tot_loss[loss=0.1944, ctc_loss=0.1258, cr_loss=0.343, over 3340813.59 frames. ], batch size: 42, lr: 3.20e-03, grad_scale: 32.0 2024-09-25 04:55:50,768 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=667599.3333333334, ans=0.125 2024-09-25 04:56:13,238 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=667646.0, ans=0.0 2024-09-25 04:56:22,986 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=667692.6666666666, ans=0.2 2024-09-25 04:56:26,377 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=667692.6666666666, ans=0.125 2024-09-25 04:57:10,009 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.075e+02 1.310e+02 1.386e+02 1.476e+02 1.796e+02, threshold=2.772e+02, percent-clipped=0.0 2024-09-25 04:57:13,306 INFO [train.py:1198] (3/4) Epoch 37, batch 2850, loss[loss=0.2004, ctc_loss=0.1309, cr_loss=0.3475, over 17291.00 frames. ], tot_loss[loss=0.1951, ctc_loss=0.1262, cr_loss=0.3441, over 3340336.55 frames. ], batch size: 49, lr: 3.20e-03, grad_scale: 32.0 2024-09-25 04:57:29,510 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=667879.3333333334, ans=0.0 2024-09-25 04:57:35,907 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=667879.3333333334, ans=0.125 2024-09-25 04:58:19,280 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.min_abs, batch_count=667972.6666666666, ans=0.5 2024-09-25 04:58:38,163 INFO [train.py:1198] (3/4) Epoch 37, batch 2900, loss[loss=0.2174, ctc_loss=0.1429, cr_loss=0.3725, over 16795.00 frames. ], tot_loss[loss=0.1959, ctc_loss=0.1269, cr_loss=0.3451, over 3340986.49 frames. 
], batch size: 61, lr: 3.20e-03, grad_scale: 32.0 2024-09-25 04:59:07,095 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=668112.6666666666, ans=0.1 2024-09-25 04:59:57,885 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.100e+02 1.277e+02 1.411e+02 1.557e+02 2.099e+02, threshold=2.822e+02, percent-clipped=0.0 2024-09-25 05:00:01,064 INFO [train.py:1198] (3/4) Epoch 37, batch 2950, loss[loss=0.1874, ctc_loss=0.1211, cr_loss=0.3315, over 16257.00 frames. ], tot_loss[loss=0.1964, ctc_loss=0.1272, cr_loss=0.346, over 3349632.92 frames. ], batch size: 36, lr: 3.20e-03, grad_scale: 32.0 2024-09-25 05:00:04,768 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=668299.3333333334, ans=0.125 2024-09-25 05:00:09,526 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=668299.3333333334, ans=0.0 2024-09-25 05:00:15,624 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=668346.0, ans=0.125 2024-09-25 05:00:17,904 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=15.35 vs. limit=22.5 2024-09-25 05:00:47,619 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=668439.3333333334, ans=0.0 2024-09-25 05:01:12,604 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=668486.0, ans=0.125 2024-09-25 05:01:20,185 INFO [train.py:1198] (3/4) Epoch 37, batch 3000, loss[loss=0.2045, ctc_loss=0.1372, cr_loss=0.3365, over 11868.00 frames. ], tot_loss[loss=0.1954, ctc_loss=0.1265, cr_loss=0.3446, over 3348241.22 frames. ], batch size: 123, lr: 3.20e-03, grad_scale: 16.0 2024-09-25 05:01:20,186 INFO [train.py:1221] (3/4) Computing validation loss 2024-09-25 05:01:33,933 INFO [zipformer.py:1858] (3/4) name=encoder.encoders.5.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([2.2388, 4.0232, 3.4583, 4.3851], device='cuda:3') 2024-09-25 05:01:35,771 INFO [train.py:1230] (3/4) Epoch 37, validation: loss=0.03526, ctc_loss=0.03526, cr_loss=1.039e-14, over 944034.00 frames. 2024-09-25 05:01:35,771 INFO [train.py:1231] (3/4) Maximum memory allocated so far is 21238MB 2024-09-25 05:01:47,670 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=10.84 vs. 
limit=22.5 2024-09-25 05:01:51,749 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=668579.3333333334, ans=0.1 2024-09-25 05:01:59,706 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=668579.3333333334, ans=0.125 2024-09-25 05:02:01,141 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=668579.3333333334, ans=0.035 2024-09-25 05:02:08,810 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=668626.0, ans=0.125 2024-09-25 05:02:27,581 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.min_positive, batch_count=668672.6666666666, ans=0.05 2024-09-25 05:02:29,466 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=4.47 vs. limit=10.0 2024-09-25 05:02:33,660 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=668672.6666666666, ans=0.125 2024-09-25 05:02:40,081 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=668719.3333333334, ans=0.2 2024-09-25 05:02:40,467 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=15.26 vs. limit=15.0 2024-09-25 05:02:54,649 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.102e+02 1.290e+02 1.381e+02 1.487e+02 2.036e+02, threshold=2.762e+02, percent-clipped=0.0 2024-09-25 05:02:56,258 INFO [train.py:1198] (3/4) Epoch 37, batch 3050, loss[loss=0.1852, ctc_loss=0.1217, cr_loss=0.3177, over 17156.00 frames. ], tot_loss[loss=0.1953, ctc_loss=0.1264, cr_loss=0.3441, over 3337969.64 frames. ], batch size: 45, lr: 3.20e-03, grad_scale: 16.0 2024-09-25 05:02:59,801 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=668766.0, ans=0.0 2024-09-25 05:03:13,759 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=668812.6666666666, ans=0.2 2024-09-25 05:04:14,261 INFO [train.py:1198] (3/4) Epoch 37, batch 3100, loss[loss=0.2357, ctc_loss=0.1541, cr_loss=0.408, over 16930.00 frames. ], tot_loss[loss=0.1951, ctc_loss=0.1263, cr_loss=0.3437, over 3345123.49 frames. ], batch size: 58, lr: 3.20e-03, grad_scale: 16.0 2024-09-25 05:04:27,362 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=668999.3333333334, ans=0.0 2024-09-25 05:04:30,574 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=669046.0, ans=0.125 2024-09-25 05:04:49,525 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.75 vs. limit=6.0 2024-09-25 05:04:59,620 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=5.37 vs. 
limit=15.0 2024-09-25 05:05:02,305 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=669139.3333333334, ans=0.2 2024-09-25 05:05:16,120 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=669139.3333333334, ans=0.04949747468305833 2024-09-25 05:05:19,370 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=669186.0, ans=0.125 2024-09-25 05:05:29,827 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=669186.0, ans=0.0 2024-09-25 05:05:35,824 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.145e+02 1.257e+02 1.346e+02 1.447e+02 1.997e+02, threshold=2.692e+02, percent-clipped=0.0 2024-09-25 05:05:37,419 INFO [train.py:1198] (3/4) Epoch 37, batch 3150, loss[loss=0.1682, ctc_loss=0.1053, cr_loss=0.3147, over 17301.00 frames. ], tot_loss[loss=0.1946, ctc_loss=0.1259, cr_loss=0.3431, over 3351405.56 frames. ], batch size: 42, lr: 3.20e-03, grad_scale: 16.0 2024-09-25 05:05:42,283 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=669232.6666666666, ans=0.0 2024-09-25 05:05:43,815 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=669232.6666666666, ans=0.125 2024-09-25 05:05:46,949 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=669232.6666666666, ans=0.0 2024-09-25 05:05:49,994 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=669232.6666666666, ans=0.125 2024-09-25 05:05:54,560 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=669279.3333333334, ans=0.125 2024-09-25 05:06:08,361 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=7.90 vs. limit=15.0 2024-09-25 05:06:12,238 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=669326.0, ans=0.125 2024-09-25 05:06:18,573 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=669326.0, ans=0.2 2024-09-25 05:06:26,583 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.87 vs. limit=6.0 2024-09-25 05:06:28,003 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=13.00 vs. limit=22.5 2024-09-25 05:06:29,424 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=10.95 vs. limit=12.0 2024-09-25 05:06:30,648 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=669372.6666666666, ans=0.125 2024-09-25 05:06:47,884 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=669419.3333333334, ans=0.0 2024-09-25 05:06:55,588 INFO [train.py:1198] (3/4) Epoch 37, batch 3200, loss[loss=0.2049, ctc_loss=0.133, cr_loss=0.3598, over 17002.00 frames. 
], tot_loss[loss=0.1939, ctc_loss=0.1255, cr_loss=0.342, over 3356632.17 frames. ], batch size: 53, lr: 3.20e-03, grad_scale: 32.0 2024-09-25 05:07:08,399 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.const_attention_rate, batch_count=669466.0, ans=0.025 2024-09-25 05:07:15,963 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=669512.6666666666, ans=0.125 2024-09-25 05:07:16,036 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=669512.6666666666, ans=0.125 2024-09-25 05:07:23,869 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=669512.6666666666, ans=0.125 2024-09-25 05:07:41,048 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=669606.0, ans=0.125 2024-09-25 05:08:15,376 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.135e+02 1.285e+02 1.355e+02 1.464e+02 2.821e+02, threshold=2.709e+02, percent-clipped=1.0 2024-09-25 05:08:15,401 INFO [train.py:1198] (3/4) Epoch 37, batch 3250, loss[loss=0.1638, ctc_loss=0.103, cr_loss=0.3038, over 16990.00 frames. ], tot_loss[loss=0.1941, ctc_loss=0.1256, cr_loss=0.3428, over 3359740.83 frames. ], batch size: 39, lr: 3.20e-03, grad_scale: 16.0 2024-09-25 05:08:23,436 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=669699.3333333334, ans=0.05 2024-09-25 05:08:29,556 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=669746.0, ans=0.0 2024-09-25 05:08:35,861 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=669746.0, ans=0.2 2024-09-25 05:08:40,591 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.const_attention_rate, batch_count=669746.0, ans=0.025 2024-09-25 05:08:46,890 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=669792.6666666666, ans=0.125 2024-09-25 05:09:06,427 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.38 vs. limit=10.0 2024-09-25 05:09:16,576 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=669886.0, ans=0.035 2024-09-25 05:09:19,975 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=669886.0, ans=0.2 2024-09-25 05:09:26,142 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-25 05:09:33,487 INFO [train.py:1198] (3/4) Epoch 37, batch 3300, loss[loss=0.2063, ctc_loss=0.1345, cr_loss=0.3591, over 17230.00 frames. ], tot_loss[loss=0.1939, ctc_loss=0.1254, cr_loss=0.3424, over 3362080.35 frames. 
], batch size: 50, lr: 3.20e-03, grad_scale: 16.0 2024-09-25 05:09:36,997 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=669932.6666666666, ans=0.2 2024-09-25 05:09:43,056 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=669932.6666666666, ans=0.0 2024-09-25 05:09:43,782 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=7.92 vs. limit=15.0 2024-09-25 05:09:47,723 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=669979.3333333334, ans=0.0 2024-09-25 05:09:56,667 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=5.02 vs. limit=12.0 2024-09-25 05:10:08,314 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=670026.0, ans=0.125 2024-09-25 05:10:33,587 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.14 vs. limit=15.0 2024-09-25 05:10:51,977 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.133e+02 1.300e+02 1.355e+02 1.430e+02 2.152e+02, threshold=2.710e+02, percent-clipped=0.0 2024-09-25 05:10:52,002 INFO [train.py:1198] (3/4) Epoch 37, batch 3350, loss[loss=0.2189, ctc_loss=0.1431, cr_loss=0.3788, over 16627.00 frames. ], tot_loss[loss=0.1941, ctc_loss=0.1256, cr_loss=0.3427, over 3353385.21 frames. ], batch size: 66, lr: 3.20e-03, grad_scale: 16.0 2024-09-25 05:10:54,406 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=4.49 vs. limit=10.0 2024-09-25 05:10:56,881 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=670166.0, ans=0.1 2024-09-25 05:11:30,929 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=670259.3333333334, ans=0.0 2024-09-25 05:11:51,332 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=670306.0, ans=0.125 2024-09-25 05:11:56,073 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=670352.6666666666, ans=0.125 2024-09-25 05:12:09,889 INFO [train.py:1198] (3/4) Epoch 37, batch 3400, loss[loss=0.1743, ctc_loss=0.1086, cr_loss=0.3283, over 17285.00 frames. ], tot_loss[loss=0.1933, ctc_loss=0.1249, cr_loss=0.3419, over 3364188.20 frames. ], batch size: 51, lr: 3.20e-03, grad_scale: 16.0 2024-09-25 05:12:30,542 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=670446.0, ans=0.2 2024-09-25 05:12:44,701 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=670492.6666666666, ans=0.0 2024-09-25 05:12:49,779 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.29 vs. 
limit=15.0 2024-09-25 05:12:58,794 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=670539.3333333334, ans=0.05 2024-09-25 05:13:06,467 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=670539.3333333334, ans=0.0 2024-09-25 05:13:28,361 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.103e+02 1.294e+02 1.379e+02 1.549e+02 2.225e+02, threshold=2.758e+02, percent-clipped=0.0 2024-09-25 05:13:28,386 INFO [train.py:1198] (3/4) Epoch 37, batch 3450, loss[loss=0.19, ctc_loss=0.1202, cr_loss=0.3492, over 17265.00 frames. ], tot_loss[loss=0.1927, ctc_loss=0.1243, cr_loss=0.3417, over 3369469.41 frames. ], batch size: 44, lr: 3.20e-03, grad_scale: 16.0 2024-09-25 05:13:29,020 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=12.45 vs. limit=22.5 2024-09-25 05:13:51,102 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=670679.3333333334, ans=0.125 2024-09-25 05:14:22,733 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=670772.6666666666, ans=0.0 2024-09-25 05:14:33,912 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=670819.3333333334, ans=0.125 2024-09-25 05:14:43,180 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.const_attention_rate, batch_count=670819.3333333334, ans=0.025 2024-09-25 05:14:49,369 INFO [train.py:1198] (3/4) Epoch 37, batch 3500, loss[loss=0.1607, ctc_loss=0.1019, cr_loss=0.2939, over 16276.00 frames. ], tot_loss[loss=0.1919, ctc_loss=0.1238, cr_loss=0.3407, over 3365857.46 frames. ], batch size: 36, lr: 3.20e-03, grad_scale: 16.0 2024-09-25 05:14:51,174 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.const_attention_rate, batch_count=670866.0, ans=0.025 2024-09-25 05:15:04,726 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=10.42 vs. limit=12.0 2024-09-25 05:15:09,295 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=670912.6666666666, ans=0.125 2024-09-25 05:15:25,556 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=14.49 vs. limit=22.5 2024-09-25 05:15:40,671 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=671006.0, ans=0.125 2024-09-25 05:15:48,739 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=671006.0, ans=0.0 2024-09-25 05:16:12,281 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.083e+02 1.290e+02 1.375e+02 1.489e+02 2.034e+02, threshold=2.750e+02, percent-clipped=0.0 2024-09-25 05:16:12,306 INFO [train.py:1198] (3/4) Epoch 37, batch 3550, loss[loss=0.2056, ctc_loss=0.1316, cr_loss=0.3699, over 16995.00 frames. ], tot_loss[loss=0.1925, ctc_loss=0.1243, cr_loss=0.341, over 3362656.25 frames. 
], batch size: 53, lr: 3.19e-03, grad_scale: 16.0 2024-09-25 05:16:12,540 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=671099.3333333334, ans=0.0 2024-09-25 05:16:20,442 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=671099.3333333334, ans=0.04949747468305833 2024-09-25 05:16:28,377 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten.whitening_limit, batch_count=671146.0, ans=15.0 2024-09-25 05:17:08,663 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=671239.3333333334, ans=0.125 2024-09-25 05:17:16,086 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=671286.0, ans=0.0 2024-09-25 05:17:23,917 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=671286.0, ans=0.125 2024-09-25 05:17:30,029 INFO [train.py:1198] (3/4) Epoch 37, batch 3600, loss[loss=0.227, ctc_loss=0.1449, cr_loss=0.4105, over 16458.00 frames. ], tot_loss[loss=0.1916, ctc_loss=0.1236, cr_loss=0.3398, over 3365746.44 frames. ], batch size: 66, lr: 3.19e-03, grad_scale: 32.0 2024-09-25 05:17:35,338 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.28 vs. limit=15.0 2024-09-25 05:17:56,584 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=671379.3333333334, ans=0.125 2024-09-25 05:18:01,941 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=671426.0, ans=0.0 2024-09-25 05:18:10,703 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=5.09 vs. limit=15.0 2024-09-25 05:18:20,848 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=671472.6666666666, ans=0.0 2024-09-25 05:18:50,877 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.134e+02 1.287e+02 1.354e+02 1.480e+02 1.820e+02, threshold=2.708e+02, percent-clipped=0.0 2024-09-25 05:18:50,903 INFO [train.py:1198] (3/4) Epoch 37, batch 3650, loss[loss=0.2034, ctc_loss=0.1318, cr_loss=0.3583, over 15993.00 frames. ], tot_loss[loss=0.1914, ctc_loss=0.1236, cr_loss=0.339, over 3361921.40 frames. ], batch size: 74, lr: 3.19e-03, grad_scale: 32.0 2024-09-25 05:19:06,894 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=671612.6666666666, ans=0.2 2024-09-25 05:19:25,550 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=14.38 vs. 
limit=15.0 2024-09-25 05:19:26,936 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=671659.3333333334, ans=0.0 2024-09-25 05:19:31,559 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=671659.3333333334, ans=0.125 2024-09-25 05:19:37,045 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=671706.0, ans=0.125 2024-09-25 05:19:37,668 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.25 vs. limit=15.0 2024-09-25 05:19:52,664 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=671752.6666666666, ans=0.125 2024-09-25 05:19:58,979 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=671752.6666666666, ans=0.125 2024-09-25 05:20:02,414 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=8.72 vs. limit=15.0 2024-09-25 05:20:02,666 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=6.46 vs. limit=15.0 2024-09-25 05:20:09,452 INFO [train.py:1198] (3/4) Epoch 37, batch 3700, loss[loss=0.1934, ctc_loss=0.1263, cr_loss=0.3353, over 17011.00 frames. ], tot_loss[loss=0.192, ctc_loss=0.1241, cr_loss=0.3396, over 3351356.08 frames. ], batch size: 51, lr: 3.19e-03, grad_scale: 32.0 2024-09-25 05:20:30,237 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=671846.0, ans=0.125 2024-09-25 05:20:32,707 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=5.32 vs. limit=15.0 2024-09-25 05:21:13,405 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=671986.0, ans=0.125 2024-09-25 05:21:24,594 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=671986.0, ans=0.2 2024-09-25 05:21:27,841 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=672032.6666666666, ans=0.125 2024-09-25 05:21:29,092 INFO [train.py:1198] (3/4) Epoch 37, batch 3750, loss[loss=0.2159, ctc_loss=0.1387, cr_loss=0.3861, over 17178.00 frames. ], tot_loss[loss=0.193, ctc_loss=0.1247, cr_loss=0.3414, over 3345708.83 frames. 
], batch size: 55, lr: 3.19e-03, grad_scale: 16.0 2024-09-25 05:21:30,592 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.062e+02 1.295e+02 1.401e+02 1.510e+02 2.261e+02, threshold=2.801e+02, percent-clipped=0.0 2024-09-25 05:22:23,150 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=672172.6666666666, ans=0.2 2024-09-25 05:22:24,665 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=672172.6666666666, ans=0.125 2024-09-25 05:22:30,839 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=672219.3333333334, ans=0.125 2024-09-25 05:22:34,089 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=672219.3333333334, ans=0.0 2024-09-25 05:22:47,833 INFO [train.py:1198] (3/4) Epoch 37, batch 3800, loss[loss=0.2329, ctc_loss=0.1554, cr_loss=0.3873, over 16599.00 frames. ], tot_loss[loss=0.1923, ctc_loss=0.1243, cr_loss=0.34, over 3342802.62 frames. ], batch size: 66, lr: 3.19e-03, grad_scale: 16.0 2024-09-25 05:22:56,485 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=672266.0, ans=0.05 2024-09-25 05:22:59,803 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=672266.0, ans=0.125 2024-09-25 05:23:04,156 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=672312.6666666666, ans=0.1 2024-09-25 05:23:08,909 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=672312.6666666666, ans=0.125 2024-09-25 05:23:13,481 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=672312.6666666666, ans=0.125 2024-09-25 05:23:27,653 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=672359.3333333334, ans=0.1 2024-09-25 05:23:38,824 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=672406.0, ans=0.125 2024-09-25 05:24:07,330 INFO [train.py:1198] (3/4) Epoch 37, batch 3850, loss[loss=0.2265, ctc_loss=0.1491, cr_loss=0.3871, over 14970.00 frames. ], tot_loss[loss=0.1927, ctc_loss=0.1247, cr_loss=0.3401, over 3285435.36 frames. ], batch size: 89, lr: 3.19e-03, grad_scale: 16.0 2024-09-25 05:24:08,856 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.077e+02 1.255e+02 1.356e+02 1.479e+02 3.474e+02, threshold=2.713e+02, percent-clipped=1.0 2024-09-25 05:24:24,833 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=672546.0, ans=0.0 2024-09-25 05:24:35,881 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.15 vs. 
limit=10.0 2024-09-25 05:24:46,081 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=672592.6666666666, ans=0.0 2024-09-25 05:24:58,099 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=672639.3333333334, ans=0.125 2024-09-25 05:25:02,664 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=672639.3333333334, ans=0.1 2024-09-25 05:25:05,667 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=672639.3333333334, ans=0.125 2024-09-25 05:25:13,261 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.41 vs. limit=10.0 2024-09-25 05:26:04,405 INFO [train.py:1198] (3/4) Epoch 38, batch 0, loss[loss=0.157, ctc_loss=0.09718, cr_loss=0.2989, over 16238.00 frames. ], tot_loss[loss=0.157, ctc_loss=0.09718, cr_loss=0.2989, over 16238.00 frames. ], batch size: 36, lr: 3.15e-03, grad_scale: 32.0 2024-09-25 05:26:04,406 INFO [train.py:1221] (3/4) Computing validation loss 2024-09-25 05:26:20,206 INFO [train.py:1230] (3/4) Epoch 38, validation: loss=0.03515, ctc_loss=0.03515, cr_loss=9.44e-15, over 944034.00 frames. 2024-09-25 05:26:20,207 INFO [train.py:1231] (3/4) Maximum memory allocated so far is 21238MB 2024-09-25 05:26:24,600 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=5.28 vs. limit=15.0 2024-09-25 05:26:25,618 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=672714.0, ans=0.2 2024-09-25 05:26:54,180 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=672807.3333333334, ans=0.015 2024-09-25 05:26:54,724 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.85 vs. limit=15.0 2024-09-25 05:26:56,225 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=4.57 vs. limit=15.0 2024-09-25 05:27:15,024 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=672854.0, ans=0.2 2024-09-25 05:27:27,984 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=672900.6666666666, ans=0.125 2024-09-25 05:27:37,821 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=672900.6666666666, ans=0.0 2024-09-25 05:27:40,678 INFO [train.py:1198] (3/4) Epoch 38, batch 50, loss[loss=0.1629, ctc_loss=0.1045, cr_loss=0.292, over 16958.00 frames. ], tot_loss[loss=0.1874, ctc_loss=0.1209, cr_loss=0.3327, over 746521.96 frames. 
], batch size: 42, lr: 3.15e-03, grad_scale: 16.0 2024-09-25 05:27:50,498 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.105e+02 1.374e+02 1.537e+02 1.726e+02 2.147e+02, threshold=3.075e+02, percent-clipped=0.0 2024-09-25 05:28:20,410 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=673040.6666666666, ans=0.125 2024-09-25 05:28:28,480 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=673040.6666666666, ans=0.0 2024-09-25 05:29:03,640 INFO [train.py:1198] (3/4) Epoch 38, batch 100, loss[loss=0.2049, ctc_loss=0.1339, cr_loss=0.355, over 17327.00 frames. ], tot_loss[loss=0.191, ctc_loss=0.1232, cr_loss=0.3388, over 1335495.13 frames. ], batch size: 51, lr: 3.15e-03, grad_scale: 8.0 2024-09-25 05:29:21,483 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=673227.3333333334, ans=0.125 2024-09-25 05:29:24,710 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=673227.3333333334, ans=0.2 2024-09-25 05:29:47,048 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=673274.0, ans=0.0 2024-09-25 05:29:53,696 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=673274.0, ans=0.125 2024-09-25 05:30:21,744 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-09-25 05:30:24,901 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.min_positive, batch_count=673367.3333333334, ans=0.025 2024-09-25 05:30:31,019 INFO [train.py:1198] (3/4) Epoch 38, batch 150, loss[loss=0.2305, ctc_loss=0.1488, cr_loss=0.4088, over 15143.00 frames. ], tot_loss[loss=0.191, ctc_loss=0.1233, cr_loss=0.3384, over 1784560.14 frames. ], batch size: 89, lr: 3.15e-03, grad_scale: 8.0 2024-09-25 05:30:31,256 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=673414.0, ans=0.125 2024-09-25 05:30:32,002 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten.whitening_limit, batch_count=673414.0, ans=15.0 2024-09-25 05:30:42,270 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.072e+02 1.262e+02 1.329e+02 1.427e+02 1.998e+02, threshold=2.657e+02, percent-clipped=0.0 2024-09-25 05:31:16,355 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=673507.3333333334, ans=0.2 2024-09-25 05:31:38,775 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=673600.6666666666, ans=0.125 2024-09-25 05:31:51,209 INFO [train.py:1198] (3/4) Epoch 38, batch 200, loss[loss=0.2452, ctc_loss=0.1587, cr_loss=0.4323, over 16984.00 frames. ], tot_loss[loss=0.1913, ctc_loss=0.1235, cr_loss=0.3394, over 2135477.52 frames. 
], batch size: 53, lr: 3.15e-03, grad_scale: 8.0 2024-09-25 05:32:34,639 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=673740.6666666666, ans=0.125 2024-09-25 05:32:37,966 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=673787.3333333334, ans=0.2 2024-09-25 05:32:42,657 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=673787.3333333334, ans=0.125 2024-09-25 05:32:44,188 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=673787.3333333334, ans=0.1 2024-09-25 05:33:13,726 INFO [train.py:1198] (3/4) Epoch 38, batch 250, loss[loss=0.1766, ctc_loss=0.1111, cr_loss=0.3279, over 17250.00 frames. ], tot_loss[loss=0.1907, ctc_loss=0.123, cr_loss=0.3387, over 2417189.46 frames. ], batch size: 44, lr: 3.15e-03, grad_scale: 8.0 2024-09-25 05:33:17,124 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=673880.6666666666, ans=0.125 2024-09-25 05:33:24,710 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.142e+02 1.262e+02 1.325e+02 1.385e+02 3.377e+02, threshold=2.651e+02, percent-clipped=1.0 2024-09-25 05:33:37,672 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=673927.3333333334, ans=0.125 2024-09-25 05:33:40,914 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=673927.3333333334, ans=0.125 2024-09-25 05:34:01,590 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=674020.6666666666, ans=0.1 2024-09-25 05:34:10,854 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=674020.6666666666, ans=0.125 2024-09-25 05:34:12,521 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=674020.6666666666, ans=0.0 2024-09-25 05:34:28,171 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=674067.3333333334, ans=0.2 2024-09-25 05:34:35,577 INFO [train.py:1198] (3/4) Epoch 38, batch 300, loss[loss=0.2115, ctc_loss=0.1361, cr_loss=0.3773, over 17195.00 frames. ], tot_loss[loss=0.1915, ctc_loss=0.1236, cr_loss=0.3396, over 2626793.54 frames. 
], batch size: 55, lr: 3.14e-03, grad_scale: 8.0 2024-09-25 05:34:56,601 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=674160.6666666666, ans=0.0 2024-09-25 05:35:13,580 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=674207.3333333334, ans=0.0 2024-09-25 05:35:42,358 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=674254.0, ans=0.1 2024-09-25 05:35:42,396 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=674254.0, ans=0.0 2024-09-25 05:35:44,520 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=7.84 vs. limit=15.0 2024-09-25 05:35:56,760 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=674300.6666666666, ans=0.2 2024-09-25 05:36:01,137 INFO [train.py:1198] (3/4) Epoch 38, batch 350, loss[loss=0.1758, ctc_loss=0.1122, cr_loss=0.3178, over 17104.00 frames. ], tot_loss[loss=0.1905, ctc_loss=0.1229, cr_loss=0.3384, over 2785493.36 frames. ], batch size: 49, lr: 3.14e-03, grad_scale: 8.0 2024-09-25 05:36:12,222 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.073e+02 1.293e+02 1.371e+02 1.501e+02 2.181e+02, threshold=2.742e+02, percent-clipped=0.0 2024-09-25 05:36:23,957 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=674394.0, ans=0.0 2024-09-25 05:36:33,963 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=12.69 vs. limit=22.5 2024-09-25 05:36:36,303 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=674440.6666666666, ans=0.0 2024-09-25 05:36:49,506 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=4.65 vs. limit=12.0 2024-09-25 05:36:56,818 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=674487.3333333334, ans=0.125 2024-09-25 05:37:20,499 INFO [train.py:1198] (3/4) Epoch 38, batch 400, loss[loss=0.1881, ctc_loss=0.1208, cr_loss=0.3362, over 17159.00 frames. ], tot_loss[loss=0.1911, ctc_loss=0.1233, cr_loss=0.3394, over 2916320.15 frames. 
], batch size: 45, lr: 3.14e-03, grad_scale: 16.0 2024-09-25 05:37:35,167 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=674627.3333333334, ans=0.125 2024-09-25 05:38:17,143 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=674720.6666666666, ans=0.1 2024-09-25 05:38:23,586 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=674720.6666666666, ans=0.125 2024-09-25 05:38:26,918 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=674767.3333333334, ans=0.1 2024-09-25 05:38:41,564 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=674814.0, ans=0.2 2024-09-25 05:38:42,901 INFO [train.py:1198] (3/4) Epoch 38, batch 450, loss[loss=0.1947, ctc_loss=0.1268, cr_loss=0.3396, over 17191.00 frames. ], tot_loss[loss=0.1907, ctc_loss=0.1229, cr_loss=0.3391, over 3022249.87 frames. ], batch size: 55, lr: 3.14e-03, grad_scale: 16.0 2024-09-25 05:38:46,196 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=674814.0, ans=0.2 2024-09-25 05:38:55,495 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.071e+02 1.292e+02 1.363e+02 1.447e+02 2.119e+02, threshold=2.727e+02, percent-clipped=0.0 2024-09-25 05:39:15,277 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=674907.3333333334, ans=0.125 2024-09-25 05:39:38,872 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=674954.0, ans=0.2 2024-09-25 05:39:38,889 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=674954.0, ans=0.2 2024-09-25 05:39:43,562 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=674954.0, ans=0.125 2024-09-25 05:39:51,087 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=675000.6666666666, ans=0.0 2024-09-25 05:40:04,959 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=675000.6666666666, ans=0.125 2024-09-25 05:40:11,291 INFO [train.py:1198] (3/4) Epoch 38, batch 500, loss[loss=0.1914, ctc_loss=0.1262, cr_loss=0.3263, over 16777.00 frames. ], tot_loss[loss=0.1912, ctc_loss=0.1232, cr_loss=0.3399, over 3103804.80 frames. ], batch size: 61, lr: 3.14e-03, grad_scale: 8.0 2024-09-25 05:40:26,396 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=12.94 vs. 
limit=22.5 2024-09-25 05:40:46,712 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=675140.6666666666, ans=0.1 2024-09-25 05:40:51,309 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=675140.6666666666, ans=0.125 2024-09-25 05:40:52,951 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=675140.6666666666, ans=0.0 2024-09-25 05:41:03,981 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=675187.3333333334, ans=0.125 2024-09-25 05:41:13,473 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=675234.0, ans=0.0 2024-09-25 05:41:15,736 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.49 vs. limit=22.5 2024-09-25 05:41:27,579 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=675234.0, ans=10.0 2024-09-25 05:41:30,506 INFO [train.py:1198] (3/4) Epoch 38, batch 550, loss[loss=0.1574, ctc_loss=0.0982, cr_loss=0.2959, over 17056.00 frames. ], tot_loss[loss=0.1933, ctc_loss=0.1247, cr_loss=0.3426, over 3156435.46 frames. ], batch size: 39, lr: 3.14e-03, grad_scale: 8.0 2024-09-25 05:41:32,811 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=6.36 vs. limit=15.0 2024-09-25 05:41:38,973 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=675280.6666666666, ans=0.1 2024-09-25 05:41:43,253 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.105e+02 1.284e+02 1.364e+02 1.434e+02 1.794e+02, threshold=2.728e+02, percent-clipped=0.0 2024-09-25 05:41:53,376 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=675327.3333333334, ans=0.0 2024-09-25 05:42:26,691 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=675420.6666666666, ans=0.0 2024-09-25 05:42:27,713 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=11.89 vs. limit=22.5 2024-09-25 05:42:43,999 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=675467.3333333334, ans=0.125 2024-09-25 05:42:50,178 INFO [train.py:1198] (3/4) Epoch 38, batch 600, loss[loss=0.1821, ctc_loss=0.1191, cr_loss=0.315, over 17096.00 frames. ], tot_loss[loss=0.1925, ctc_loss=0.1243, cr_loss=0.3411, over 3208208.60 frames. ], batch size: 49, lr: 3.14e-03, grad_scale: 8.0 2024-09-25 05:43:21,300 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=5.93 vs. 
limit=15.0 2024-09-25 05:43:49,204 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=675654.0, ans=0.1 2024-09-25 05:43:53,866 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=675654.0, ans=0.2 2024-09-25 05:44:13,111 INFO [train.py:1198] (3/4) Epoch 38, batch 650, loss[loss=0.2185, ctc_loss=0.1415, cr_loss=0.3853, over 17219.00 frames. ], tot_loss[loss=0.1928, ctc_loss=0.1245, cr_loss=0.3415, over 3246824.80 frames. ], batch size: 55, lr: 3.14e-03, grad_scale: 8.0 2024-09-25 05:44:23,842 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=675747.3333333334, ans=0.0 2024-09-25 05:44:28,505 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.165e+02 1.283e+02 1.368e+02 1.490e+02 2.037e+02, threshold=2.736e+02, percent-clipped=0.0 2024-09-25 05:44:29,414 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.23 vs. limit=22.5 2024-09-25 05:44:49,902 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=675794.0, ans=0.125 2024-09-25 05:44:56,852 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=3.38 vs. limit=12.0 2024-09-25 05:44:59,816 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=11.71 vs. limit=22.5 2024-09-25 05:45:04,329 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=675840.6666666666, ans=0.2 2024-09-25 05:45:15,258 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=675887.3333333334, ans=0.2 2024-09-25 05:45:15,701 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=13.38 vs. limit=22.5 2024-09-25 05:45:18,710 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=675887.3333333334, ans=0.125 2024-09-25 05:45:40,417 INFO [train.py:1198] (3/4) Epoch 38, batch 700, loss[loss=0.1976, ctc_loss=0.1274, cr_loss=0.3513, over 17303.00 frames. ], tot_loss[loss=0.1921, ctc_loss=0.1239, cr_loss=0.3408, over 3274578.69 frames. ], batch size: 51, lr: 3.14e-03, grad_scale: 8.0 2024-09-25 05:45:47,237 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=675980.6666666666, ans=0.2 2024-09-25 05:45:58,376 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.const_attention_rate, batch_count=676027.3333333334, ans=0.025 2024-09-25 05:46:03,031 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=676027.3333333334, ans=0.5 2024-09-25 05:46:12,647 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=676074.0, ans=0.125 2024-09-25 05:46:19,950 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=8.25 vs. 
limit=15.0 2024-09-25 05:46:37,473 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=7.74 vs. limit=15.0 2024-09-25 05:46:44,755 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=676167.3333333334, ans=0.125 2024-09-25 05:47:00,010 INFO [train.py:1198] (3/4) Epoch 38, batch 750, loss[loss=0.1893, ctc_loss=0.1217, cr_loss=0.3382, over 17062.00 frames. ], tot_loss[loss=0.1927, ctc_loss=0.1243, cr_loss=0.342, over 3304569.35 frames. ], batch size: 46, lr: 3.14e-03, grad_scale: 8.0 2024-09-25 05:47:08,274 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=676214.0, ans=0.1 2024-09-25 05:47:12,450 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.104e+02 1.277e+02 1.363e+02 1.416e+02 2.105e+02, threshold=2.725e+02, percent-clipped=0.0 2024-09-25 05:47:22,151 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=676260.6666666666, ans=0.0 2024-09-25 05:47:25,543 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=3.27 vs. limit=10.0 2024-09-25 05:47:31,685 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=676307.3333333334, ans=0.125 2024-09-25 05:47:41,192 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=676307.3333333334, ans=0.2 2024-09-25 05:47:42,778 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=676307.3333333334, ans=0.125 2024-09-25 05:48:06,035 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=676400.6666666666, ans=0.125 2024-09-25 05:48:06,629 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.10 vs. limit=6.0 2024-09-25 05:48:21,652 INFO [train.py:1198] (3/4) Epoch 38, batch 800, loss[loss=0.1786, ctc_loss=0.1135, cr_loss=0.3252, over 17177.00 frames. ], tot_loss[loss=0.1921, ctc_loss=0.1239, cr_loss=0.3412, over 3323540.04 frames. ], batch size: 41, lr: 3.14e-03, grad_scale: 16.0 2024-09-25 05:48:22,882 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=4.65 vs. limit=15.0 2024-09-25 05:48:47,387 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=676494.0, ans=0.0 2024-09-25 05:48:47,509 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=676494.0, ans=0.0 2024-09-25 05:48:55,513 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=676540.6666666666, ans=0.0 2024-09-25 05:49:34,383 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=676634.0, ans=0.0 2024-09-25 05:49:39,365 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.86 vs. 
2024-09-25 05:49:43,354 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=676634.0, ans=0.125
2024-09-25 05:49:49,054 INFO [train.py:1198] (3/4) Epoch 38, batch 850, loss[loss=0.2091, ctc_loss=0.1388, cr_loss=0.3515, over 16742.00 frames. ], tot_loss[loss=0.1927, ctc_loss=0.1244, cr_loss=0.3418, over 3335845.83 frames. ], batch size: 61, lr: 3.14e-03, grad_scale: 16.0
2024-09-25 05:49:54,071 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=676680.6666666666, ans=0.0
2024-09-25 05:50:00,476 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=676680.6666666666, ans=0.125
2024-09-25 05:50:01,637 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.050e+02 1.281e+02 1.360e+02 1.434e+02 2.186e+02, threshold=2.720e+02, percent-clipped=0.0
2024-09-25 05:50:06,713 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=676727.3333333334, ans=0.125
2024-09-25 05:50:17,944 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=676727.3333333334, ans=0.2
2024-09-25 05:50:21,179 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=676774.0, ans=0.125
2024-09-25 05:50:30,725 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.const_attention_rate, batch_count=676774.0, ans=0.025
2024-09-25 05:50:38,844 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=676820.6666666666, ans=0.2
2024-09-25 05:50:38,884 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=676820.6666666666, ans=0.125
2024-09-25 05:51:08,596 INFO [train.py:1198] (3/4) Epoch 38, batch 900, loss[loss=0.1765, ctc_loss=0.1143, cr_loss=0.3111, over 17300.00 frames. ], tot_loss[loss=0.1926, ctc_loss=0.1243, cr_loss=0.3417, over 3346035.33 frames. ], batch size: 49, lr: 3.14e-03, grad_scale: 16.0
2024-09-25 05:52:03,720 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-25 05:52:08,566 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=677054.0, ans=0.125
2024-09-25 05:52:29,103 INFO [train.py:1198] (3/4) Epoch 38, batch 950, loss[loss=0.2146, ctc_loss=0.1444, cr_loss=0.3512, over 15191.00 frames. ], tot_loss[loss=0.1929, ctc_loss=0.1246, cr_loss=0.3417, over 3343335.31 frames. ], batch size: 89, lr: 3.14e-03, grad_scale: 16.0
2024-09-25 05:52:32,670 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=677147.3333333334, ans=0.125
2024-09-25 05:52:35,937 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=677147.3333333334, ans=0.125
2024-09-25 05:52:41,915 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.111e+02 1.327e+02 1.410e+02 1.537e+02 3.330e+02, threshold=2.819e+02, percent-clipped=2.0
2024-09-25 05:52:47,006 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=677194.0, ans=0.1
2024-09-25 05:53:11,500 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.81 vs. limit=6.0
2024-09-25 05:53:20,050 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=677287.3333333334, ans=0.0
2024-09-25 05:53:41,310 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=677334.0, ans=0.125
2024-09-25 05:53:52,326 INFO [train.py:1198] (3/4) Epoch 38, batch 1000, loss[loss=0.1942, ctc_loss=0.1249, cr_loss=0.3468, over 16773.00 frames. ], tot_loss[loss=0.1928, ctc_loss=0.1245, cr_loss=0.3415, over 3343862.67 frames. ], batch size: 61, lr: 3.14e-03, grad_scale: 16.0
2024-09-25 05:54:24,729 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.77 vs. limit=15.0
2024-09-25 05:54:37,748 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=677474.0, ans=0.125
2024-09-25 05:55:00,025 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten.whitening_limit, batch_count=677520.6666666666, ans=15.0
2024-09-25 05:55:17,596 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=677567.3333333334, ans=0.125
2024-09-25 05:55:19,386 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=677614.0, ans=0.125
2024-09-25 05:55:20,560 INFO [train.py:1198] (3/4) Epoch 38, batch 1050, loss[loss=0.1853, ctc_loss=0.1201, cr_loss=0.3257, over 17100.00 frames. ], tot_loss[loss=0.1933, ctc_loss=0.1248, cr_loss=0.3423, over 3350574.74 frames. ], batch size: 49, lr: 3.14e-03, grad_scale: 16.0
2024-09-25 05:55:20,848 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=677614.0, ans=0.2
2024-09-25 05:55:21,735 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=5.16 vs. limit=15.0
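Each optim.py warning above prints five order statistics (min, 25%, median, 75%, max) of recent gradient norms together with the active clipping threshold; in every entry the threshold equals Clipping_scale times the reported median (e.g. 2.0 × 1.410e+02 = 2.819e+02), and percent-clipped reports how often that threshold was exceeded. Below is a sketch of such a scheme, assuming median-based clipping as the numbers suggest; it is an illustration, not the actual icefall optimizer code.

```python
# Sketch of median-based gradient clipping, assuming (as the logged numbers
# suggest) threshold = Clipping_scale * median of recent grad norms.
import torch

def clip_gradients(params, recent_norms, clipping_scale=2.0):
    norms = torch.tensor(recent_norms)
    # min / 25% / 50% / 75% / max, as printed in the warnings above
    quartiles = torch.quantile(norms, torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0]))
    threshold = clipping_scale * quartiles[2]        # e.g. 2.0 * 1.410e+02
    total_norm = torch.norm(torch.stack([p.grad.norm() for p in params]))
    clipped = bool(total_norm > threshold)           # counted in percent-clipped
    if clipped:
        for p in params:
            p.grad.mul_(threshold / total_norm)
    return quartiles, threshold, clipped
```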
2024-09-25 05:55:27,423 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=677614.0, ans=0.125
2024-09-25 05:55:30,540 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=677614.0, ans=0.0
2024-09-25 05:55:32,272 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=677614.0, ans=0.0
2024-09-25 05:55:33,504 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.113e+02 1.267e+02 1.356e+02 1.451e+02 1.928e+02, threshold=2.711e+02, percent-clipped=0.0
2024-09-25 05:55:48,058 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer_na.min_abs, batch_count=677660.6666666666, ans=0.02
2024-09-25 05:56:13,800 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=677754.0, ans=0.125
2024-09-25 05:56:18,687 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=677754.0, ans=0.125
2024-09-25 05:56:40,345 INFO [train.py:1198] (3/4) Epoch 38, batch 1100, loss[loss=0.1599, ctc_loss=0.101, cr_loss=0.2945, over 16275.00 frames. ], tot_loss[loss=0.1916, ctc_loss=0.1236, cr_loss=0.3397, over 3346046.89 frames. ], batch size: 36, lr: 3.14e-03, grad_scale: 16.0
2024-09-25 05:57:37,076 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=6.94 vs. limit=15.0
2024-09-25 05:57:52,384 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=678034.0, ans=0.0
2024-09-25 05:58:02,653 INFO [train.py:1198] (3/4) Epoch 38, batch 1150, loss[loss=0.2146, ctc_loss=0.1398, cr_loss=0.3738, over 17023.00 frames. ], tot_loss[loss=0.1911, ctc_loss=0.1234, cr_loss=0.3386, over 3349340.08 frames. ], batch size: 51, lr: 3.14e-03, grad_scale: 16.0
2024-09-25 05:58:15,128 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.083e+02 1.254e+02 1.322e+02 1.438e+02 2.414e+02, threshold=2.644e+02, percent-clipped=0.0
2024-09-25 05:58:21,887 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=678127.3333333334, ans=0.09899494936611666
2024-09-25 05:58:28,352 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=678127.3333333334, ans=0.125
2024-09-25 05:58:47,700 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=678174.0, ans=0.1
2024-09-25 05:58:52,705 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=678220.6666666666, ans=0.125
2024-09-25 05:59:21,094 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.74 vs. limit=15.0
2024-09-25 05:59:25,216 INFO [train.py:1198] (3/4) Epoch 38, batch 1200, loss[loss=0.1885, ctc_loss=0.1206, cr_loss=0.3397, over 17335.00 frames. ], tot_loss[loss=0.1923, ctc_loss=0.1243, cr_loss=0.3403, over 3332771.12 frames. ], batch size: 51, lr: 3.13e-03, grad_scale: 16.0
2024-09-25 05:59:30,533 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=678314.0, ans=0.2
2024-09-25 05:59:57,961 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-09-25 06:00:09,161 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-09-25 06:00:25,001 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=678454.0, ans=0.0
2024-09-25 06:00:26,541 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=678454.0, ans=0.125
2024-09-25 06:00:34,670 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=678500.6666666666, ans=0.125
2024-09-25 06:00:36,123 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=678500.6666666666, ans=0.1
2024-09-25 06:00:50,408 INFO [train.py:1198] (3/4) Epoch 38, batch 1250, loss[loss=0.1841, ctc_loss=0.119, cr_loss=0.3257, over 16955.00 frames. ], tot_loss[loss=0.193, ctc_loss=0.1247, cr_loss=0.3414, over 3348528.60 frames. ], batch size: 58, lr: 3.13e-03, grad_scale: 16.0
2024-09-25 06:01:04,690 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.074e+02 1.283e+02 1.378e+02 1.489e+02 1.932e+02, threshold=2.757e+02, percent-clipped=0.0
2024-09-25 06:01:17,783 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=678594.0, ans=0.125
2024-09-25 06:01:21,320 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=678640.6666666666, ans=0.125
2024-09-25 06:01:22,903 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=678640.6666666666, ans=0.1
2024-09-25 06:01:40,596 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=678687.3333333334, ans=0.05
2024-09-25 06:02:03,234 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=678734.0, ans=0.125
2024-09-25 06:02:08,089 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=678734.0, ans=0.125
2024-09-25 06:02:08,537 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=11.20 vs. limit=22.5
2024-09-25 06:02:11,136 INFO [train.py:1198] (3/4) Epoch 38, batch 1300, loss[loss=0.1869, ctc_loss=0.1186, cr_loss=0.3417, over 17303.00 frames. ], tot_loss[loss=0.191, ctc_loss=0.1232, cr_loss=0.339, over 3355851.93 frames. ], batch size: 46, lr: 3.13e-03, grad_scale: 16.0
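The ScheduledFloat entries record hyperparameters (dropout rates, skip rates, scale floors) whose value `ans` is a function of `batch_count`; by this point in training most have settled at their final constants. A minimal sketch of a piecewise-linear schedule of that kind follows; the class interface and the breakpoints in the example are assumptions for illustration, not the actual scaling.py implementation.

```python
# Minimal sketch of a piecewise-linear schedule over batch_count, in the
# spirit of the ScheduledFloat entries above (assumed interface, simplified).
import bisect

class PiecewiseLinear:
    def __init__(self, *points):
        # points: (batch_count, value) pairs in increasing batch_count order
        self.xs = [x for x, _ in points]
        self.ys = [y for _, y in points]

    def __call__(self, batch_count: float) -> float:
        i = bisect.bisect_right(self.xs, batch_count)
        if i == 0:
            return self.ys[0]
        if i == len(self.xs):
            return self.ys[-1]          # past the last breakpoint: constant
        x0, x1 = self.xs[i - 1], self.xs[i]
        y0, y1 = self.ys[i - 1], self.ys[i]
        return y0 + (y1 - y0) * (batch_count - x0) / (x1 - x0)

# E.g. a dropout that decays from 0.3 to 0.1 over the first 20k batches and
# then stays at 0.1 -- consistent with ans=0.1 at batch_count ~678k above.
dropout_p = PiecewiseLinear((0.0, 0.3), (20000.0, 0.1))
assert dropout_p(678314.0) == 0.1
```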
2024-09-25 06:02:13,067 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=678780.6666666666, ans=0.0
2024-09-25 06:02:21,185 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=678780.6666666666, ans=0.07
2024-09-25 06:02:22,757 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=678780.6666666666, ans=0.125
2024-09-25 06:02:53,805 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.min_abs, batch_count=678874.0, ans=0.5
2024-09-25 06:03:11,788 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=678920.6666666666, ans=0.0
2024-09-25 06:03:27,740 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=11.01 vs. limit=15.0
2024-09-25 06:03:33,486 INFO [train.py:1198] (3/4) Epoch 38, batch 1350, loss[loss=0.1919, ctc_loss=0.1209, cr_loss=0.3553, over 17294.00 frames. ], tot_loss[loss=0.1916, ctc_loss=0.1235, cr_loss=0.3405, over 3358350.01 frames. ], batch size: 46, lr: 3.13e-03, grad_scale: 16.0
2024-09-25 06:03:47,755 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.101e+02 1.262e+02 1.346e+02 1.430e+02 2.071e+02, threshold=2.693e+02, percent-clipped=0.0
2024-09-25 06:04:02,386 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=679060.6666666666, ans=0.0
2024-09-25 06:04:24,292 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=679154.0, ans=0.2
2024-09-25 06:04:34,722 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=679154.0, ans=0.125
2024-09-25 06:05:01,174 INFO [train.py:1198] (3/4) Epoch 38, batch 1400, loss[loss=0.2165, ctc_loss=0.1417, cr_loss=0.374, over 16758.00 frames. ], tot_loss[loss=0.192, ctc_loss=0.1237, cr_loss=0.3413, over 3368343.09 frames. ], batch size: 61, lr: 3.13e-03, grad_scale: 16.0
2024-09-25 06:05:05,481 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=7.47 vs. limit=15.0
2024-09-25 06:05:08,094 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=12.75 vs. limit=22.5
2024-09-25 06:05:36,442 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=679340.6666666666, ans=0.05
2024-09-25 06:05:42,978 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=679340.6666666666, ans=0.0
2024-09-25 06:06:00,376 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.max_abs, batch_count=679387.3333333334, ans=10.0
2024-09-25 06:06:13,144 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=679434.0, ans=0.0
2024-09-25 06:06:18,267 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=679434.0, ans=0.0
2024-09-25 06:06:21,083 INFO [train.py:1198] (3/4) Epoch 38, batch 1450, loss[loss=0.1677, ctc_loss=0.106, cr_loss=0.3085, over 17296.00 frames. ], tot_loss[loss=0.1915, ctc_loss=0.1234, cr_loss=0.3408, over 3368080.86 frames. ], batch size: 49, lr: 3.13e-03, grad_scale: 16.0
2024-09-25 06:06:27,874 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=679480.6666666666, ans=0.2
2024-09-25 06:06:29,610 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=679480.6666666666, ans=0.125
2024-09-25 06:06:30,969 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=679480.6666666666, ans=0.0
2024-09-25 06:06:35,619 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.086e+02 1.250e+02 1.326e+02 1.407e+02 2.354e+02, threshold=2.651e+02, percent-clipped=0.0
2024-09-25 06:06:37,600 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=679527.3333333334, ans=0.05
2024-09-25 06:07:01,403 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=679574.0, ans=0.125
2024-09-25 06:07:06,260 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=679574.0, ans=0.2
2024-09-25 06:07:09,314 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=679620.6666666666, ans=0.125
2024-09-25 06:07:12,382 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-09-25 06:07:20,398 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=679620.6666666666, ans=0.2
2024-09-25 06:07:34,855 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.const_attention_rate, batch_count=679667.3333333334, ans=0.025
2024-09-25 06:07:36,308 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=679667.3333333334, ans=0.0
2024-09-25 06:07:41,101 INFO [train.py:1198] (3/4) Epoch 38, batch 1500, loss[loss=0.1953, ctc_loss=0.1283, cr_loss=0.3348, over 17226.00 frames. ], tot_loss[loss=0.1913, ctc_loss=0.1233, cr_loss=0.3403, over 3364416.45 frames. ], batch size: 47, lr: 3.13e-03, grad_scale: 16.0
2024-09-25 06:07:57,130 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=679714.0, ans=0.0
2024-09-25 06:07:57,159 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=679714.0, ans=0.2
2024-09-25 06:08:06,488 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=679760.6666666666, ans=0.125
2024-09-25 06:08:30,415 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=679854.0, ans=0.1
2024-09-25 06:08:43,542 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.86 vs. limit=15.0
2024-09-25 06:08:52,933 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=679900.6666666666, ans=0.125
2024-09-25 06:09:05,242 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=10.88 vs. limit=15.0
2024-09-25 06:09:06,159 INFO [train.py:1198] (3/4) Epoch 38, batch 1550, loss[loss=0.1637, ctc_loss=0.1028, cr_loss=0.3046, over 17084.00 frames. ], tot_loss[loss=0.1903, ctc_loss=0.1225, cr_loss=0.3388, over 3368730.95 frames. ], batch size: 40, lr: 3.13e-03, grad_scale: 16.0
2024-09-25 06:09:20,509 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.155e+02 1.263e+02 1.343e+02 1.440e+02 2.044e+02, threshold=2.685e+02, percent-clipped=0.0
2024-09-25 06:09:36,181 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=679994.0, ans=0.125
2024-09-25 06:09:47,549 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=11.70 vs. limit=22.5
2024-09-25 06:09:48,712 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.min_positive, batch_count=680040.6666666666, ans=0.025
2024-09-25 06:09:56,554 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=680040.6666666666, ans=0.0
2024-09-25 06:10:10,995 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=680087.3333333334, ans=0.125
2024-09-25 06:10:18,814 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=680134.0, ans=0.0
2024-09-25 06:10:21,963 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.const_attention_rate, batch_count=680134.0, ans=0.025
2024-09-25 06:10:31,468 INFO [train.py:1198] (3/4) Epoch 38, batch 1600, loss[loss=0.179, ctc_loss=0.1119, cr_loss=0.3355, over 16972.00 frames. ], tot_loss[loss=0.1917, ctc_loss=0.1236, cr_loss=0.3404, over 3372149.62 frames. ], batch size: 42, lr: 3.13e-03, grad_scale: 32.0
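The Whitening lines compare a per-module covariance statistic against a limit: the metric stays low when a module's features are close to white (decorrelated, equal-variance channels), and the entries above report how close each tracked module is to its limit. One plausible formulation of such a metric, equal to 1.0 for perfectly white features and growing with covariance anisotropy, is sketched below; it is simplified to a single group of channels, whereas the log's num_groups splits channels into groups.

```python
# Hypothetical whitening metric in the spirit of the log lines above:
# 1.0 when the feature covariance is a multiple of the identity, larger as
# the covariance becomes less white. Simplified illustration only.
import torch

def whitening_metric(x: torch.Tensor) -> torch.Tensor:
    # x: (num_frames, num_channels); num_groups assumed to be 1 here
    x = x - x.mean(dim=0)
    cov = x.t() @ x / x.shape[0]          # (C, C) feature covariance
    eigs = torch.linalg.eigvalsh(cov)     # real eigenvalues of symmetric cov
    # ratio of mean squared eigenvalue to squared mean eigenvalue (>= 1.0)
    return (eigs ** 2).mean() / (eigs.mean() ** 2 + 1e-20)
```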
2024-09-25 06:11:41,761 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=680367.3333333334, ans=0.125
2024-09-25 06:11:43,277 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.const_attention_rate, batch_count=680367.3333333334, ans=0.025
2024-09-25 06:11:51,473 INFO [train.py:1198] (3/4) Epoch 38, batch 1650, loss[loss=0.1813, ctc_loss=0.1176, cr_loss=0.3189, over 17301.00 frames. ], tot_loss[loss=0.1902, ctc_loss=0.1227, cr_loss=0.3376, over 3365531.93 frames. ], batch size: 46, lr: 3.13e-03, grad_scale: 32.0
2024-09-25 06:12:02,829 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00
2024-09-25 06:12:05,734 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.139e+02 1.273e+02 1.346e+02 1.505e+02 2.146e+02, threshold=2.692e+02, percent-clipped=0.0
2024-09-25 06:12:48,454 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.const_attention_rate, batch_count=680554.0, ans=0.025
2024-09-25 06:13:13,539 INFO [train.py:1198] (3/4) Epoch 38, batch 1700, loss[loss=0.1768, ctc_loss=0.113, cr_loss=0.3192, over 17288.00 frames. ], tot_loss[loss=0.1911, ctc_loss=0.1233, cr_loss=0.3386, over 3351230.09 frames. ], batch size: 46, lr: 3.13e-03, grad_scale: 32.0
2024-09-25 06:13:32,904 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=680694.0, ans=0.1
2024-09-25 06:13:44,038 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=680740.6666666666, ans=0.1
2024-09-25 06:13:45,644 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=680740.6666666666, ans=0.1
2024-09-25 06:13:48,975 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=680740.6666666666, ans=0.1
2024-09-25 06:13:52,124 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=680740.6666666666, ans=0.125
2024-09-25 06:14:07,931 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=3.87 vs. limit=10.0
2024-09-25 06:14:38,378 INFO [train.py:1198] (3/4) Epoch 38, batch 1750, loss[loss=0.161, ctc_loss=0.1001, cr_loss=0.3041, over 17158.00 frames. ], tot_loss[loss=0.1904, ctc_loss=0.1228, cr_loss=0.3378, over 3363028.49 frames. ], batch size: 41, lr: 3.13e-03, grad_scale: 32.0
2024-09-25 06:14:54,101 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=680880.6666666666, ans=0.125
2024-09-25 06:14:55,387 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.123e+02 1.278e+02 1.373e+02 1.469e+02 4.120e+02, threshold=2.745e+02, percent-clipped=1.0
2024-09-25 06:14:58,817 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=680927.3333333334, ans=0.125
2024-09-25 06:15:06,918 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=680927.3333333334, ans=0.07
2024-09-25 06:16:01,067 INFO [train.py:1198] (3/4) Epoch 38, batch 1800, loss[loss=0.209, ctc_loss=0.1336, cr_loss=0.3769, over 17013.00 frames. ], tot_loss[loss=0.1917, ctc_loss=0.1237, cr_loss=0.3398, over 3359465.40 frames. ], batch size: 51, lr: 3.13e-03, grad_scale: 32.0
2024-09-25 06:16:12,722 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=681114.0, ans=0.2
2024-09-25 06:16:50,110 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=681254.0, ans=0.0
2024-09-25 06:16:53,292 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=681254.0, ans=0.1
2024-09-25 06:16:58,011 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=681254.0, ans=0.125
2024-09-25 06:16:59,382 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=681254.0, ans=0.125
2024-09-25 06:17:01,089 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=681254.0, ans=0.125
2024-09-25 06:17:02,818 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=681254.0, ans=0.1
2024-09-25 06:17:18,884 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=681300.6666666666, ans=0.0
2024-09-25 06:17:21,941 INFO [train.py:1198] (3/4) Epoch 38, batch 1850, loss[loss=0.2137, ctc_loss=0.141, cr_loss=0.3632, over 17031.00 frames. ], tot_loss[loss=0.1915, ctc_loss=0.1236, cr_loss=0.3398, over 3365570.95 frames. ], batch size: 52, lr: 3.13e-03, grad_scale: 32.0
2024-09-25 06:17:35,796 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.77 vs. limit=6.0
2024-09-25 06:17:36,437 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.079e+02 1.263e+02 1.331e+02 1.457e+02 2.352e+02, threshold=2.661e+02, percent-clipped=0.0
2024-09-25 06:17:47,422 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=13.52 vs. limit=22.5
2024-09-25 06:18:12,840 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=681487.3333333334, ans=0.125
2024-09-25 06:18:30,541 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=681534.0, ans=0.125
2024-09-25 06:18:44,348 INFO [train.py:1198] (3/4) Epoch 38, batch 1900, loss[loss=0.1821, ctc_loss=0.1133, cr_loss=0.344, over 17256.00 frames. ], tot_loss[loss=0.1908, ctc_loss=0.123, cr_loss=0.3392, over 3371722.76 frames. ], batch size: 44, lr: 3.13e-03, grad_scale: 32.0
2024-09-25 06:18:46,351 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=681580.6666666666, ans=0.0
2024-09-25 06:19:07,993 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=681627.3333333334, ans=0.125
2024-09-25 06:19:34,367 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=681674.0, ans=0.125
2024-09-25 06:19:38,204 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=2.55 vs. limit=6.0
2024-09-25 06:19:40,820 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=681720.6666666666, ans=0.1
2024-09-25 06:19:57,718 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=681767.3333333334, ans=0.125
2024-09-25 06:20:11,786 INFO [train.py:1198] (3/4) Epoch 38, batch 1950, loss[loss=0.212, ctc_loss=0.1364, cr_loss=0.3779, over 17051.00 frames. ], tot_loss[loss=0.1915, ctc_loss=0.1235, cr_loss=0.3402, over 3374042.74 frames. ], batch size: 52, lr: 3.13e-03, grad_scale: 32.0
2024-09-25 06:20:12,175 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=681814.0, ans=0.125
2024-09-25 06:20:27,504 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.069e+02 1.276e+02 1.376e+02 1.498e+02 2.117e+02, threshold=2.753e+02, percent-clipped=0.0
2024-09-25 06:20:31,184 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=681860.6666666666, ans=0.125
2024-09-25 06:20:37,527 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=681860.6666666666, ans=0.0
2024-09-25 06:20:40,844 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=681860.6666666666, ans=0.125
2024-09-25 06:20:47,144 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=681907.3333333334, ans=0.0
2024-09-25 06:20:53,619 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=681907.3333333334, ans=0.025
2024-09-25 06:21:04,745 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=681954.0, ans=0.125
2024-09-25 06:21:18,943 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.min_positive, batch_count=682000.6666666666, ans=0.05
2024-09-25 06:21:20,420 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=682000.6666666666, ans=0.035
2024-09-25 06:21:25,235 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=682000.6666666666, ans=0.0
2024-09-25 06:21:31,235 INFO [train.py:1198] (3/4) Epoch 38, batch 2000, loss[loss=0.2168, ctc_loss=0.1447, cr_loss=0.3603, over 17016.00 frames. ], tot_loss[loss=0.1917, ctc_loss=0.1236, cr_loss=0.3406, over 3374052.88 frames. ], batch size: 56, lr: 3.13e-03, grad_scale: 32.0
2024-09-25 06:21:44,369 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=682047.3333333334, ans=0.0
2024-09-25 06:22:10,923 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.74 vs. limit=6.0
2024-09-25 06:22:32,333 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=682187.3333333334, ans=0.125
2024-09-25 06:22:46,062 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=4.25 vs. limit=15.0
2024-09-25 06:22:51,389 INFO [train.py:1198] (3/4) Epoch 38, batch 2050, loss[loss=0.178, ctc_loss=0.1146, cr_loss=0.3172, over 17014.00 frames. ], tot_loss[loss=0.1923, ctc_loss=0.124, cr_loss=0.3416, over 3371412.92 frames. ], batch size: 44, lr: 3.13e-03, grad_scale: 32.0
2024-09-25 06:22:51,674 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=682280.6666666666, ans=0.125
2024-09-25 06:23:04,542 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten.whitening_limit, batch_count=682280.6666666666, ans=15.0
2024-09-25 06:23:09,957 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.096e+02 1.278e+02 1.341e+02 1.480e+02 2.182e+02, threshold=2.683e+02, percent-clipped=0.0
2024-09-25 06:23:23,037 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=682327.3333333334, ans=0.125
2024-09-25 06:23:23,519 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=14.80 vs. limit=22.5
2024-09-25 06:23:34,555 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=682374.0, ans=0.1
2024-09-25 06:24:08,641 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=682467.3333333334, ans=0.125
2024-09-25 06:24:16,376 INFO [train.py:1198] (3/4) Epoch 38, batch 2100, loss[loss=0.2179, ctc_loss=0.1394, cr_loss=0.3926, over 16895.00 frames. ], tot_loss[loss=0.1911, ctc_loss=0.1231, cr_loss=0.3399, over 3372586.19 frames. ], batch size: 58, lr: 3.13e-03, grad_scale: 32.0
2024-09-25 06:24:26,281 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=682514.0, ans=0.125
2024-09-25 06:24:30,472 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=682514.0, ans=0.125
2024-09-25 06:24:50,012 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.86 vs. limit=15.0
2024-09-25 06:24:56,824 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.const_attention_rate, batch_count=682607.3333333334, ans=0.025
2024-09-25 06:25:14,345 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=682654.0, ans=0.125
2024-09-25 06:25:26,752 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=682700.6666666666, ans=0.0
2024-09-25 06:25:28,985 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.99 vs. limit=15.0
2024-09-25 06:25:39,871 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-09-25 06:25:41,097 INFO [train.py:1198] (3/4) Epoch 38, batch 2150, loss[loss=0.2014, ctc_loss=0.1304, cr_loss=0.3549, over 17305.00 frames. ], tot_loss[loss=0.1918, ctc_loss=0.1236, cr_loss=0.3408, over 3369598.70 frames. ], batch size: 46, lr: 3.12e-03, grad_scale: 16.0
2024-09-25 06:25:51,542 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=9.03 vs. limit=15.0
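The grad_scale value in the train.py lines is the AMP loss scale: it moves between power-of-two values (16.0 and 32.0 in the entries above) because fp16 loss scaling doubles the scale after runs of successful steps and halves it when an overflow is detected. Standard torch.cuda.amp usage reproduces this behavior; the following is a generic sketch, not code copied from train.py, and init_scale is an arbitrary choice for the example.

```python
# Generic fp16 loss-scaling step with torch.cuda.amp, illustrating why the
# logged grad_scale moves in powers of two. Sketch only.
import torch

scaler = torch.cuda.amp.GradScaler(init_scale=8.0)  # illustrative starting scale

def training_step(model, optimizer, batch, compute_loss):
    optimizer.zero_grad()
    with torch.cuda.amp.autocast(dtype=torch.float16):
        loss = compute_loss(model, batch)
    scaler.scale(loss).backward()  # scale up so fp16 grads do not underflow
    scaler.step(optimizer)         # unscales first; skips the step on inf/nan
    scaler.update()                # doubles the scale after enough good steps,
                                   # halves it when an overflow is detected
    return loss.detach()
```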
2024-09-25 06:25:59,160 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.107e+02 1.297e+02 1.376e+02 1.525e+02 2.502e+02, threshold=2.751e+02, percent-clipped=0.0
2024-09-25 06:26:15,349 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=682840.6666666666, ans=0.0
2024-09-25 06:26:21,692 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=682840.6666666666, ans=0.125
2024-09-25 06:26:29,710 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=682887.3333333334, ans=0.0
2024-09-25 06:26:37,587 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=682887.3333333334, ans=0.0
2024-09-25 06:27:01,626 INFO [train.py:1198] (3/4) Epoch 38, batch 2200, loss[loss=0.1657, ctc_loss=0.1054, cr_loss=0.3016, over 17021.00 frames. ], tot_loss[loss=0.1919, ctc_loss=0.1237, cr_loss=0.3408, over 3359180.01 frames. ], batch size: 39, lr: 3.12e-03, grad_scale: 16.0
2024-09-25 06:27:03,539 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=682980.6666666666, ans=0.125
2024-09-25 06:27:05,150 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=682980.6666666666, ans=0.05
2024-09-25 06:27:14,748 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=682980.6666666666, ans=0.2
2024-09-25 06:27:27,620 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=683027.3333333334, ans=0.0
2024-09-25 06:27:40,628 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=683074.0, ans=0.125
2024-09-25 06:28:20,287 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=683167.3333333334, ans=0.0
2024-09-25 06:28:23,777 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.46 vs. limit=15.0
2024-09-25 06:28:24,721 INFO [train.py:1198] (3/4) Epoch 38, batch 2250, loss[loss=0.2082, ctc_loss=0.1365, cr_loss=0.3584, over 14996.00 frames. ], tot_loss[loss=0.193, ctc_loss=0.1245, cr_loss=0.3425, over 3364117.44 frames. ], batch size: 89, lr: 3.12e-03, grad_scale: 16.0
2024-09-25 06:28:29,835 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=683214.0, ans=0.2
2024-09-25 06:28:29,986 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=683214.0, ans=0.125
2024-09-25 06:28:42,347 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.094e+02 1.283e+02 1.354e+02 1.470e+02 2.386e+02, threshold=2.708e+02, percent-clipped=0.0
2024-09-25 06:28:45,952 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=683260.6666666666, ans=0.0
2024-09-25 06:28:57,001 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=683307.3333333334, ans=0.125
2024-09-25 06:29:17,650 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=5.00 vs. limit=15.0
2024-09-25 06:29:20,126 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=683354.0, ans=0.1
2024-09-25 06:29:27,405 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=683354.0, ans=0.2
2024-09-25 06:29:29,075 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=683354.0, ans=0.125
2024-09-25 06:29:49,856 INFO [train.py:1198] (3/4) Epoch 38, batch 2300, loss[loss=0.2168, ctc_loss=0.1392, cr_loss=0.388, over 17311.00 frames. ], tot_loss[loss=0.1934, ctc_loss=0.1247, cr_loss=0.3431, over 3372460.23 frames. ], batch size: 46, lr: 3.12e-03, grad_scale: 16.0
2024-09-25 06:30:09,896 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=683494.0, ans=0.0
2024-09-25 06:30:22,997 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=683540.6666666666, ans=0.125
2024-09-25 06:30:37,417 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=683540.6666666666, ans=0.1
2024-09-25 06:30:59,623 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=683634.0, ans=0.1
2024-09-25 06:31:12,145 INFO [train.py:1198] (3/4) Epoch 38, batch 2350, loss[loss=0.2102, ctc_loss=0.1359, cr_loss=0.3716, over 17028.00 frames. ], tot_loss[loss=0.1928, ctc_loss=0.1243, cr_loss=0.3421, over 3370202.72 frames. ], batch size: 44, lr: 3.12e-03, grad_scale: 16.0
2024-09-25 06:31:20,464 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=683680.6666666666, ans=0.2
2024-09-25 06:31:20,532 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=683680.6666666666, ans=0.125
2024-09-25 06:31:29,751 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.091e+02 1.295e+02 1.360e+02 1.434e+02 2.304e+02, threshold=2.721e+02, percent-clipped=0.0
2024-09-25 06:31:52,509 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=683774.0, ans=0.2
2024-09-25 06:31:52,509 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=683774.0, ans=0.125
2024-09-25 06:31:58,848 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=683820.6666666666, ans=0.5
2024-09-25 06:32:31,860 INFO [train.py:1198] (3/4) Epoch 38, batch 2400, loss[loss=0.1833, ctc_loss=0.1164, cr_loss=0.3345, over 17217.00 frames. ], tot_loss[loss=0.1915, ctc_loss=0.1235, cr_loss=0.3402, over 3363712.06 frames. ], batch size: 47, lr: 3.12e-03, grad_scale: 32.0
2024-09-25 06:32:46,338 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=683960.6666666666, ans=0.2
2024-09-25 06:33:04,773 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=684007.3333333334, ans=0.1
2024-09-25 06:33:07,980 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=684007.3333333334, ans=0.2
2024-09-25 06:33:54,235 INFO [train.py:1198] (3/4) Epoch 38, batch 2450, loss[loss=0.1811, ctc_loss=0.1185, cr_loss=0.3132, over 17317.00 frames. ], tot_loss[loss=0.192, ctc_loss=0.1239, cr_loss=0.3404, over 3362432.61 frames. ], batch size: 49, lr: 3.12e-03, grad_scale: 32.0
2024-09-25 06:34:03,629 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=684147.3333333334, ans=0.125
2024-09-25 06:34:14,633 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.026e+02 1.306e+02 1.378e+02 1.472e+02 2.938e+02, threshold=2.756e+02, percent-clipped=1.0
2024-09-25 06:34:33,630 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=684240.6666666666, ans=0.1
2024-09-25 06:35:02,004 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=684287.3333333334, ans=0.125
2024-09-25 06:35:07,065 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1.whitening_limit, batch_count=684334.0, ans=10.0
2024-09-25 06:35:22,454 INFO [train.py:1198] (3/4) Epoch 38, batch 2500, loss[loss=0.1786, ctc_loss=0.1179, cr_loss=0.3035, over 17086.00 frames. ], tot_loss[loss=0.1927, ctc_loss=0.1245, cr_loss=0.3413, over 3362750.50 frames. ], batch size: 46, lr: 3.12e-03, grad_scale: 32.0
2024-09-25 06:35:29,234 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=684380.6666666666, ans=0.125
2024-09-25 06:35:37,886 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.83 vs. limit=10.0
2024-09-25 06:35:39,710 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=7.83 vs. limit=15.0
2024-09-25 06:35:44,150 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=4.99 vs. limit=15.0
2024-09-25 06:36:20,224 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=684520.6666666666, ans=0.125
2024-09-25 06:36:42,216 INFO [train.py:1198] (3/4) Epoch 38, batch 2550, loss[loss=0.2012, ctc_loss=0.1292, cr_loss=0.3599, over 17012.00 frames. ], tot_loss[loss=0.1923, ctc_loss=0.1241, cr_loss=0.3408, over 3369014.40 frames. ], batch size: 56, lr: 3.12e-03, grad_scale: 32.0
2024-09-25 06:36:47,416 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=684614.0, ans=0.0
2024-09-25 06:36:55,649 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=684614.0, ans=0.0
2024-09-25 06:37:00,107 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.092e+02 1.274e+02 1.355e+02 1.436e+02 2.221e+02, threshold=2.709e+02, percent-clipped=0.0
2024-09-25 06:37:02,179 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.const_attention_rate, batch_count=684660.6666666666, ans=0.025
2024-09-25 06:37:02,216 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=684660.6666666666, ans=0.0
2024-09-25 06:37:08,754 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=684660.6666666666, ans=0.1
2024-09-25 06:37:11,929 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=684660.6666666666, ans=0.125
2024-09-25 06:37:23,275 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=684707.3333333334, ans=0.1
2024-09-25 06:37:31,600 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=684754.0, ans=0.1
2024-09-25 06:37:34,851 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=15.03 vs. limit=22.5
2024-09-25 06:37:55,059 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=684800.6666666666, ans=0.0
2024-09-25 06:37:59,670 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.const_attention_rate, batch_count=684800.6666666666, ans=0.025
2024-09-25 06:38:05,961 INFO [train.py:1198] (3/4) Epoch 38, batch 2600, loss[loss=0.2091, ctc_loss=0.1371, cr_loss=0.3597, over 16926.00 frames. ], tot_loss[loss=0.1935, ctc_loss=0.125, cr_loss=0.3424, over 3364359.42 frames. ], batch size: 58, lr: 3.12e-03, grad_scale: 32.0
2024-09-25 06:38:33,582 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.const_attention_rate, batch_count=684894.0, ans=0.025
2024-09-25 06:38:35,610 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=11.73 vs. limit=22.5
2024-09-25 06:38:48,059 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=684940.6666666666, ans=0.0
2024-09-25 06:38:54,493 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.min_abs, batch_count=684987.3333333334, ans=0.5
2024-09-25 06:39:03,627 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=684987.3333333334, ans=0.1
2024-09-25 06:39:31,218 INFO [train.py:1198] (3/4) Epoch 38, batch 2650, loss[loss=0.1821, ctc_loss=0.1159, cr_loss=0.331, over 17215.00 frames. ], tot_loss[loss=0.1926, ctc_loss=0.1243, cr_loss=0.3416, over 3364818.78 frames. ], batch size: 47, lr: 3.12e-03, grad_scale: 32.0
2024-09-25 06:39:42,702 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=685080.6666666666, ans=0.125
2024-09-25 06:39:45,888 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=685127.3333333334, ans=0.125
2024-09-25 06:39:48,718 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.144e+02 1.311e+02 1.384e+02 1.483e+02 1.840e+02, threshold=2.769e+02, percent-clipped=0.0
2024-09-25 06:40:00,102 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=14.23 vs. limit=15.0
2024-09-25 06:40:15,537 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=685174.0, ans=0.0
2024-09-25 06:40:18,701 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=685174.0, ans=0.0
2024-09-25 06:40:53,185 INFO [train.py:1198] (3/4) Epoch 38, batch 2700, loss[loss=0.1864, ctc_loss=0.121, cr_loss=0.3269, over 17367.00 frames. ], tot_loss[loss=0.1933, ctc_loss=0.1248, cr_loss=0.3423, over 3355574.05 frames. ], batch size: 48, lr: 3.12e-03, grad_scale: 32.0
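The many balancer entries throughout this log (min_positive, max_abs, prob, ...) track constraints on per-channel activation statistics, with `prob` plausibly being the probability that the constraint is checked on a given batch. Below is a simplified diagnostic in that spirit; it is an assumption-labeled illustration only, and the real scaling.py Balancer shapes gradients rather than merely reporting violations.

```python
# Hypothetical diagnostic in the spirit of the balancer entries above: with
# probability `prob`, measure how far each channel's fraction of positive
# activations falls outside [min_positive, max_positive]. Illustration only.
import random
import torch

def balancer_violations(x: torch.Tensor, min_positive: float = 0.05,
                        max_positive: float = 0.95, prob: float = 0.125):
    # x: (num_frames, num_channels) activations
    if random.random() > prob:
        return None  # constraint not checked on this batch
    frac_pos = (x > 0).float().mean(dim=0)   # per-channel positive fraction
    below = (min_positive - frac_pos).clamp(min=0.0)
    above = (frac_pos - max_positive).clamp(min=0.0)
    return below + above                     # zero where the constraint holds
```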
2024-09-25 06:41:25,955 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=685407.3333333334, ans=0.0
2024-09-25 06:41:38,507 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=685407.3333333334, ans=0.125
2024-09-25 06:41:50,907 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=685454.0, ans=0.125
2024-09-25 06:41:51,033 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=685454.0, ans=0.0
2024-09-25 06:41:54,190 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=685454.0, ans=0.125
2024-09-25 06:41:59,191 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00
2024-09-25 06:42:12,894 INFO [train.py:1198] (3/4) Epoch 38, batch 2750, loss[loss=0.1703, ctc_loss=0.1074, cr_loss=0.3144, over 17254.00 frames. ], tot_loss[loss=0.1929, ctc_loss=0.1246, cr_loss=0.3413, over 3351155.07 frames. ], batch size: 42, lr: 3.12e-03, grad_scale: 16.0
2024-09-25 06:42:32,105 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.084e+02 1.260e+02 1.351e+02 1.429e+02 2.193e+02, threshold=2.703e+02, percent-clipped=0.0
2024-09-25 06:42:37,740 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.88 vs. limit=22.5
2024-09-25 06:43:04,320 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=12.87 vs. limit=22.5
2024-09-25 06:43:27,998 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=9.63 vs. limit=15.0
2024-09-25 06:43:28,092 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.29 vs. limit=15.0
2024-09-25 06:43:35,233 INFO [train.py:1198] (3/4) Epoch 38, batch 2800, loss[loss=0.1565, ctc_loss=0.09722, cr_loss=0.2962, over 17178.00 frames. ], tot_loss[loss=0.1924, ctc_loss=0.1242, cr_loss=0.341, over 3352906.65 frames. ], batch size: 41, lr: 3.12e-03, grad_scale: 32.0
2024-09-25 06:43:35,489 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=685780.6666666666, ans=0.125
2024-09-25 06:43:46,747 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.const_attention_rate, batch_count=685780.6666666666, ans=0.025
2024-09-25 06:43:48,233 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=685780.6666666666, ans=0.0
2024-09-25 06:44:00,571 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=685827.3333333334, ans=0.125
2024-09-25 06:44:23,085 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.whiten.whitening_limit, batch_count=685874.0, ans=15.0
2024-09-25 06:44:36,617 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=685920.6666666666, ans=0.0
2024-09-25 06:45:03,156 INFO [train.py:1198] (3/4) Epoch 38, batch 2850, loss[loss=0.1606, ctc_loss=0.1036, cr_loss=0.2849, over 17097.00 frames. ], tot_loss[loss=0.1928, ctc_loss=0.1245, cr_loss=0.3414, over 3342377.88 frames. ], batch size: 40, lr: 3.12e-03, grad_scale: 32.0
2024-09-25 06:45:05,092 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=686014.0, ans=0.0
2024-09-25 06:45:12,966 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.const_attention_rate, batch_count=686014.0, ans=0.025
2024-09-25 06:45:19,475 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=686060.6666666666, ans=0.0
2024-09-25 06:45:22,310 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.153e+02 1.271e+02 1.362e+02 1.479e+02 2.279e+02, threshold=2.724e+02, percent-clipped=0.0
2024-09-25 06:45:36,956 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=686107.3333333334, ans=0.1
2024-09-25 06:45:56,656 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=5.17 vs. limit=15.0
2024-09-25 06:46:10,458 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=686200.6666666666, ans=0.125
2024-09-25 06:46:12,249 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=686200.6666666666, ans=0.0
2024-09-25 06:46:23,068 INFO [train.py:1198] (3/4) Epoch 38, batch 2900, loss[loss=0.1967, ctc_loss=0.1261, cr_loss=0.353, over 17081.00 frames. ], tot_loss[loss=0.1927, ctc_loss=0.1244, cr_loss=0.3413, over 3354520.17 frames. ], batch size: 49, lr: 3.12e-03, grad_scale: 32.0
2024-09-25 06:46:38,410 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=8.57 vs. limit=15.0
2024-09-25 06:47:01,589 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=686340.6666666666, ans=0.2
2024-09-25 06:47:07,868 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=686340.6666666666, ans=0.2
2024-09-25 06:47:22,136 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=686387.3333333334, ans=0.125
2024-09-25 06:47:22,180 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=686387.3333333334, ans=0.1
2024-09-25 06:47:33,280 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.const_attention_rate, batch_count=686434.0, ans=0.025
2024-09-25 06:47:33,727 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.09 vs. limit=15.0
2024-09-25 06:47:42,603 INFO [train.py:1198] (3/4) Epoch 38, batch 2950, loss[loss=0.202, ctc_loss=0.1308, cr_loss=0.3559, over 16619.00 frames. ], tot_loss[loss=0.192, ctc_loss=0.124, cr_loss=0.34, over 3354688.37 frames. ], batch size: 66, lr: 3.12e-03, grad_scale: 32.0
2024-09-25 06:47:48,960 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=6.94 vs. limit=15.0
2024-09-25 06:48:04,470 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.143e+02 1.305e+02 1.376e+02 1.477e+02 2.268e+02, threshold=2.752e+02, percent-clipped=0.0
2024-09-25 06:48:09,507 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=686527.3333333334, ans=0.0
2024-09-25 06:48:09,556 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=686527.3333333334, ans=0.125
2024-09-25 06:48:10,966 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=686527.3333333334, ans=0.125
2024-09-25 06:48:31,708 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=686620.6666666666, ans=0.0
2024-09-25 06:48:42,754 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=686620.6666666666, ans=0.0
2024-09-25 06:49:01,224 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=686667.3333333334, ans=0.0
2024-09-25 06:49:01,229 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=686667.3333333334, ans=0.125
2024-09-25 06:49:06,649 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=22.45 vs. limit=22.5
2024-09-25 06:49:07,365 INFO [train.py:1198] (3/4) Epoch 38, batch 3000, loss[loss=0.1671, ctc_loss=0.1071, cr_loss=0.2999, over 17035.00 frames. ], tot_loss[loss=0.1913, ctc_loss=0.1235, cr_loss=0.3394, over 3363001.18 frames. ], batch size: 39, lr: 3.12e-03, grad_scale: 16.0
], batch size: 39, lr: 3.12e-03, grad_scale: 16.0 2024-09-25 06:49:07,365 INFO [train.py:1221] (3/4) Computing validation loss 2024-09-25 06:49:22,912 INFO [train.py:1230] (3/4) Epoch 38, validation: loss=0.03571, ctc_loss=0.03571, cr_loss=9.665e-15, over 944034.00 frames. 2024-09-25 06:49:22,913 INFO [train.py:1231] (3/4) Maximum memory allocated so far is 21238MB 2024-09-25 06:49:54,703 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=686807.3333333334, ans=0.04949747468305833 2024-09-25 06:50:33,457 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=686900.6666666666, ans=0.125 2024-09-25 06:50:36,419 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=686900.6666666666, ans=0.125 2024-09-25 06:50:44,317 INFO [train.py:1198] (3/4) Epoch 38, batch 3050, loss[loss=0.1926, ctc_loss=0.1241, cr_loss=0.3423, over 16774.00 frames. ], tot_loss[loss=0.1917, ctc_loss=0.1237, cr_loss=0.3399, over 3369534.12 frames. ], batch size: 61, lr: 3.12e-03, grad_scale: 16.0 2024-09-25 06:50:52,300 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=686947.3333333334, ans=0.025 2024-09-25 06:51:04,197 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.171e+02 1.282e+02 1.358e+02 1.470e+02 1.835e+02, threshold=2.717e+02, percent-clipped=0.0 2024-09-25 06:51:22,177 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=5.61 vs. limit=15.0 2024-09-25 06:51:24,820 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=687040.6666666666, ans=0.0 2024-09-25 06:51:27,697 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.const_attention_rate, batch_count=687040.6666666666, ans=0.025 2024-09-25 06:51:40,212 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=687087.3333333334, ans=0.2 2024-09-25 06:52:01,990 INFO [train.py:1198] (3/4) Epoch 38, batch 3100, loss[loss=0.2163, ctc_loss=0.1384, cr_loss=0.3893, over 16905.00 frames. ], tot_loss[loss=0.1917, ctc_loss=0.1238, cr_loss=0.3399, over 3368244.71 frames. ], batch size: 58, lr: 3.11e-03, grad_scale: 16.0 2024-09-25 06:52:03,773 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=687180.6666666666, ans=0.0 2024-09-25 06:52:33,915 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.const_attention_rate, batch_count=687274.0, ans=0.025 2024-09-25 06:52:36,843 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=687274.0, ans=0.0 2024-09-25 06:52:54,263 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=687320.6666666666, ans=0.0 2024-09-25 06:52:59,326 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=5.73 vs. 
limit=15.0 2024-09-25 06:53:12,612 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=687367.3333333334, ans=0.125 2024-09-25 06:53:20,027 INFO [train.py:1198] (3/4) Epoch 38, batch 3150, loss[loss=0.2057, ctc_loss=0.1348, cr_loss=0.3548, over 17217.00 frames. ], tot_loss[loss=0.1915, ctc_loss=0.1236, cr_loss=0.3398, over 3363635.98 frames. ], batch size: 55, lr: 3.11e-03, grad_scale: 16.0 2024-09-25 06:53:23,972 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=5.54 vs. limit=15.0 2024-09-25 06:53:40,301 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.158e+02 1.270e+02 1.356e+02 1.474e+02 1.773e+02, threshold=2.711e+02, percent-clipped=0.0 2024-09-25 06:54:00,792 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=687507.3333333334, ans=0.1 2024-09-25 06:54:11,506 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=687554.0, ans=0.125 2024-09-25 06:54:37,790 INFO [train.py:1198] (3/4) Epoch 38, batch 3200, loss[loss=0.2036, ctc_loss=0.1313, cr_loss=0.3614, over 17253.00 frames. ], tot_loss[loss=0.1921, ctc_loss=0.1241, cr_loss=0.3404, over 3361774.28 frames. ], batch size: 55, lr: 3.11e-03, grad_scale: 32.0 2024-09-25 06:55:10,614 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=687740.6666666666, ans=0.125 2024-09-25 06:55:10,960 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.05 vs. limit=15.0 2024-09-25 06:55:17,632 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=8.43 vs. limit=15.0 2024-09-25 06:55:44,030 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=5.78 vs. limit=15.0 2024-09-25 06:55:56,026 INFO [train.py:1198] (3/4) Epoch 38, batch 3250, loss[loss=0.2079, ctc_loss=0.1341, cr_loss=0.3689, over 17018.00 frames. ], tot_loss[loss=0.1912, ctc_loss=0.1233, cr_loss=0.3393, over 3366936.85 frames. 
], batch size: 44, lr: 3.11e-03, grad_scale: 32.0 2024-09-25 06:56:02,874 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-25 06:56:17,752 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.130e+02 1.275e+02 1.365e+02 1.473e+02 2.154e+02, threshold=2.729e+02, percent-clipped=0.0 2024-09-25 06:56:32,158 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=687974.0, ans=0.025 2024-09-25 06:56:46,332 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=688020.6666666666, ans=0.125 2024-09-25 06:57:11,801 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-25 06:57:13,374 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=688067.3333333334, ans=0.0 2024-09-25 06:57:16,215 INFO [train.py:1198] (3/4) Epoch 38, batch 3300, loss[loss=0.1808, ctc_loss=0.1159, cr_loss=0.3242, over 17228.00 frames. ], tot_loss[loss=0.1912, ctc_loss=0.1234, cr_loss=0.3389, over 3365952.33 frames. ], batch size: 47, lr: 3.11e-03, grad_scale: 16.0 2024-09-25 06:57:41,245 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=688160.6666666666, ans=0.125 2024-09-25 06:57:41,274 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=688160.6666666666, ans=0.0 2024-09-25 06:57:48,983 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=688207.3333333334, ans=0.07 2024-09-25 06:57:52,140 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=688207.3333333334, ans=0.0 2024-09-25 06:57:59,786 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=688207.3333333334, ans=0.1 2024-09-25 06:58:03,108 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=688254.0, ans=0.2 2024-09-25 06:58:18,618 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=688300.6666666666, ans=0.1 2024-09-25 06:58:20,186 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=688300.6666666666, ans=0.1 2024-09-25 06:58:31,136 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=688300.6666666666, ans=0.0 2024-09-25 06:58:33,936 INFO [train.py:1198] (3/4) Epoch 38, batch 3350, loss[loss=0.178, ctc_loss=0.1114, cr_loss=0.333, over 17014.00 frames. ], tot_loss[loss=0.1901, ctc_loss=0.1226, cr_loss=0.3374, over 3369689.37 frames. 
], batch size: 44, lr: 3.11e-03, grad_scale: 16.0 2024-09-25 06:58:34,285 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=688347.3333333334, ans=0.025 2024-09-25 06:58:43,589 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-25 06:58:55,604 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.109e+02 1.280e+02 1.359e+02 1.507e+02 2.410e+02, threshold=2.719e+02, percent-clipped=0.0 2024-09-25 06:58:55,892 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=688394.0, ans=0.1 2024-09-25 06:59:05,830 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=688440.6666666666, ans=0.125 2024-09-25 06:59:18,458 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=688440.6666666666, ans=0.1 2024-09-25 06:59:38,601 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=688534.0, ans=0.0 2024-09-25 06:59:56,004 INFO [train.py:1198] (3/4) Epoch 38, batch 3400, loss[loss=0.2241, ctc_loss=0.1474, cr_loss=0.3838, over 16463.00 frames. ], tot_loss[loss=0.191, ctc_loss=0.1233, cr_loss=0.3386, over 3370793.57 frames. ], batch size: 66, lr: 3.11e-03, grad_scale: 16.0 2024-09-25 07:00:02,521 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=688580.6666666666, ans=0.125 2024-09-25 07:01:07,329 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=688767.3333333334, ans=0.1 2024-09-25 07:01:15,874 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=6.46 vs. limit=15.0 2024-09-25 07:01:16,602 INFO [train.py:1198] (3/4) Epoch 38, batch 3450, loss[loss=0.1861, ctc_loss=0.1202, cr_loss=0.3296, over 17002.00 frames. ], tot_loss[loss=0.1916, ctc_loss=0.1236, cr_loss=0.3397, over 3371636.04 frames. ], batch size: 51, lr: 3.11e-03, grad_scale: 16.0 2024-09-25 07:01:29,756 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.65 vs. limit=6.0 2024-09-25 07:01:40,122 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.144e+02 1.283e+02 1.390e+02 1.486e+02 2.473e+02, threshold=2.781e+02, percent-clipped=0.0 2024-09-25 07:01:41,203 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.96 vs. limit=15.0 2024-09-25 07:01:56,030 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=688907.3333333334, ans=0.125 2024-09-25 07:02:35,142 INFO [train.py:1198] (3/4) Epoch 38, batch 3500, loss[loss=0.2106, ctc_loss=0.1362, cr_loss=0.3717, over 17027.00 frames. ], tot_loss[loss=0.1912, ctc_loss=0.1235, cr_loss=0.3389, over 3376624.57 frames. 
], batch size: 52, lr: 3.11e-03, grad_scale: 8.0 2024-09-25 07:02:43,242 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=689047.3333333334, ans=0.2 2024-09-25 07:03:03,507 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=689094.0, ans=0.1 2024-09-25 07:03:30,470 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.78 vs. limit=6.0 2024-09-25 07:03:39,569 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=689234.0, ans=0.125 2024-09-25 07:03:53,032 INFO [train.py:1198] (3/4) Epoch 38, batch 3550, loss[loss=0.1775, ctc_loss=0.1142, cr_loss=0.3166, over 17295.00 frames. ], tot_loss[loss=0.1915, ctc_loss=0.1236, cr_loss=0.3392, over 3368769.02 frames. ], batch size: 49, lr: 3.11e-03, grad_scale: 8.0 2024-09-25 07:03:53,841 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.50 vs. limit=6.0 2024-09-25 07:04:16,680 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.115e+02 1.303e+02 1.383e+02 1.456e+02 2.390e+02, threshold=2.765e+02, percent-clipped=0.0 2024-09-25 07:04:49,567 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=689420.6666666666, ans=0.125 2024-09-25 07:04:49,981 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=11.82 vs. limit=15.0 2024-09-25 07:05:11,166 INFO [train.py:1198] (3/4) Epoch 38, batch 3600, loss[loss=0.2154, ctc_loss=0.1389, cr_loss=0.3825, over 17292.00 frames. ], tot_loss[loss=0.192, ctc_loss=0.124, cr_loss=0.34, over 3362501.20 frames. ], batch size: 49, lr: 3.11e-03, grad_scale: 16.0 2024-09-25 07:05:59,917 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=689654.0, ans=0.125 2024-09-25 07:06:00,094 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=689654.0, ans=0.0 2024-09-25 07:06:14,157 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=689700.6666666666, ans=0.125 2024-09-25 07:06:20,424 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=689700.6666666666, ans=0.0 2024-09-25 07:06:29,867 INFO [train.py:1198] (3/4) Epoch 38, batch 3650, loss[loss=0.2162, ctc_loss=0.1422, cr_loss=0.3702, over 15148.00 frames. ], tot_loss[loss=0.1917, ctc_loss=0.1239, cr_loss=0.339, over 3363927.27 frames. ], batch size: 89, lr: 3.11e-03, grad_scale: 16.0 2024-09-25 07:06:43,124 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=689747.3333333334, ans=0.1 2024-09-25 07:06:55,353 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.088e+02 1.248e+02 1.330e+02 1.434e+02 2.019e+02, threshold=2.659e+02, percent-clipped=0.0 2024-09-25 07:07:02,328 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=3.85 vs. 
limit=10.0 2024-09-25 07:07:42,791 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=689934.0, ans=0.1 2024-09-25 07:07:43,685 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=8.37 vs. limit=22.5 2024-09-25 07:07:50,477 INFO [train.py:1198] (3/4) Epoch 38, batch 3700, loss[loss=0.2, ctc_loss=0.1283, cr_loss=0.3582, over 17045.00 frames. ], tot_loss[loss=0.1929, ctc_loss=0.1247, cr_loss=0.341, over 3352392.35 frames. ], batch size: 56, lr: 3.11e-03, grad_scale: 16.0 2024-09-25 07:07:57,024 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=689980.6666666666, ans=0.2 2024-09-25 07:08:35,045 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=690074.0, ans=0.125 2024-09-25 07:08:44,286 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=690120.6666666666, ans=0.1 2024-09-25 07:08:44,340 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=690120.6666666666, ans=0.125 2024-09-25 07:09:10,778 INFO [train.py:1198] (3/4) Epoch 38, batch 3750, loss[loss=0.1676, ctc_loss=0.1044, cr_loss=0.316, over 16957.00 frames. ], tot_loss[loss=0.1931, ctc_loss=0.1249, cr_loss=0.3409, over 3334171.30 frames. ], batch size: 42, lr: 3.11e-03, grad_scale: 16.0 2024-09-25 07:09:34,565 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.148e+02 1.279e+02 1.362e+02 1.449e+02 2.293e+02, threshold=2.725e+02, percent-clipped=0.0 2024-09-25 07:09:37,830 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=690260.6666666666, ans=0.125 2024-09-25 07:09:46,647 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=690307.3333333334, ans=0.0 2024-09-25 07:10:04,995 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.65 vs. limit=15.0 2024-09-25 07:10:15,842 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=9.94 vs. limit=22.5 2024-09-25 07:10:30,982 INFO [train.py:1198] (3/4) Epoch 38, batch 3800, loss[loss=0.1433, ctc_loss=0.09013, cr_loss=0.2658, over 16743.00 frames. ], tot_loss[loss=0.1934, ctc_loss=0.1251, cr_loss=0.3412, over 3323911.54 frames. ], batch size: 37, lr: 3.11e-03, grad_scale: 16.0 2024-09-25 07:10:46,668 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=690494.0, ans=0.125 2024-09-25 07:11:13,181 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=690540.6666666666, ans=0.015 2024-09-25 07:11:19,830 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.55 vs. limit=6.0 2024-09-25 07:11:23,277 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.80 vs. 
limit=10.0 2024-09-25 07:11:24,261 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=690587.3333333334, ans=0.0 2024-09-25 07:11:51,518 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=15.02 vs. limit=15.0 2024-09-25 07:11:52,447 INFO [train.py:1198] (3/4) Epoch 38, batch 3850, loss[loss=0.1949, ctc_loss=0.1286, cr_loss=0.3316, over 14885.00 frames. ], tot_loss[loss=0.1952, ctc_loss=0.1266, cr_loss=0.3429, over 3258331.61 frames. ], batch size: 89, lr: 3.11e-03, grad_scale: 16.0 2024-09-25 07:12:04,992 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=690680.6666666666, ans=0.125 2024-09-25 07:12:15,186 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.166e+02 1.374e+02 1.486e+02 1.640e+02 2.118e+02, threshold=2.972e+02, percent-clipped=0.0 2024-09-25 07:12:20,062 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=690727.3333333334, ans=0.0 2024-09-25 07:12:21,625 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=690774.0, ans=0.0 2024-09-25 07:12:24,675 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=690774.0, ans=0.1 2024-09-25 07:12:29,103 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=690774.0, ans=0.1 2024-09-25 07:12:33,808 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=690774.0, ans=0.0 2024-09-25 07:12:36,678 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=690820.6666666666, ans=0.0 2024-09-25 07:12:38,247 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=690820.6666666666, ans=0.0 2024-09-25 07:12:39,771 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=690820.6666666666, ans=0.125 2024-09-25 07:12:46,345 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=690820.6666666666, ans=0.0 2024-09-25 07:13:50,139 INFO [train.py:1198] (3/4) Epoch 39, batch 0, loss[loss=0.1668, ctc_loss=0.1053, cr_loss=0.3076, over 16950.00 frames. ], tot_loss[loss=0.1668, ctc_loss=0.1053, cr_loss=0.3076, over 16950.00 frames. ], batch size: 42, lr: 3.07e-03, grad_scale: 32.0 2024-09-25 07:13:50,139 INFO [train.py:1221] (3/4) Computing validation loss 2024-09-25 07:14:01,603 INFO [zipformer.py:1858] (3/4) name=encoder.encoders.2.encoder.layers.2.self_attn_weights, attn_weights_entropy = tensor([4.6089, 3.3124, 4.3891, 4.1597], device='cuda:3') 2024-09-25 07:14:06,120 INFO [train.py:1230] (3/4) Epoch 39, validation: loss=0.03529, ctc_loss=0.03529, cr_loss=1.033e-14, over 944034.00 frames. 
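[Editor's note: the tot_loss values throughout this log are consistent with a weighted sum of the CTC and consistency-regularization (CR) terms. The sketch below is a hypothetical reconstruction for readers of the log, not code taken from train.py; the 0.2 weight on cr_loss is inferred by fitting the logged records (e.g. the epoch 39, batch 50 record below), not read from the training script.]

def combined_loss(ctc_loss: float, cr_loss: float, cr_scale: float = 0.2) -> float:
    # Assumed combination: CTC term at full weight plus a scaled CR term.
    # cr_scale=0.2 is an inference from the logged numbers, not a confirmed value.
    return ctc_loss + cr_scale * cr_loss

# Epoch 39, batch 50 below logs ctc_loss=0.1251, cr_loss=0.3446, tot_loss=0.194:
assert abs(combined_loss(0.1251, 0.3446) - 0.194) < 5e-4

[The same relation holds across other records, e.g. epoch 38, batch 3400: 0.1233 + 0.2 * 0.3386 = 0.191, matching its logged tot_loss.]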
2024-09-25 07:14:06,121 INFO [train.py:1231] (3/4) Maximum memory allocated so far is 21238MB 2024-09-25 07:14:25,064 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=690942.0, ans=0.125 2024-09-25 07:14:33,831 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=7.89 vs. limit=15.0 2024-09-25 07:14:35,587 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=5.74 vs. limit=15.0 2024-09-25 07:14:47,729 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=690988.6666666666, ans=0.125 2024-09-25 07:14:51,297 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=4.31 vs. limit=12.0 2024-09-25 07:14:55,707 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=691035.3333333334, ans=0.0 2024-09-25 07:14:57,345 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=691035.3333333334, ans=0.07 2024-09-25 07:15:26,003 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=691082.0, ans=0.125 2024-09-25 07:15:28,967 INFO [train.py:1198] (3/4) Epoch 39, batch 50, loss[loss=0.199, ctc_loss=0.1281, cr_loss=0.3547, over 17305.00 frames. ], tot_loss[loss=0.194, ctc_loss=0.1251, cr_loss=0.3446, over 760601.19 frames. ], batch size: 46, lr: 3.06e-03, grad_scale: 32.0 2024-09-25 07:15:29,733 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.33 vs. limit=15.0 2024-09-25 07:15:44,664 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=8.59 vs. limit=15.0 2024-09-25 07:15:48,658 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=691175.3333333334, ans=0.1 2024-09-25 07:15:48,670 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=691175.3333333334, ans=0.025 2024-09-25 07:15:59,582 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.081e+02 1.279e+02 1.417e+02 1.630e+02 3.403e+02, threshold=2.834e+02, percent-clipped=1.0 2024-09-25 07:16:52,219 INFO [train.py:1198] (3/4) Epoch 39, batch 100, loss[loss=0.1856, ctc_loss=0.1193, cr_loss=0.3316, over 17292.00 frames. ], tot_loss[loss=0.1913, ctc_loss=0.1233, cr_loss=0.3399, over 1338413.93 frames. ], batch size: 49, lr: 3.06e-03, grad_scale: 32.0 2024-09-25 07:17:39,314 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=691502.0, ans=0.2 2024-09-25 07:17:42,176 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=691502.0, ans=0.0 2024-09-25 07:18:12,293 INFO [train.py:1198] (3/4) Epoch 39, batch 150, loss[loss=0.1749, ctc_loss=0.1117, cr_loss=0.3161, over 17094.00 frames. ], tot_loss[loss=0.1916, ctc_loss=0.1237, cr_loss=0.3397, over 1774784.42 frames. 
], batch size: 40, lr: 3.06e-03, grad_scale: 32.0 2024-09-25 07:18:17,697 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=691595.3333333334, ans=0.1 2024-09-25 07:18:23,146 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=7.59 vs. limit=15.0 2024-09-25 07:18:41,034 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=691642.0, ans=0.1 2024-09-25 07:18:44,794 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=4.21 vs. limit=15.0 2024-09-25 07:18:45,368 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.121e+02 1.277e+02 1.390e+02 1.504e+02 2.454e+02, threshold=2.780e+02, percent-clipped=0.0 2024-09-25 07:18:58,242 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-25 07:19:31,534 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=691782.0, ans=0.125 2024-09-25 07:19:40,579 INFO [train.py:1198] (3/4) Epoch 39, batch 200, loss[loss=0.2065, ctc_loss=0.1356, cr_loss=0.3543, over 17008.00 frames. ], tot_loss[loss=0.1904, ctc_loss=0.1226, cr_loss=0.3389, over 2131159.75 frames. ], batch size: 53, lr: 3.06e-03, grad_scale: 32.0 2024-09-25 07:20:03,975 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2.whitening_limit, batch_count=691875.3333333334, ans=15.0 2024-09-25 07:20:13,064 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=691922.0, ans=0.125 2024-09-25 07:20:35,364 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=691968.6666666666, ans=0.5 2024-09-25 07:20:46,473 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=692015.3333333334, ans=0.125 2024-09-25 07:20:57,343 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.min_abs, batch_count=692015.3333333334, ans=0.5 2024-09-25 07:21:00,251 INFO [train.py:1198] (3/4) Epoch 39, batch 250, loss[loss=0.1504, ctc_loss=0.0934, cr_loss=0.2851, over 16386.00 frames. ], tot_loss[loss=0.1908, ctc_loss=0.1229, cr_loss=0.3394, over 2408796.35 frames. 
], batch size: 36, lr: 3.06e-03, grad_scale: 32.0 2024-09-25 07:21:02,102 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=692062.0, ans=0.125 2024-09-25 07:21:19,768 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-09-25 07:21:33,650 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.089e+02 1.285e+02 1.349e+02 1.463e+02 2.685e+02, threshold=2.698e+02, percent-clipped=0.0 2024-09-25 07:21:45,348 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=692155.3333333334, ans=0.04949747468305833 2024-09-25 07:21:48,274 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.min_positive, batch_count=692155.3333333334, ans=0.05 2024-09-25 07:21:49,877 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=692202.0, ans=0.125 2024-09-25 07:21:56,364 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=692202.0, ans=0.09899494936611666 2024-09-25 07:22:05,651 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=692248.6666666666, ans=0.0 2024-09-25 07:22:12,033 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=692248.6666666666, ans=0.2 2024-09-25 07:22:18,454 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=692248.6666666666, ans=0.05 2024-09-25 07:22:22,922 INFO [train.py:1198] (3/4) Epoch 39, batch 300, loss[loss=0.2051, ctc_loss=0.1336, cr_loss=0.3573, over 17325.00 frames. ], tot_loss[loss=0.1908, ctc_loss=0.123, cr_loss=0.3393, over 2615708.03 frames. ], batch size: 52, lr: 3.06e-03, grad_scale: 32.0 2024-09-25 07:22:23,245 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=692295.3333333334, ans=0.125 2024-09-25 07:22:39,134 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=692342.0, ans=0.1 2024-09-25 07:23:01,716 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=692388.6666666666, ans=0.2 2024-09-25 07:23:03,054 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=692388.6666666666, ans=0.125 2024-09-25 07:23:11,263 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=692435.3333333334, ans=0.125 2024-09-25 07:23:41,463 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=692482.0, ans=0.0 2024-09-25 07:23:45,873 INFO [train.py:1198] (3/4) Epoch 39, batch 350, loss[loss=0.2115, ctc_loss=0.1366, cr_loss=0.3745, over 17310.00 frames. ], tot_loss[loss=0.1912, ctc_loss=0.1233, cr_loss=0.3394, over 2781835.19 frames. 
], batch size: 51, lr: 3.06e-03, grad_scale: 32.0 2024-09-25 07:24:04,785 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=692575.3333333334, ans=0.125 2024-09-25 07:24:21,676 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.105e+02 1.269e+02 1.341e+02 1.429e+02 1.987e+02, threshold=2.681e+02, percent-clipped=0.0 2024-09-25 07:24:29,988 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=692622.0, ans=0.07 2024-09-25 07:24:44,474 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=692668.6666666666, ans=10.0 2024-09-25 07:24:45,949 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=692668.6666666666, ans=0.125 2024-09-25 07:24:46,527 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=8.37 vs. limit=15.0 2024-09-25 07:25:10,802 INFO [train.py:1198] (3/4) Epoch 39, batch 400, loss[loss=0.1617, ctc_loss=0.1012, cr_loss=0.3024, over 17117.00 frames. ], tot_loss[loss=0.1905, ctc_loss=0.1228, cr_loss=0.3386, over 2900823.54 frames. ], batch size: 40, lr: 3.06e-03, grad_scale: 32.0 2024-09-25 07:26:08,684 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=692902.0, ans=0.0 2024-09-25 07:26:33,644 INFO [train.py:1198] (3/4) Epoch 39, batch 450, loss[loss=0.2176, ctc_loss=0.1492, cr_loss=0.3419, over 11927.00 frames. ], tot_loss[loss=0.1913, ctc_loss=0.1233, cr_loss=0.3399, over 2993969.43 frames. ], batch size: 123, lr: 3.06e-03, grad_scale: 32.0 2024-09-25 07:26:35,562 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=692995.3333333334, ans=0.125 2024-09-25 07:26:46,766 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=692995.3333333334, ans=0.0 2024-09-25 07:26:48,265 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=693042.0, ans=0.125 2024-09-25 07:26:57,724 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=693042.0, ans=0.0 2024-09-25 07:27:03,899 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.123e+02 1.286e+02 1.365e+02 1.440e+02 1.919e+02, threshold=2.731e+02, percent-clipped=0.0 2024-09-25 07:27:04,716 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=9.71 vs. limit=22.5 2024-09-25 07:27:23,780 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=693135.3333333334, ans=0.0 2024-09-25 07:27:31,737 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten.whitening_limit, batch_count=693135.3333333334, ans=15.0 2024-09-25 07:27:32,999 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=693135.3333333334, ans=0.125 2024-09-25 07:27:53,308 INFO [train.py:1198] (3/4) Epoch 39, batch 500, loss[loss=0.2008, ctc_loss=0.1304, cr_loss=0.3521, over 16523.00 frames. 
], tot_loss[loss=0.1912, ctc_loss=0.1233, cr_loss=0.3398, over 3078641.72 frames. ], batch size: 66, lr: 3.06e-03, grad_scale: 32.0 2024-09-25 07:28:13,095 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=693275.3333333334, ans=0.0 2024-09-25 07:28:29,000 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=693322.0, ans=0.125 2024-09-25 07:28:35,433 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=3.78 vs. limit=10.0 2024-09-25 07:29:03,397 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=693415.3333333334, ans=0.125 2024-09-25 07:29:21,522 INFO [train.py:1198] (3/4) Epoch 39, batch 550, loss[loss=0.1907, ctc_loss=0.1213, cr_loss=0.3472, over 16749.00 frames. ], tot_loss[loss=0.1907, ctc_loss=0.123, cr_loss=0.3385, over 3145898.07 frames. ], batch size: 37, lr: 3.06e-03, grad_scale: 32.0 2024-09-25 07:29:22,209 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.71 vs. limit=15.0 2024-09-25 07:29:23,666 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=693462.0, ans=0.125 2024-09-25 07:29:37,348 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=7.33 vs. limit=15.0 2024-09-25 07:29:46,183 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=693508.6666666666, ans=0.0 2024-09-25 07:29:51,461 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=3.72 vs. limit=10.0 2024-09-25 07:29:52,389 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.124e+02 1.262e+02 1.347e+02 1.440e+02 2.072e+02, threshold=2.694e+02, percent-clipped=0.0 2024-09-25 07:29:53,327 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.81 vs. limit=15.0 2024-09-25 07:30:10,272 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=693602.0, ans=0.125 2024-09-25 07:30:42,148 INFO [train.py:1198] (3/4) Epoch 39, batch 600, loss[loss=0.1864, ctc_loss=0.1191, cr_loss=0.3366, over 17024.00 frames. ], tot_loss[loss=0.1907, ctc_loss=0.1231, cr_loss=0.338, over 3197792.20 frames. ], batch size: 44, lr: 3.06e-03, grad_scale: 32.0 2024-09-25 07:30:46,205 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.58 vs. 
limit=15.0 2024-09-25 07:31:03,691 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=693742.0, ans=0.2 2024-09-25 07:31:06,837 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=693742.0, ans=0.125 2024-09-25 07:31:13,082 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=693788.6666666666, ans=0.95 2024-09-25 07:31:47,296 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=693882.0, ans=0.0 2024-09-25 07:32:04,863 INFO [train.py:1198] (3/4) Epoch 39, batch 650, loss[loss=0.1653, ctc_loss=0.1056, cr_loss=0.2981, over 17274.00 frames. ], tot_loss[loss=0.19, ctc_loss=0.1226, cr_loss=0.3373, over 3232825.60 frames. ], batch size: 42, lr: 3.06e-03, grad_scale: 32.0 2024-09-25 07:32:24,357 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=693975.3333333334, ans=0.0 2024-09-25 07:32:30,588 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=693975.3333333334, ans=0.125 2024-09-25 07:32:35,162 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.084e+02 1.253e+02 1.371e+02 1.476e+02 2.121e+02, threshold=2.742e+02, percent-clipped=0.0 2024-09-25 07:33:08,776 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=694115.3333333334, ans=0.1 2024-09-25 07:33:20,316 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=694115.3333333334, ans=0.125 2024-09-25 07:33:24,776 INFO [train.py:1198] (3/4) Epoch 39, batch 700, loss[loss=0.2117, ctc_loss=0.1348, cr_loss=0.3842, over 16988.00 frames. ], tot_loss[loss=0.1901, ctc_loss=0.1226, cr_loss=0.3374, over 3259035.13 frames. ], batch size: 53, lr: 3.06e-03, grad_scale: 32.0 2024-09-25 07:33:29,905 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.const_attention_rate, batch_count=694162.0, ans=0.025 2024-09-25 07:33:40,774 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=694162.0, ans=0.2 2024-09-25 07:33:44,582 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=5.85 vs. limit=15.0 2024-09-25 07:34:21,330 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.09 vs. limit=15.0 2024-09-25 07:34:22,735 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-25 07:34:25,751 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=694302.0, ans=0.125 2024-09-25 07:34:48,228 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=694348.6666666666, ans=0.125 2024-09-25 07:34:52,578 INFO [train.py:1198] (3/4) Epoch 39, batch 750, loss[loss=0.2073, ctc_loss=0.1342, cr_loss=0.3653, over 17107.00 frames. ], tot_loss[loss=0.1898, ctc_loss=0.1223, cr_loss=0.3372, over 3290180.70 frames. 
], batch size: 49, lr: 3.06e-03, grad_scale: 32.0 2024-09-25 07:35:23,323 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.105e+02 1.271e+02 1.368e+02 1.469e+02 1.814e+02, threshold=2.736e+02, percent-clipped=0.0 2024-09-25 07:35:36,508 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=694488.6666666666, ans=0.125 2024-09-25 07:35:44,413 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=694535.3333333334, ans=0.2 2024-09-25 07:36:13,110 INFO [train.py:1198] (3/4) Epoch 39, batch 800, loss[loss=0.1563, ctc_loss=0.09738, cr_loss=0.2944, over 17206.00 frames. ], tot_loss[loss=0.1897, ctc_loss=0.1223, cr_loss=0.3371, over 3304849.70 frames. ], batch size: 41, lr: 3.06e-03, grad_scale: 32.0 2024-09-25 07:36:13,369 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=694628.6666666666, ans=0.125 2024-09-25 07:36:15,130 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=694628.6666666666, ans=0.1 2024-09-25 07:36:22,616 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=694628.6666666666, ans=0.125 2024-09-25 07:36:25,749 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=694628.6666666666, ans=0.125 2024-09-25 07:36:41,836 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=694675.3333333334, ans=0.125 2024-09-25 07:36:46,650 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=694722.0, ans=0.1 2024-09-25 07:37:10,279 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=694768.6666666666, ans=0.125 2024-09-25 07:37:13,777 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=694768.6666666666, ans=0.1 2024-09-25 07:37:35,919 INFO [train.py:1198] (3/4) Epoch 39, batch 850, loss[loss=0.1943, ctc_loss=0.1244, cr_loss=0.3497, over 17019.00 frames. ], tot_loss[loss=0.1891, ctc_loss=0.1217, cr_loss=0.3367, over 3327195.97 frames. 
], batch size: 44, lr: 3.06e-03, grad_scale: 32.0 2024-09-25 07:38:06,361 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.150e+02 1.258e+02 1.327e+02 1.456e+02 1.924e+02, threshold=2.653e+02, percent-clipped=0.0 2024-09-25 07:38:08,238 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=694955.3333333334, ans=0.125 2024-09-25 07:38:19,477 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=694955.3333333334, ans=0.0 2024-09-25 07:38:25,998 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=695002.0, ans=0.125 2024-09-25 07:38:27,574 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=695002.0, ans=0.125 2024-09-25 07:38:46,000 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=695048.6666666666, ans=0.125 2024-09-25 07:39:01,740 INFO [train.py:1198] (3/4) Epoch 39, batch 900, loss[loss=0.1724, ctc_loss=0.1098, cr_loss=0.3128, over 17265.00 frames. ], tot_loss[loss=0.191, ctc_loss=0.1232, cr_loss=0.3392, over 3336748.82 frames. ], batch size: 42, lr: 3.06e-03, grad_scale: 32.0 2024-09-25 07:39:14,035 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.min_abs, batch_count=695095.3333333334, ans=0.5 2024-09-25 07:39:57,794 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=695235.3333333334, ans=0.0 2024-09-25 07:40:08,986 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=695282.0, ans=0.05 2024-09-25 07:40:10,376 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=695282.0, ans=0.125 2024-09-25 07:40:24,435 INFO [train.py:1198] (3/4) Epoch 39, batch 950, loss[loss=0.1672, ctc_loss=0.1046, cr_loss=0.3129, over 17203.00 frames. ], tot_loss[loss=0.1908, ctc_loss=0.123, cr_loss=0.339, over 3332828.13 frames. ], batch size: 41, lr: 3.06e-03, grad_scale: 32.0 2024-09-25 07:40:28,229 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=695328.6666666666, ans=0.0 2024-09-25 07:40:42,873 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=695375.3333333334, ans=0.1 2024-09-25 07:40:55,209 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.152e+02 1.302e+02 1.382e+02 1.480e+02 1.852e+02, threshold=2.764e+02, percent-clipped=0.0 2024-09-25 07:41:25,392 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=695468.6666666666, ans=0.2 2024-09-25 07:41:39,844 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=695515.3333333334, ans=0.125 2024-09-25 07:41:47,518 INFO [train.py:1198] (3/4) Epoch 39, batch 1000, loss[loss=0.1846, ctc_loss=0.1195, cr_loss=0.3257, over 17166.00 frames. ], tot_loss[loss=0.1902, ctc_loss=0.1226, cr_loss=0.3383, over 3342814.84 frames. 
], batch size: 48, lr: 3.06e-03, grad_scale: 32.0 2024-09-25 07:42:01,238 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.69 vs. limit=22.5 2024-09-25 07:42:15,902 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=13.15 vs. limit=22.5 2024-09-25 07:42:22,980 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=695655.3333333334, ans=0.125 2024-09-25 07:42:50,181 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.const_attention_rate, batch_count=695748.6666666666, ans=0.025 2024-09-25 07:42:55,025 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=695748.6666666666, ans=0.125 2024-09-25 07:43:06,181 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=695795.3333333334, ans=0.125 2024-09-25 07:43:07,521 INFO [train.py:1198] (3/4) Epoch 39, batch 1050, loss[loss=0.1782, ctc_loss=0.113, cr_loss=0.3262, over 17198.00 frames. ], tot_loss[loss=0.1895, ctc_loss=0.122, cr_loss=0.3373, over 3346506.04 frames. ], batch size: 41, lr: 3.05e-03, grad_scale: 32.0 2024-09-25 07:43:07,972 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-25 07:43:15,810 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-25 07:43:18,894 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=695795.3333333334, ans=0.125 2024-09-25 07:43:38,944 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=695842.0, ans=0.0 2024-09-25 07:43:40,129 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.129e+02 1.305e+02 1.388e+02 1.496e+02 1.693e+02, threshold=2.775e+02, percent-clipped=0.0 2024-09-25 07:44:03,706 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=695935.3333333334, ans=0.125 2024-09-25 07:44:15,935 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=695935.3333333334, ans=0.125 2024-09-25 07:44:22,253 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=695982.0, ans=0.125 2024-09-25 07:44:31,903 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=695982.0, ans=0.125 2024-09-25 07:44:33,690 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=696028.6666666666, ans=0.0 2024-09-25 07:44:34,844 INFO [train.py:1198] (3/4) Epoch 39, batch 1100, loss[loss=0.2012, ctc_loss=0.1306, cr_loss=0.3532, over 17096.00 frames. ], tot_loss[loss=0.19, ctc_loss=0.1224, cr_loss=0.3381, over 3356852.74 frames. ], batch size: 40, lr: 3.05e-03, grad_scale: 16.0 2024-09-25 07:44:37,191 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.24 vs. 
limit=15.0 2024-09-25 07:44:58,958 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=696075.3333333334, ans=0.125 2024-09-25 07:44:58,986 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=696075.3333333334, ans=0.125 2024-09-25 07:45:15,652 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=9.84 vs. limit=15.0 2024-09-25 07:45:23,217 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=696168.6666666666, ans=0.1 2024-09-25 07:45:28,116 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.const_attention_rate, batch_count=696168.6666666666, ans=0.025 2024-09-25 07:45:54,573 INFO [train.py:1198] (3/4) Epoch 39, batch 1150, loss[loss=0.1784, ctc_loss=0.1155, cr_loss=0.3145, over 17096.00 frames. ], tot_loss[loss=0.1907, ctc_loss=0.1229, cr_loss=0.3388, over 3353413.06 frames. ], batch size: 43, lr: 3.05e-03, grad_scale: 8.0 2024-09-25 07:46:09,874 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=8.53 vs. limit=15.0 2024-09-25 07:46:30,830 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.114e+02 1.295e+02 1.353e+02 1.453e+02 1.736e+02, threshold=2.706e+02, percent-clipped=0.0 2024-09-25 07:47:16,820 INFO [train.py:1198] (3/4) Epoch 39, batch 1200, loss[loss=0.1864, ctc_loss=0.1199, cr_loss=0.3326, over 17027.00 frames. ], tot_loss[loss=0.1905, ctc_loss=0.1228, cr_loss=0.3387, over 3358684.05 frames. ], batch size: 44, lr: 3.05e-03, grad_scale: 16.0 2024-09-25 07:48:21,302 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=696682.0, ans=0.07 2024-09-25 07:48:39,499 INFO [train.py:1198] (3/4) Epoch 39, batch 1250, loss[loss=0.186, ctc_loss=0.1194, cr_loss=0.3333, over 17105.00 frames. ], tot_loss[loss=0.191, ctc_loss=0.1231, cr_loss=0.3395, over 3369492.92 frames. ], batch size: 49, lr: 3.05e-03, grad_scale: 16.0 2024-09-25 07:49:14,895 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=11.30 vs. limit=12.0 2024-09-25 07:49:17,456 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.104e+02 1.275e+02 1.351e+02 1.497e+02 2.020e+02, threshold=2.702e+02, percent-clipped=0.0 2024-09-25 07:49:20,802 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=696822.0, ans=0.125 2024-09-25 07:49:46,479 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=696915.3333333334, ans=0.125 2024-09-25 07:50:03,596 INFO [train.py:1198] (3/4) Epoch 39, batch 1300, loss[loss=0.2127, ctc_loss=0.1381, cr_loss=0.3729, over 17007.00 frames. ], tot_loss[loss=0.191, ctc_loss=0.123, cr_loss=0.3397, over 3380967.07 frames. 
], batch size: 51, lr: 3.05e-03, grad_scale: 16.0 2024-09-25 07:50:10,368 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-25 07:50:11,836 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=696962.0, ans=0.125 2024-09-25 07:51:26,314 INFO [train.py:1198] (3/4) Epoch 39, batch 1350, loss[loss=0.1944, ctc_loss=0.1222, cr_loss=0.3612, over 17220.00 frames. ], tot_loss[loss=0.1911, ctc_loss=0.1231, cr_loss=0.34, over 3373828.30 frames. ], batch size: 47, lr: 3.05e-03, grad_scale: 16.0 2024-09-25 07:51:51,109 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=697242.0, ans=0.0 2024-09-25 07:51:54,559 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=697242.0, ans=0.2 2024-09-25 07:52:00,576 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.127e+02 1.264e+02 1.325e+02 1.430e+02 2.601e+02, threshold=2.650e+02, percent-clipped=0.0 2024-09-25 07:52:13,763 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=697335.3333333334, ans=0.125 2024-09-25 07:52:20,126 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=697335.3333333334, ans=0.125 2024-09-25 07:52:24,371 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=9.19 vs. limit=15.0 2024-09-25 07:52:41,858 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=11.40 vs. limit=15.0 2024-09-25 07:52:47,368 INFO [train.py:1198] (3/4) Epoch 39, batch 1400, loss[loss=0.1961, ctc_loss=0.1228, cr_loss=0.3662, over 17316.00 frames. ], tot_loss[loss=0.1918, ctc_loss=0.1236, cr_loss=0.3413, over 3363999.30 frames. ], batch size: 51, lr: 3.05e-03, grad_scale: 16.0 2024-09-25 07:53:02,326 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=697475.3333333334, ans=0.125 2024-09-25 07:53:25,236 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=3.38 vs. limit=15.0 2024-09-25 07:53:36,010 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.94 vs. limit=6.0 2024-09-25 07:53:44,143 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=697568.6666666666, ans=0.0 2024-09-25 07:53:51,958 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.max_abs, batch_count=697568.6666666666, ans=10.0 2024-09-25 07:54:09,615 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.18 vs. limit=15.0 2024-09-25 07:54:15,113 INFO [train.py:1198] (3/4) Epoch 39, batch 1450, loss[loss=0.1912, ctc_loss=0.1235, cr_loss=0.3388, over 17114.00 frames. ], tot_loss[loss=0.1918, ctc_loss=0.1236, cr_loss=0.341, over 3362854.52 frames. 
], batch size: 40, lr: 3.05e-03, grad_scale: 16.0 2024-09-25 07:54:17,090 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=697662.0, ans=0.0 2024-09-25 07:54:31,361 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=697708.6666666666, ans=0.0 2024-09-25 07:54:39,168 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=697708.6666666666, ans=0.0 2024-09-25 07:54:48,363 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.159e+02 1.285e+02 1.356e+02 1.493e+02 2.927e+02, threshold=2.712e+02, percent-clipped=2.0 2024-09-25 07:54:48,617 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=697755.3333333334, ans=0.125 2024-09-25 07:54:56,691 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=697755.3333333334, ans=0.0 2024-09-25 07:55:28,660 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=697848.6666666666, ans=0.125 2024-09-25 07:55:34,691 INFO [train.py:1198] (3/4) Epoch 39, batch 1500, loss[loss=0.2016, ctc_loss=0.1327, cr_loss=0.3448, over 16484.00 frames. ], tot_loss[loss=0.1926, ctc_loss=0.1242, cr_loss=0.342, over 3364572.51 frames. ], batch size: 66, lr: 3.05e-03, grad_scale: 16.0 2024-09-25 07:56:57,037 INFO [train.py:1198] (3/4) Epoch 39, batch 1550, loss[loss=0.2146, ctc_loss=0.1383, cr_loss=0.3815, over 17088.00 frames. ], tot_loss[loss=0.193, ctc_loss=0.1244, cr_loss=0.3426, over 3371449.52 frames. ], batch size: 49, lr: 3.05e-03, grad_scale: 16.0 2024-09-25 07:56:59,214 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-25 07:56:59,292 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=698128.6666666666, ans=0.0 2024-09-25 07:57:27,880 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=698222.0, ans=0.0 2024-09-25 07:57:30,847 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.098e+02 1.291e+02 1.368e+02 1.486e+02 2.492e+02, threshold=2.736e+02, percent-clipped=0.0 2024-09-25 07:57:49,813 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=698268.6666666666, ans=0.125 2024-09-25 07:57:50,400 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=8.77 vs. limit=15.0 2024-09-25 07:58:17,122 INFO [train.py:1198] (3/4) Epoch 39, batch 1600, loss[loss=0.1706, ctc_loss=0.1083, cr_loss=0.3115, over 17048.00 frames. ], tot_loss[loss=0.1923, ctc_loss=0.1239, cr_loss=0.3419, over 3371510.11 frames. ], batch size: 39, lr: 3.05e-03, grad_scale: 32.0 2024-09-25 07:59:44,297 INFO [train.py:1198] (3/4) Epoch 39, batch 1650, loss[loss=0.2025, ctc_loss=0.1325, cr_loss=0.3496, over 17213.00 frames. ], tot_loss[loss=0.1917, ctc_loss=0.1235, cr_loss=0.3411, over 3372467.00 frames. 
], batch size: 50, lr: 3.05e-03, grad_scale: 32.0 2024-09-25 07:59:47,762 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-09-25 08:00:03,813 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=698642.0, ans=0.1 2024-09-25 08:00:17,964 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.058e+02 1.287e+02 1.365e+02 1.434e+02 2.064e+02, threshold=2.730e+02, percent-clipped=0.0 2024-09-25 08:00:48,726 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=698782.0, ans=0.125 2024-09-25 08:01:04,263 INFO [train.py:1198] (3/4) Epoch 39, batch 1700, loss[loss=0.2064, ctc_loss=0.1335, cr_loss=0.3644, over 16492.00 frames. ], tot_loss[loss=0.1909, ctc_loss=0.1229, cr_loss=0.3403, over 3380020.51 frames. ], batch size: 66, lr: 3.05e-03, grad_scale: 32.0 2024-09-25 08:01:38,853 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=698922.0, ans=0.125 2024-09-25 08:01:48,516 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=698922.0, ans=0.1 2024-09-25 08:01:49,004 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=13.01 vs. limit=22.5 2024-09-25 08:01:56,426 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=698968.6666666666, ans=0.125 2024-09-25 08:02:10,522 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=699015.3333333334, ans=0.0 2024-09-25 08:02:10,913 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.51 vs. limit=15.0 2024-09-25 08:02:16,758 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=699015.3333333334, ans=0.5 2024-09-25 08:02:25,988 INFO [train.py:1198] (3/4) Epoch 39, batch 1750, loss[loss=0.1998, ctc_loss=0.1284, cr_loss=0.3571, over 17304.00 frames. ], tot_loss[loss=0.1911, ctc_loss=0.1231, cr_loss=0.3403, over 3377024.29 frames. ], batch size: 49, lr: 3.05e-03, grad_scale: 32.0 2024-09-25 08:02:35,887 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=699062.0, ans=0.125 2024-09-25 08:02:47,211 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=699108.6666666666, ans=0.2 2024-09-25 08:02:53,464 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=699108.6666666666, ans=0.2 2024-09-25 08:03:01,042 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.161e+02 1.281e+02 1.369e+02 1.457e+02 2.012e+02, threshold=2.738e+02, percent-clipped=0.0 2024-09-25 08:03:20,984 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=12.40 vs. 
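limit=22.5

The scaling.py:1024 "Whitening" entries above report a per-module whiteness metric against a fixed limit; a penalty that keeps activations decorrelated presumably only engages once the metric exceeds the limit, which is why values such as metric=12.40 vs. limit=22.5 are logged without consequence. A plausible reading of the metric (a hedged sketch, not the scaling.py implementation; the logged num_groups suggests the real metric is computed per channel group) is the eigenvalue-spread ratio of the feature covariance, which equals 1.0 when the covariance is a multiple of the identity and grows as channels become correlated or unevenly scaled:

    import torch

    # Sketch of a whiteness metric consistent with the logged values:
    # mean(eig^2) / mean(eig)^2 of the feature covariance, computed via
    # traces so no eigendecomposition is needed. 1.0 means perfectly
    # "white"; larger means less white. Illustrative only.
    def whitening_metric(x: torch.Tensor) -> torch.Tensor:
        x = x.reshape(-1, x.shape[-1])              # (frames, channels)
        x = x - x.mean(dim=0, keepdim=True)
        cov = (x.t() @ x) / x.shape[0]              # (channels, channels)
        d = x.shape[-1]
        mean_eig = torch.diagonal(cov).mean()       # trace(cov) / d
        mean_eig_sq = (cov * cov).sum() / d         # trace(cov @ cov) / d
        return mean_eig_sq / (mean_eig ** 2 + 1e-20)

    x = torch.randn(100000, 384)   # i.i.d. channels -> metric close to 1
    print(float(whitening_metric(x)))
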
2024-09-25 08:03:39,966 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=699248.6666666666, ans=0.125 2024-09-25 08:03:39,970 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=699248.6666666666, ans=0.125 2024-09-25 08:03:46,343 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=699248.6666666666, ans=0.125 2024-09-25 08:03:53,687 INFO [train.py:1198] (3/4) Epoch 39, batch 1800, loss[loss=0.198, ctc_loss=0.1274, cr_loss=0.3527, over 17255.00 frames. ], tot_loss[loss=0.1911, ctc_loss=0.1231, cr_loss=0.3401, over 3369927.10 frames. ], batch size: 44, lr: 3.05e-03, grad_scale: 16.0 2024-09-25 08:03:57,263 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=699295.3333333334, ans=0.2 2024-09-25 08:04:16,287 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=699342.0, ans=0.125 2024-09-25 08:04:21,344 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=12.78 vs. limit=22.5 2024-09-25 08:04:23,403 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.87 vs. limit=6.0 2024-09-25 08:04:30,777 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=699388.6666666666, ans=0.125 2024-09-25 08:05:13,795 INFO [train.py:1198] (3/4) Epoch 39, batch 1850, loss[loss=0.1426, ctc_loss=0.08753, cr_loss=0.2752, over 17285.00 frames. ], tot_loss[loss=0.1905, ctc_loss=0.1226, cr_loss=0.3395, over 3370506.94 frames. ], batch size: 42, lr: 3.05e-03, grad_scale: 16.0 2024-09-25 08:05:25,333 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.const_attention_rate, batch_count=699528.6666666666, ans=0.025 2024-09-25 08:05:47,532 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=699622.0, ans=0.125 2024-09-25 08:05:48,763 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.097e+02 1.264e+02 1.367e+02 1.511e+02 2.303e+02, threshold=2.733e+02, percent-clipped=0.0 2024-09-25 08:06:03,601 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=699668.6666666666, ans=0.125 2024-09-25 08:06:36,626 INFO [train.py:1198] (3/4) Epoch 39, batch 1900, loss[loss=0.1826, ctc_loss=0.1174, cr_loss=0.326, over 17210.00 frames. ], tot_loss[loss=0.1905, ctc_loss=0.1227, cr_loss=0.3392, over 3370273.18 frames.
], batch size: 50, lr: 3.05e-03, grad_scale: 16.0 2024-09-25 08:06:46,279 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=699762.0, ans=0.0 2024-09-25 08:06:49,481 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=699762.0, ans=0.0 2024-09-25 08:06:52,704 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer_ff2.min_abs, batch_count=699808.6666666666, ans=0.1 2024-09-25 08:06:54,740 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.13 vs. limit=15.0 2024-09-25 08:07:21,697 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=3.79 vs. limit=10.0 2024-09-25 08:07:31,095 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=699902.0, ans=0.2 2024-09-25 08:07:31,199 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=699902.0, ans=0.0 2024-09-25 08:07:37,582 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=699902.0, ans=0.125 2024-09-25 08:07:39,486 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=699948.6666666666, ans=0.2 2024-09-25 08:07:56,981 INFO [train.py:1198] (3/4) Epoch 39, batch 1950, loss[loss=0.2041, ctc_loss=0.131, cr_loss=0.3655, over 17010.00 frames. ], tot_loss[loss=0.1908, ctc_loss=0.1229, cr_loss=0.3394, over 3378052.26 frames. ], batch size: 53, lr: 3.05e-03, grad_scale: 16.0 2024-09-25 08:08:12,272 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=700042.0, ans=0.2 2024-09-25 08:08:18,641 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=700042.0, ans=0.125 2024-09-25 08:08:23,346 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=700042.0, ans=0.0 2024-09-25 08:08:37,870 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.128e+02 1.289e+02 1.367e+02 1.538e+02 1.984e+02, threshold=2.734e+02, percent-clipped=0.0 2024-09-25 08:09:08,231 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=700182.0, ans=0.1 2024-09-25 08:09:23,931 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=700228.6666666666, ans=0.2 2024-09-25 08:09:25,322 INFO [train.py:1198] (3/4) Epoch 39, batch 2000, loss[loss=0.2131, ctc_loss=0.1412, cr_loss=0.3595, over 17311.00 frames. ], tot_loss[loss=0.1922, ctc_loss=0.1238, cr_loss=0.3417, over 3376447.90 frames. 
], batch size: 51, lr: 3.04e-03, grad_scale: 32.0 2024-09-25 08:09:35,102 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=700228.6666666666, ans=0.0 2024-09-25 08:09:47,959 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=700275.3333333334, ans=0.05 2024-09-25 08:09:59,332 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=700322.0, ans=0.0 2024-09-25 08:10:00,894 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=700322.0, ans=0.125 2024-09-25 08:10:02,634 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-25 08:10:17,356 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=9.56 vs. limit=15.0 2024-09-25 08:10:45,496 INFO [train.py:1198] (3/4) Epoch 39, batch 2050, loss[loss=0.191, ctc_loss=0.1251, cr_loss=0.3294, over 17095.00 frames. ], tot_loss[loss=0.1927, ctc_loss=0.1243, cr_loss=0.3421, over 3362706.71 frames. ], batch size: 49, lr: 3.04e-03, grad_scale: 16.0 2024-09-25 08:11:03,442 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=700508.6666666666, ans=0.125 2024-09-25 08:11:22,133 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=700555.3333333334, ans=0.125 2024-09-25 08:11:25,133 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.108e+02 1.300e+02 1.377e+02 1.457e+02 2.836e+02, threshold=2.753e+02, percent-clipped=1.0 2024-09-25 08:11:33,547 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=700555.3333333334, ans=0.125 2024-09-25 08:11:49,276 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=700602.0, ans=0.125 2024-09-25 08:11:52,637 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-09-25 08:12:03,722 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-25 08:12:08,230 INFO [train.py:1198] (3/4) Epoch 39, batch 2100, loss[loss=0.1927, ctc_loss=0.1241, cr_loss=0.3431, over 17308.00 frames. ], tot_loss[loss=0.193, ctc_loss=0.1244, cr_loss=0.343, over 3362783.42 frames. 
], batch size: 51, lr: 3.04e-03, grad_scale: 16.0 2024-09-25 08:12:08,547 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=700695.3333333334, ans=10.0 2024-09-25 08:12:34,052 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=700742.0, ans=0.125 2024-09-25 08:12:38,724 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=700788.6666666666, ans=0.0 2024-09-25 08:12:51,578 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=700788.6666666666, ans=0.2 2024-09-25 08:13:03,222 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.17 vs. limit=6.0 2024-09-25 08:13:30,507 INFO [train.py:1198] (3/4) Epoch 39, batch 2150, loss[loss=0.1897, ctc_loss=0.1208, cr_loss=0.3443, over 17034.00 frames. ], tot_loss[loss=0.1935, ctc_loss=0.1248, cr_loss=0.3436, over 3359627.57 frames. ], batch size: 44, lr: 3.04e-03, grad_scale: 16.0 2024-09-25 08:14:00,439 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=700975.3333333334, ans=0.04949747468305833 2024-09-25 08:14:00,444 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=700975.3333333334, ans=0.0 2024-09-25 08:14:10,020 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.127e+02 1.289e+02 1.363e+02 1.447e+02 2.047e+02, threshold=2.727e+02, percent-clipped=0.0 2024-09-25 08:14:38,230 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=8.22 vs. limit=22.5 2024-09-25 08:14:40,821 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=701115.3333333334, ans=0.0 2024-09-25 08:14:47,531 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=5.02 vs. limit=15.0 2024-09-25 08:14:48,778 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.const_attention_rate, batch_count=701115.3333333334, ans=0.025 2024-09-25 08:14:53,310 INFO [train.py:1198] (3/4) Epoch 39, batch 2200, loss[loss=0.1514, ctc_loss=0.09453, cr_loss=0.2843, over 17021.00 frames. ], tot_loss[loss=0.1936, ctc_loss=0.1247, cr_loss=0.3442, over 3360496.35 frames. ], batch size: 39, lr: 3.04e-03, grad_scale: 16.0 2024-09-25 08:15:13,663 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=5.68 vs. limit=15.0 2024-09-25 08:15:54,637 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=701302.0, ans=0.125 2024-09-25 08:15:56,080 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=701348.6666666666, ans=0.125 2024-09-25 08:16:01,559 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.62 vs. 
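limit=15.0

The scaling.py:214 "ScheduledFloat" entries that dominate this log print the current value (ans) of a batch-count-driven schedule for one named hyperparameter, such as a dropout probability or a skip rate. A minimal sketch of such a schedule, assuming piecewise-linear interpolation between (batch_count, value) breakpoints (the breakpoints below are hypothetical, not taken from this run):

    from bisect import bisect_right

    # Sketch: evaluate a piecewise-linear schedule at a given batch_count,
    # clamping to the end values outside the breakpoint range.
    def scheduled_float(schedule, batch_count):
        xs = [x for x, _ in schedule]               # sorted breakpoints
        i = bisect_right(xs, batch_count)
        if i == 0:
            return schedule[0][1]
        if i == len(schedule):
            return schedule[-1][1]
        (x0, y0), (x1, y1) = schedule[i - 1], schedule[i]
        return y0 + (y1 - y0) * (batch_count - x0) / (x1 - x0)

    # A value annealing from 0.3 to 0.1 over the first 20k batches has long
    # since reached its final value at the batch_count ~ 7e5 seen above:
    print(scheduled_float([(0.0, 0.3), (20000.0, 0.1)], 700975.33))  # -> 0.1
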
2024-09-25 08:16:04,490 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=8.62 vs. limit=12.0 2024-09-25 08:16:16,048 INFO [train.py:1198] (3/4) Epoch 39, batch 2250, loss[loss=0.2098, ctc_loss=0.1402, cr_loss=0.3482, over 17214.00 frames. ], tot_loss[loss=0.1933, ctc_loss=0.1246, cr_loss=0.3435, over 3358354.84 frames. ], batch size: 50, lr: 3.04e-03, grad_scale: 16.0 2024-09-25 08:16:16,331 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=701395.3333333334, ans=0.125 2024-09-25 08:16:16,405 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=701395.3333333334, ans=0.0 2024-09-25 08:16:19,473 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=701395.3333333334, ans=0.125 2024-09-25 08:16:48,901 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=9.89 vs. limit=22.5 2024-09-25 08:16:50,561 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=5.33 vs. limit=12.0 2024-09-25 08:16:52,927 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.151e+02 1.277e+02 1.348e+02 1.481e+02 2.538e+02, threshold=2.695e+02, percent-clipped=0.0 2024-09-25 08:16:53,370 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=701488.6666666666, ans=0.0 2024-09-25 08:17:35,951 INFO [train.py:1198] (3/4) Epoch 39, batch 2300, loss[loss=0.1907, ctc_loss=0.1219, cr_loss=0.3439, over 17058.00 frames. ], tot_loss[loss=0.1925, ctc_loss=0.124, cr_loss=0.3425, over 3366241.35 frames. ], batch size: 46, lr: 3.04e-03, grad_scale: 16.0 2024-09-25 08:17:39,529 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=701628.6666666666, ans=0.2 2024-09-25 08:18:33,194 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=701768.6666666666, ans=0.09899494936611666 2024-09-25 08:18:48,358 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=701815.3333333334, ans=0.125 2024-09-25 08:18:50,301 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.24 vs. limit=15.0 2024-09-25 08:18:56,517 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=701815.3333333334, ans=0.0 2024-09-25 08:19:01,216 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=701815.3333333334, ans=0.0 2024-09-25 08:19:03,996 INFO [train.py:1198] (3/4) Epoch 39, batch 2350, loss[loss=0.2302, ctc_loss=0.1514, cr_loss=0.3938, over 17006.00 frames. ], tot_loss[loss=0.1922, ctc_loss=0.1238, cr_loss=0.342, over 3365205.62 frames.
], batch size: 52, lr: 3.04e-03, grad_scale: 16.0 2024-09-25 08:19:04,236 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=701862.0, ans=0.0 2024-09-25 08:19:06,256 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=701862.0, ans=0.125 2024-09-25 08:19:10,740 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=701862.0, ans=0.125 2024-09-25 08:19:17,096 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=701862.0, ans=0.125 2024-09-25 08:19:31,477 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=701908.6666666666, ans=0.125 2024-09-25 08:19:39,446 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=701955.3333333334, ans=0.0 2024-09-25 08:19:40,704 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.145e+02 1.317e+02 1.394e+02 1.473e+02 1.777e+02, threshold=2.787e+02, percent-clipped=0.0 2024-09-25 08:20:23,820 INFO [train.py:1198] (3/4) Epoch 39, batch 2400, loss[loss=0.161, ctc_loss=0.1022, cr_loss=0.294, over 17181.00 frames. ], tot_loss[loss=0.1921, ctc_loss=0.1237, cr_loss=0.3422, over 3367695.08 frames. ], batch size: 41, lr: 3.04e-03, grad_scale: 32.0 2024-09-25 08:20:24,063 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.const_attention_rate, batch_count=702095.3333333334, ans=0.025 2024-09-25 08:20:27,350 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=702095.3333333334, ans=0.2 2024-09-25 08:20:35,129 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=702095.3333333334, ans=0.125 2024-09-25 08:20:36,765 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=702095.3333333334, ans=0.0 2024-09-25 08:21:22,791 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=10.69 vs. limit=12.0 2024-09-25 08:21:26,407 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=4.54 vs. limit=15.0 2024-09-25 08:21:28,704 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=702282.0, ans=0.0 2024-09-25 08:21:45,940 INFO [train.py:1198] (3/4) Epoch 39, batch 2450, loss[loss=0.1901, ctc_loss=0.1219, cr_loss=0.3411, over 17359.00 frames. ], tot_loss[loss=0.1929, ctc_loss=0.1242, cr_loss=0.3432, over 3365447.35 frames. ], batch size: 48, lr: 3.04e-03, grad_scale: 32.0 2024-09-25 08:21:52,438 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=702328.6666666666, ans=0.125 2024-09-25 08:21:54,721 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=9.94 vs. 
limit=15.0 2024-09-25 08:22:10,184 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=702375.3333333334, ans=0.125 2024-09-25 08:22:16,507 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=702422.0, ans=0.1 2024-09-25 08:22:22,920 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=702422.0, ans=0.0 2024-09-25 08:22:24,052 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.006e+02 1.314e+02 1.404e+02 1.496e+02 1.831e+02, threshold=2.808e+02, percent-clipped=0.0 2024-09-25 08:22:37,188 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=702468.6666666666, ans=0.0 2024-09-25 08:22:49,624 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=702515.3333333334, ans=0.125 2024-09-25 08:22:49,653 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=702515.3333333334, ans=0.125 2024-09-25 08:23:08,342 INFO [train.py:1198] (3/4) Epoch 39, batch 2500, loss[loss=0.1555, ctc_loss=0.09655, cr_loss=0.2949, over 17250.00 frames. ], tot_loss[loss=0.1924, ctc_loss=0.1239, cr_loss=0.3427, over 3366656.58 frames. ], batch size: 42, lr: 3.04e-03, grad_scale: 16.0 2024-09-25 08:23:20,530 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=702562.0, ans=10.0 2024-09-25 08:23:41,735 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.87 vs. limit=22.5 2024-09-25 08:23:48,969 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=702655.3333333334, ans=0.0 2024-09-25 08:24:09,728 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=702702.0, ans=0.125 2024-09-25 08:24:16,399 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-09-25 08:24:20,809 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=702748.6666666666, ans=0.125 2024-09-25 08:24:32,236 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=702795.3333333334, ans=0.125 2024-09-25 08:24:33,562 INFO [train.py:1198] (3/4) Epoch 39, batch 2550, loss[loss=0.2225, ctc_loss=0.1462, cr_loss=0.3818, over 14852.00 frames. ], tot_loss[loss=0.1922, ctc_loss=0.1238, cr_loss=0.3423, over 3364345.71 frames. 
], batch size: 89, lr: 3.04e-03, grad_scale: 16.0 2024-09-25 08:25:09,162 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.const_attention_rate, batch_count=702888.6666666666, ans=0.025 2024-09-25 08:25:10,782 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=702888.6666666666, ans=0.125 2024-09-25 08:25:10,816 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=702888.6666666666, ans=0.125 2024-09-25 08:25:12,027 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.167e+02 1.300e+02 1.397e+02 1.510e+02 1.872e+02, threshold=2.794e+02, percent-clipped=0.0 2024-09-25 08:25:14,078 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=702888.6666666666, ans=0.1 2024-09-25 08:25:15,769 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-25 08:25:56,181 INFO [train.py:1198] (3/4) Epoch 39, batch 2600, loss[loss=0.2183, ctc_loss=0.1395, cr_loss=0.3944, over 17215.00 frames. ], tot_loss[loss=0.1931, ctc_loss=0.1244, cr_loss=0.3437, over 3362039.48 frames. ], batch size: 55, lr: 3.04e-03, grad_scale: 16.0 2024-09-25 08:26:38,310 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=5.48 vs. limit=15.0 2024-09-25 08:26:41,478 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=6.43 vs. limit=15.0 2024-09-25 08:27:02,626 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=4.95 vs. limit=12.0 2024-09-25 08:27:03,342 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=703215.3333333334, ans=0.125 2024-09-25 08:27:15,881 INFO [train.py:1198] (3/4) Epoch 39, batch 2650, loss[loss=0.202, ctc_loss=0.1294, cr_loss=0.3633, over 17026.00 frames. ], tot_loss[loss=0.1928, ctc_loss=0.1242, cr_loss=0.3432, over 3368830.42 frames. ], batch size: 52, lr: 3.04e-03, grad_scale: 16.0 2024-09-25 08:27:40,584 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=14.92 vs. limit=15.0 2024-09-25 08:27:53,519 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.127e+02 1.290e+02 1.367e+02 1.451e+02 1.893e+02, threshold=2.734e+02, percent-clipped=0.0 2024-09-25 08:27:54,637 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=7.13 vs. limit=15.0 2024-09-25 08:28:11,931 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=703402.0, ans=0.125 2024-09-25 08:28:42,518 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten.whitening_limit, batch_count=703495.3333333334, ans=15.0 2024-09-25 08:28:43,414 INFO [train.py:1198] (3/4) Epoch 39, batch 2700, loss[loss=0.1584, ctc_loss=0.09922, cr_loss=0.296, over 17183.00 frames. ], tot_loss[loss=0.1926, ctc_loss=0.1241, cr_loss=0.3426, over 3360965.47 frames. 
], batch size: 41, lr: 3.04e-03, grad_scale: 16.0 2024-09-25 08:28:54,873 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=703495.3333333334, ans=0.125 2024-09-25 08:29:10,564 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=703542.0, ans=0.035 2024-09-25 08:29:29,496 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=703635.3333333334, ans=0.125 2024-09-25 08:29:31,217 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=703635.3333333334, ans=0.2 2024-09-25 08:30:02,893 INFO [train.py:1198] (3/4) Epoch 39, batch 2750, loss[loss=0.1521, ctc_loss=0.094, cr_loss=0.2906, over 16735.00 frames. ], tot_loss[loss=0.1922, ctc_loss=0.1239, cr_loss=0.3414, over 3361839.88 frames. ], batch size: 37, lr: 3.04e-03, grad_scale: 16.0 2024-09-25 08:30:17,956 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer_na.min_abs, batch_count=703775.3333333334, ans=0.02 2024-09-25 08:30:41,402 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.135e+02 1.292e+02 1.404e+02 1.488e+02 2.567e+02, threshold=2.807e+02, percent-clipped=0.0 2024-09-25 08:30:55,497 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer_ff2.min_abs, batch_count=703868.6666666666, ans=0.1 2024-09-25 08:31:11,438 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=703915.3333333334, ans=0.2 2024-09-25 08:31:25,415 INFO [train.py:1198] (3/4) Epoch 39, batch 2800, loss[loss=0.2064, ctc_loss=0.1336, cr_loss=0.3636, over 16092.00 frames. ], tot_loss[loss=0.1924, ctc_loss=0.124, cr_loss=0.3418, over 3358211.25 frames. ], batch size: 74, lr: 3.04e-03, grad_scale: 32.0 2024-09-25 08:31:30,623 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=703962.0, ans=0.125 2024-09-25 08:31:51,963 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=6.50 vs. limit=15.0 2024-09-25 08:32:28,365 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=704148.6666666666, ans=0.0 2024-09-25 08:32:45,470 INFO [train.py:1198] (3/4) Epoch 39, batch 2850, loss[loss=0.2044, ctc_loss=0.1343, cr_loss=0.3504, over 17022.00 frames. ], tot_loss[loss=0.1926, ctc_loss=0.1242, cr_loss=0.3424, over 3362708.98 frames. ], batch size: 51, lr: 3.04e-03, grad_scale: 32.0 2024-09-25 08:32:50,667 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=704195.3333333334, ans=0.125 2024-09-25 08:33:00,084 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=7.39 vs. limit=15.0 2024-09-25 08:33:01,840 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=9.23 vs. 
limit=15.0 2024-09-25 08:33:10,248 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=704242.0, ans=0.2 2024-09-25 08:33:31,755 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.164e+02 1.313e+02 1.374e+02 1.493e+02 1.951e+02, threshold=2.749e+02, percent-clipped=0.0 2024-09-25 08:33:35,702 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=6.32 vs. limit=15.0 2024-09-25 08:33:57,036 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=5.98 vs. limit=15.0 2024-09-25 08:33:58,853 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=5.58 vs. limit=15.0 2024-09-25 08:34:02,928 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=704382.0, ans=0.125 2024-09-25 08:34:13,712 INFO [train.py:1198] (3/4) Epoch 39, batch 2900, loss[loss=0.1637, ctc_loss=0.1008, cr_loss=0.3146, over 17189.00 frames. ], tot_loss[loss=0.1922, ctc_loss=0.1239, cr_loss=0.3415, over 3358501.06 frames. ], batch size: 41, lr: 3.04e-03, grad_scale: 32.0 2024-09-25 08:34:26,957 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=704428.6666666666, ans=0.1 2024-09-25 08:34:34,828 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=704475.3333333334, ans=0.0 2024-09-25 08:34:42,818 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=704475.3333333334, ans=0.0 2024-09-25 08:34:46,112 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=704522.0, ans=0.125 2024-09-25 08:35:33,811 INFO [train.py:1198] (3/4) Epoch 39, batch 2950, loss[loss=0.2052, ctc_loss=0.1322, cr_loss=0.3652, over 17217.00 frames. ], tot_loss[loss=0.1915, ctc_loss=0.1234, cr_loss=0.3406, over 3360268.62 frames. ], batch size: 47, lr: 3.04e-03, grad_scale: 32.0 2024-09-25 08:36:15,121 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.129e+02 1.295e+02 1.375e+02 1.468e+02 1.743e+02, threshold=2.750e+02, percent-clipped=0.0 2024-09-25 08:36:26,513 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-09-25 08:36:32,169 INFO [scaling.py:1024] (3/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=7.10 vs. limit=8.0 2024-09-25 08:36:39,083 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=704848.6666666666, ans=0.09899494936611666 2024-09-25 08:36:55,831 INFO [train.py:1198] (3/4) Epoch 39, batch 3000, loss[loss=0.2015, ctc_loss=0.129, cr_loss=0.3625, over 17294.00 frames. ], tot_loss[loss=0.1915, ctc_loss=0.1234, cr_loss=0.3405, over 3353844.64 frames. ], batch size: 51, lr: 3.03e-03, grad_scale: 32.0 2024-09-25 08:36:55,831 INFO [train.py:1221] (3/4) Computing validation loss 2024-09-25 08:37:11,148 INFO [train.py:1230] (3/4) Epoch 39, validation: loss=0.03549, ctc_loss=0.03549, cr_loss=9.367e-15, over 944034.00 frames. 
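
The loss figures in these entries combine consistently with the scales from the run config (ctc_loss_scale=1.0, cr_loss_scale=0.2): loss = ctc_loss_scale * ctc_loss + cr_loss_scale * cr_loss, e.g. 0.1229 + 0.2 * 0.3388 ≈ 0.1907 for batch 1150 above. The validation cr_loss of ~9e-15 is numerically zero, as expected if the CR-CTC consistency term compares two differently time-masked copies of each utterance and no masking is applied at validation time (a reading of the log, not a quote from train.py). A sketch of the combination:

    # Sketch: how the logged loss plausibly combines its two parts.
    # combine_losses() is a hypothetical helper, not a train.py function;
    # the default scales come from the config printed at startup.
    def combine_losses(ctc_loss: float, cr_loss: float,
                       ctc_loss_scale: float = 1.0,
                       cr_loss_scale: float = 0.2) -> float:
        return ctc_loss_scale * ctc_loss + cr_loss_scale * cr_loss

    # Matches the Epoch 39, batch 1150 entry: loss=0.1907
    assert abs(combine_losses(0.1229, 0.3388) - 0.1907) < 5e-4
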
2024-09-25 08:37:11,149 INFO [train.py:1231] (3/4) Maximum memory allocated so far is 21238MB 2024-09-25 08:37:11,490 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=704895.3333333334, ans=0.2 2024-09-25 08:37:16,464 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=11.15 vs. limit=22.5 2024-09-25 08:37:19,196 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=704895.3333333334, ans=0.125 2024-09-25 08:37:33,058 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=704942.0, ans=0.125 2024-09-25 08:37:34,608 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=704942.0, ans=0.0 2024-09-25 08:37:36,280 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=704942.0, ans=0.1 2024-09-25 08:38:16,733 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-25 08:38:29,015 INFO [train.py:1198] (3/4) Epoch 39, batch 3050, loss[loss=0.2246, ctc_loss=0.1476, cr_loss=0.3848, over 14966.00 frames. ], tot_loss[loss=0.192, ctc_loss=0.1237, cr_loss=0.3412, over 3354618.08 frames. ], batch size: 89, lr: 3.03e-03, grad_scale: 32.0 2024-09-25 08:38:31,032 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-25 08:38:39,003 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.21 vs. limit=15.0 2024-09-25 08:38:43,450 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=705175.3333333334, ans=0.05 2024-09-25 08:38:44,915 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-09-25 08:38:49,927 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=7.95 vs. limit=15.0 2024-09-25 08:39:08,845 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.124e+02 1.272e+02 1.353e+02 1.453e+02 2.990e+02, threshold=2.707e+02, percent-clipped=1.0 2024-09-25 08:39:39,494 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=705315.3333333334, ans=0.1 2024-09-25 08:39:54,026 INFO [train.py:1198] (3/4) Epoch 39, batch 3100, loss[loss=0.1879, ctc_loss=0.1224, cr_loss=0.3273, over 17210.00 frames. ], tot_loss[loss=0.1917, ctc_loss=0.1235, cr_loss=0.3407, over 3358480.81 frames. ], batch size: 50, lr: 3.03e-03, grad_scale: 32.0 2024-09-25 08:40:08,336 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.const_attention_rate, batch_count=705408.6666666666, ans=0.025 2024-09-25 08:41:11,988 INFO [train.py:1198] (3/4) Epoch 39, batch 3150, loss[loss=0.2233, ctc_loss=0.1463, cr_loss=0.3852, over 15123.00 frames. ], tot_loss[loss=0.1914, ctc_loss=0.1233, cr_loss=0.3401, over 3352945.44 frames. 
], batch size: 89, lr: 3.03e-03, grad_scale: 16.0 2024-09-25 08:41:27,059 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten.whitening_limit, batch_count=705642.0, ans=15.0 2024-09-25 08:41:31,209 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=705642.0, ans=0.125 2024-09-25 08:41:43,528 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=705688.6666666666, ans=0.0 2024-09-25 08:41:45,191 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=705688.6666666666, ans=0.1 2024-09-25 08:41:50,876 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.157e+02 1.273e+02 1.360e+02 1.459e+02 1.912e+02, threshold=2.719e+02, percent-clipped=0.0 2024-09-25 08:42:29,863 INFO [train.py:1198] (3/4) Epoch 39, batch 3200, loss[loss=0.1719, ctc_loss=0.11, cr_loss=0.3095, over 16950.00 frames. ], tot_loss[loss=0.1911, ctc_loss=0.1232, cr_loss=0.3395, over 3365692.01 frames. ], batch size: 42, lr: 3.03e-03, grad_scale: 32.0 2024-09-25 08:42:30,086 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=705828.6666666666, ans=0.125 2024-09-25 08:42:36,309 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=705828.6666666666, ans=0.0 2024-09-25 08:42:38,007 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=705828.6666666666, ans=0.0 2024-09-25 08:42:48,931 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=705875.3333333334, ans=0.1 2024-09-25 08:42:49,066 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=705875.3333333334, ans=0.1 2024-09-25 08:43:07,152 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.69 vs. limit=6.0 2024-09-25 08:43:13,093 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=13.31 vs. limit=22.5 2024-09-25 08:43:25,068 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=705968.6666666666, ans=0.125 2024-09-25 08:43:31,574 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=706015.3333333334, ans=0.125 2024-09-25 08:43:37,761 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=706015.3333333334, ans=0.0 2024-09-25 08:43:48,014 INFO [train.py:1198] (3/4) Epoch 39, batch 3250, loss[loss=0.1844, ctc_loss=0.1162, cr_loss=0.3407, over 17165.00 frames. ], tot_loss[loss=0.1915, ctc_loss=0.1234, cr_loss=0.3403, over 3358429.42 frames. ], batch size: 45, lr: 3.03e-03, grad_scale: 32.0 2024-09-25 08:44:04,209 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=5.07 vs. 
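limit=15.0

The recurring optim.py:487 WARNING lines track the distribution of recent gradient norms; in every one of them the reported threshold is exactly Clipping_scale (2.0) times the logged median, e.g. 2.0 * 1.353e+02 = 2.706e+02 in the first warning of this stretch, and percent-clipped says how often the norm actually exceeded it. A hedged sketch of that bookkeeping (names and windowing are illustrative, not the optim.py API):

    import torch

    # Sketch: clip the global grad norm at clipping_scale * median of a
    # window of recent norms, and report quartiles / clipping frequency.
    def clip_by_median(params, norm_window, clipping_scale=2.0):
        grads = [p.grad for p in params if p.grad is not None]
        total = torch.norm(torch.stack([g.norm() for g in grads]))
        norm_window.append(float(total))
        q = torch.quantile(torch.tensor(norm_window),
                           torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0]))
        threshold = clipping_scale * float(q[2])    # 2.0 * median, as logged
        if float(total) > threshold:
            for g in grads:
                g.mul_(threshold / float(total))
        return q, threshold
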
2024-09-25 08:44:26,959 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.125e+02 1.281e+02 1.355e+02 1.471e+02 2.202e+02, threshold=2.709e+02, percent-clipped=0.0 2024-09-25 08:44:55,322 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=706248.6666666666, ans=0.125 2024-09-25 08:45:05,829 INFO [train.py:1198] (3/4) Epoch 39, batch 3300, loss[loss=0.2182, ctc_loss=0.14, cr_loss=0.3911, over 17001.00 frames. ], tot_loss[loss=0.1917, ctc_loss=0.1236, cr_loss=0.3408, over 3358311.66 frames. ], batch size: 56, lr: 3.03e-03, grad_scale: 32.0 2024-09-25 08:45:39,719 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=706388.6666666666, ans=0.125 2024-09-25 08:46:01,432 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=706435.3333333334, ans=0.125 2024-09-25 08:46:06,074 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=706435.3333333334, ans=0.125 2024-09-25 08:46:18,766 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=706482.0, ans=0.125 2024-09-25 08:46:26,220 INFO [train.py:1198] (3/4) Epoch 39, batch 3350, loss[loss=0.1611, ctc_loss=0.1003, cr_loss=0.3041, over 17033.00 frames. ], tot_loss[loss=0.1917, ctc_loss=0.1235, cr_loss=0.3412, over 3364849.75 frames. ], batch size: 39, lr: 3.03e-03, grad_scale: 32.0 2024-09-25 08:47:05,608 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.096e+02 1.259e+02 1.360e+02 1.476e+02 1.978e+02, threshold=2.721e+02, percent-clipped=0.0 2024-09-25 08:47:19,917 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=706668.6666666666, ans=0.0 2024-09-25 08:47:43,679 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=706762.0, ans=0.125 2024-09-25 08:47:44,982 INFO [train.py:1198] (3/4) Epoch 39, batch 3400, loss[loss=0.1677, ctc_loss=0.1049, cr_loss=0.3143, over 16951.00 frames. ], tot_loss[loss=0.1924, ctc_loss=0.124, cr_loss=0.3422, over 3364076.33 frames. ], batch size: 42, lr: 3.03e-03, grad_scale: 32.0 2024-09-25 08:47:48,506 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer_ff3.min_abs, batch_count=706762.0, ans=0.2 2024-09-25 08:47:51,543 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=706762.0, ans=0.1 2024-09-25 08:47:56,870 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.82 vs.
limit=10.0 2024-09-25 08:48:00,896 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=706808.6666666666, ans=0.125 2024-09-25 08:48:05,430 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=706808.6666666666, ans=0.125 2024-09-25 08:48:38,234 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=706902.0, ans=0.125 2024-09-25 08:48:44,365 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.min_positive, batch_count=706902.0, ans=0.05 2024-09-25 08:48:47,562 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=706948.6666666666, ans=0.125 2024-09-25 08:48:49,316 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=706948.6666666666, ans=0.125 2024-09-25 08:49:03,011 INFO [train.py:1198] (3/4) Epoch 39, batch 3450, loss[loss=0.1659, ctc_loss=0.1053, cr_loss=0.3032, over 17084.00 frames. ], tot_loss[loss=0.1926, ctc_loss=0.1241, cr_loss=0.3426, over 3368153.74 frames. ], batch size: 43, lr: 3.03e-03, grad_scale: 32.0 2024-09-25 08:49:17,862 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.46 vs. limit=6.0 2024-09-25 08:49:26,948 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=707042.0, ans=0.125 2024-09-25 08:49:48,211 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.114e+02 1.321e+02 1.412e+02 1.498e+02 2.585e+02, threshold=2.824e+02, percent-clipped=0.0 2024-09-25 08:49:59,289 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=707135.3333333334, ans=0.0 2024-09-25 08:50:07,570 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.54 vs. limit=22.5 2024-09-25 08:50:09,994 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=707182.0, ans=0.125 2024-09-25 08:50:19,733 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.49 vs. limit=15.0 2024-09-25 08:50:22,486 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=707182.0, ans=0.0 2024-09-25 08:50:23,895 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=707182.0, ans=0.2 2024-09-25 08:50:26,705 INFO [train.py:1198] (3/4) Epoch 39, batch 3500, loss[loss=0.1734, ctc_loss=0.1123, cr_loss=0.3057, over 16852.00 frames. ], tot_loss[loss=0.1934, ctc_loss=0.1249, cr_loss=0.3428, over 3349442.03 frames. ], batch size: 58, lr: 3.03e-03, grad_scale: 32.0 2024-09-25 08:50:32,052 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=6.06 vs. 
limit=15.0 2024-09-25 08:50:57,186 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.whiten.whitening_limit, batch_count=707322.0, ans=12.0 2024-09-25 08:51:23,082 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=707368.6666666666, ans=0.1 2024-09-25 08:51:27,774 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=707415.3333333334, ans=0.0 2024-09-25 08:51:42,158 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.04 vs. limit=15.0 2024-09-25 08:51:44,612 INFO [train.py:1198] (3/4) Epoch 39, batch 3550, loss[loss=0.22, ctc_loss=0.1502, cr_loss=0.3493, over 11196.00 frames. ], tot_loss[loss=0.1932, ctc_loss=0.1247, cr_loss=0.3424, over 3335174.90 frames. ], batch size: 123, lr: 3.03e-03, grad_scale: 32.0 2024-09-25 08:51:45,016 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=707462.0, ans=0.2 2024-09-25 08:51:46,728 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=707462.0, ans=0.125 2024-09-25 08:51:56,370 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.const_attention_rate, batch_count=707462.0, ans=0.025 2024-09-25 08:52:23,764 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.109e+02 1.264e+02 1.345e+02 1.429e+02 1.997e+02, threshold=2.690e+02, percent-clipped=0.0 2024-09-25 08:52:25,584 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=707555.3333333334, ans=0.125 2024-09-25 08:52:31,862 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=707602.0, ans=0.125 2024-09-25 08:52:35,874 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.72 vs. limit=15.0 2024-09-25 08:52:47,670 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=707648.6666666666, ans=0.0 2024-09-25 08:52:54,016 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=707648.6666666666, ans=0.125 2024-09-25 08:53:02,937 INFO [train.py:1198] (3/4) Epoch 39, batch 3600, loss[loss=0.2002, ctc_loss=0.1284, cr_loss=0.3592, over 17259.00 frames. ], tot_loss[loss=0.1924, ctc_loss=0.1241, cr_loss=0.3414, over 3348680.97 frames. ], batch size: 44, lr: 3.03e-03, grad_scale: 32.0 2024-09-25 08:53:07,954 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=707695.3333333334, ans=0.2 2024-09-25 08:53:40,189 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.min_positive, batch_count=707788.6666666666, ans=0.05 2024-09-25 08:53:40,640 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=11.93 vs. 
2024-09-25 08:53:55,607 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=707835.3333333334, ans=0.0
2024-09-25 08:54:11,248 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=707882.0, ans=0.1
2024-09-25 08:54:20,463 INFO [train.py:1198] (3/4) Epoch 39, batch 3650, loss[loss=0.1798, ctc_loss=0.115, cr_loss=0.3238, over 17319.00 frames. ], tot_loss[loss=0.1931, ctc_loss=0.1246, cr_loss=0.3425, over 3347611.87 frames. ], batch size: 51, lr: 3.03e-03, grad_scale: 32.0
2024-09-25 08:54:44,467 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.min_positive, batch_count=707975.3333333334, ans=0.05
2024-09-25 08:54:57,532 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=2.51 vs. limit=6.0
2024-09-25 08:54:59,909 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.155e+02 1.284e+02 1.368e+02 1.461e+02 2.127e+02, threshold=2.736e+02, percent-clipped=0.0
2024-09-25 08:55:40,514 INFO [train.py:1198] (3/4) Epoch 39, batch 3700, loss[loss=0.2031, ctc_loss=0.1322, cr_loss=0.3547, over 17003.00 frames. ], tot_loss[loss=0.1925, ctc_loss=0.1242, cr_loss=0.3417, over 3348582.27 frames. ], batch size: 56, lr: 3.03e-03, grad_scale: 32.0
2024-09-25 08:55:44,090 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=708162.0, ans=0.125
2024-09-25 08:56:06,005 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=708208.6666666666, ans=0.125
2024-09-25 08:56:06,009 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=708208.6666666666, ans=0.0
2024-09-25 08:56:06,647 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=5.47 vs. limit=15.0
2024-09-25 08:56:09,566 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=8.01 vs. limit=15.0
2024-09-25 08:56:28,117 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=708302.0, ans=0.0
2024-09-25 08:56:59,404 INFO [train.py:1198] (3/4) Epoch 39, batch 3750, loss[loss=0.2347, ctc_loss=0.155, cr_loss=0.3988, over 16998.00 frames. ], tot_loss[loss=0.1925, ctc_loss=0.1242, cr_loss=0.3414, over 3340156.14 frames. ], batch size: 53, lr: 3.03e-03, grad_scale: 32.0
2024-09-25 08:57:06,175 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=708395.3333333334, ans=0.0
2024-09-25 08:57:17,067 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=708442.0, ans=0.1
2024-09-25 08:57:38,680 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.119e+02 1.270e+02 1.361e+02 1.496e+02 2.109e+02, threshold=2.722e+02, percent-clipped=0.0
2024-09-25 08:57:48,387 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=708535.3333333334, ans=0.0
2024-09-25 08:57:53,757 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=9.85 vs. limit=15.0
2024-09-25 08:58:15,938 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=13.70 vs. limit=22.5
2024-09-25 08:58:18,292 INFO [train.py:1198] (3/4) Epoch 39, batch 3800, loss[loss=0.2123, ctc_loss=0.1409, cr_loss=0.3573, over 16904.00 frames. ], tot_loss[loss=0.1919, ctc_loss=0.1238, cr_loss=0.3407, over 3338335.42 frames. ], batch size: 58, lr: 3.03e-03, grad_scale: 32.0
2024-09-25 08:58:46,603 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=708675.3333333334, ans=0.2
2024-09-25 08:59:02,534 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=708722.0, ans=0.125
2024-09-25 08:59:04,031 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=708722.0, ans=0.125
2024-09-25 08:59:11,761 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=708768.6666666666, ans=0.1
2024-09-25 08:59:24,358 INFO [scaling.py:1024] (3/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.50 vs. limit=5.0
2024-09-25 08:59:26,720 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=708815.3333333334, ans=0.125
2024-09-25 08:59:39,441 INFO [train.py:1198] (3/4) Epoch 39, batch 3850, loss[loss=0.2186, ctc_loss=0.1425, cr_loss=0.3802, over 11555.00 frames. ], tot_loss[loss=0.1917, ctc_loss=0.1238, cr_loss=0.3396, over 3295965.94 frames. ], batch size: 123, lr: 3.03e-03, grad_scale: 32.0
2024-09-25 08:59:58,762 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=708908.6666666666, ans=0.125
2024-09-25 08:59:58,951 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=708908.6666666666, ans=0.0
2024-09-25 09:00:05,264 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.25 vs. limit=10.0
2024-09-25 09:00:10,289 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.20 vs. limit=6.0
2024-09-25 09:00:18,550 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.173e+02 1.306e+02 1.417e+02 1.619e+02 3.168e+02, threshold=2.835e+02, percent-clipped=2.0
2024-09-25 09:00:41,085 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=4.70 vs. limit=15.0
2024-09-25 09:01:40,861 INFO [train.py:1198] (3/4) Epoch 40, batch 0, loss[loss=0.2184, ctc_loss=0.1449, cr_loss=0.3675, over 11868.00 frames. ], tot_loss[loss=0.2184, ctc_loss=0.1449, cr_loss=0.3675, over 11868.00 frames. ], batch size: 124, lr: 2.99e-03, grad_scale: 32.0
2024-09-25 09:01:40,861 INFO [train.py:1221] (3/4) Computing validation loss
2024-09-25 09:01:53,218 INFO [zipformer.py:1858] (3/4) name=encoder.encoders.3.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([1.7825, 2.4452, 2.3547, 2.4735, 2.0320, 2.4710, 2.4041, 2.2478], device='cuda:3')
2024-09-25 09:01:56,595 INFO [train.py:1230] (3/4) Epoch 40, validation: loss=0.03491, ctc_loss=0.03491, cr_loss=1.007e-14, over 944034.00 frames.
2024-09-25 09:01:56,595 INFO [train.py:1231] (3/4) Maximum memory allocated so far is 21238MB
2024-09-25 09:02:02,432 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=5.75 vs. limit=12.0
2024-09-25 09:02:20,699 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=709123.3333333334, ans=0.125
2024-09-25 09:02:25,423 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=709123.3333333334, ans=0.125
2024-09-25 09:02:36,370 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=709170.0, ans=0.1
2024-09-25 09:02:46,060 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=709216.6666666666, ans=0.1
2024-09-25 09:03:06,867 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=709263.3333333334, ans=0.1
2024-09-25 09:03:15,889 INFO [train.py:1198] (3/4) Epoch 40, batch 50, loss[loss=0.2017, ctc_loss=0.1301, cr_loss=0.3581, over 17236.00 frames. ], tot_loss[loss=0.1919, ctc_loss=0.1237, cr_loss=0.3409, over 753551.27 frames. ], batch size: 50, lr: 2.99e-03, grad_scale: 32.0
2024-09-25 09:03:27,401 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.74 vs. limit=22.5
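(In every WARNING [optim.py:487] record, the logged threshold equals Clipping_scale times the median of the five grad-norm quartiles -- for the record above, 2.0 x 1.417e+02 = 2.835e+02 -- and percent-clipped is nonzero exactly when recent norms exceeded it. Below is a small sketch of that bookkeeping, assuming a simple sliding window of recent gradient norms; the window size and the exact quartile estimator are assumptions, not optim.py's actual code.)

    # Sketch of median-based gradient clipping as suggested by the
    # "Clipping_scale=2.0, grad-norm quartiles ... threshold=... percent-clipped=..."
    # warnings: threshold = clipping_scale * median of recent grad norms.
    from collections import deque
    import torch

    class QuartileGradClipper:
        def __init__(self, clipping_scale: float = 2.0, window: int = 200):
            self.clipping_scale = clipping_scale
            self.norms = deque(maxlen=window)  # recent total grad norms
            self.num_clipped = 0
            self.num_seen = 0

        def clip_(self, params) -> float:
            grads = [p.grad for p in params if p.grad is not None]
            # Total grad norm over all parameters.
            norm = torch.norm(torch.stack([torch.norm(g) for g in grads])).item()
            self.norms.append(norm)
            self.num_seen += 1
            median = sorted(self.norms)[len(self.norms) // 2]
            threshold = self.clipping_scale * median
            if norm > threshold:
                self.num_clipped += 1
                for g in grads:
                    g.mul_(threshold / norm)  # scale grads down to the threshold
            return threshold

        def quartiles(self):
            # min / 25% / 50% / 75% / max: the five numbers in the log lines.
            s = sorted(self.norms)
            n = len(s) - 1
            return [s[int(round(q * n))] for q in (0.0, 0.25, 0.5, 0.75, 1.0)]

(In use, clip_() would sit between loss.backward() and optimizer.step(); percent-clipped would then be 100 * num_clipped / num_seen over the reporting interval.)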
2024-09-25 09:03:38,263 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=709356.6666666666, ans=0.0
2024-09-25 09:03:39,788 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=709356.6666666666, ans=0.025
2024-09-25 09:04:10,886 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.135e+02 1.285e+02 1.412e+02 1.570e+02 2.190e+02, threshold=2.824e+02, percent-clipped=0.0
2024-09-25 09:04:31,864 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=709496.6666666666, ans=0.125
2024-09-25 09:04:33,615 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=709496.6666666666, ans=0.125
2024-09-25 09:04:44,463 INFO [train.py:1198] (3/4) Epoch 40, batch 100, loss[loss=0.1917, ctc_loss=0.1245, cr_loss=0.3358, over 17226.00 frames. ], tot_loss[loss=0.1929, ctc_loss=0.1245, cr_loss=0.342, over 1321336.43 frames. ], batch size: 50, lr: 2.99e-03, grad_scale: 32.0
2024-09-25 09:04:46,393 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=709543.3333333334, ans=0.0
2024-09-25 09:04:56,536 INFO [scaling.py:1024] (3/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=7.45 vs. limit=8.0
2024-09-25 09:05:07,999 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=709590.0, ans=0.0
2024-09-25 09:05:11,273 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=709590.0, ans=0.1
2024-09-25 09:05:19,421 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=709636.6666666666, ans=0.0
2024-09-25 09:05:22,539 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=709636.6666666666, ans=0.125
2024-09-25 09:05:35,673 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=4.82 vs. limit=15.0
2024-09-25 09:06:02,412 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=709730.0, ans=0.125
2024-09-25 09:06:02,845 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=3.66 vs. limit=10.0
2024-09-25 09:06:05,729 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=709776.6666666666, ans=0.125
2024-09-25 09:06:07,000 INFO [train.py:1198] (3/4) Epoch 40, batch 150, loss[loss=0.1996, ctc_loss=0.129, cr_loss=0.3531, over 16986.00 frames. ], tot_loss[loss=0.1928, ctc_loss=0.1244, cr_loss=0.3425, over 1773769.09 frames. ], batch size: 56, lr: 2.99e-03, grad_scale: 32.0
2024-09-25 09:06:07,256 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=709776.6666666666, ans=0.125
2024-09-25 09:06:33,374 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=4.24 vs. limit=12.0
2024-09-25 09:06:34,253 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=709823.3333333334, ans=0.0
2024-09-25 09:06:56,159 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.124e+02 1.270e+02 1.336e+02 1.420e+02 2.555e+02, threshold=2.672e+02, percent-clipped=0.0
2024-09-25 09:06:58,073 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.const_attention_rate, batch_count=709916.6666666666, ans=0.025
2024-09-25 09:07:02,867 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=709916.6666666666, ans=0.125
2024-09-25 09:07:07,536 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=709916.6666666666, ans=0.125
2024-09-25 09:07:29,563 INFO [train.py:1198] (3/4) Epoch 40, batch 200, loss[loss=0.1888, ctc_loss=0.1213, cr_loss=0.3377, over 17267.00 frames. ], tot_loss[loss=0.1934, ctc_loss=0.1248, cr_loss=0.3428, over 2121851.40 frames. ], batch size: 44, lr: 2.99e-03, grad_scale: 32.0
2024-09-25 09:07:33,043 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=710010.0, ans=0.125
2024-09-25 09:07:33,123 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=710010.0, ans=0.125
2024-09-25 09:07:44,233 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=710056.6666666666, ans=0.125
2024-09-25 09:08:18,617 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.70 vs. limit=15.0
2024-09-25 09:08:24,314 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=710150.0, ans=0.125
2024-09-25 09:08:50,011 INFO [train.py:1198] (3/4) Epoch 40, batch 250, loss[loss=0.1812, ctc_loss=0.1168, cr_loss=0.3221, over 16936.00 frames. ], tot_loss[loss=0.192, ctc_loss=0.1238, cr_loss=0.3411, over 2392428.42 frames. ], batch size: 42, lr: 2.98e-03, grad_scale: 32.0
2024-09-25 09:09:09,772 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=710290.0, ans=0.0
2024-09-25 09:09:14,966 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=8.31 vs. limit=15.0
2024-09-25 09:09:39,757 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.117e+02 1.275e+02 1.359e+02 1.486e+02 2.398e+02, threshold=2.718e+02, percent-clipped=0.0
2024-09-25 09:09:59,668 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=9.42 vs. limit=15.0
2024-09-25 09:10:12,043 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=710430.0, ans=0.0
2024-09-25 09:10:16,461 INFO [train.py:1198] (3/4) Epoch 40, batch 300, loss[loss=0.1764, ctc_loss=0.1081, cr_loss=0.3419, over 17054.00 frames. ], tot_loss[loss=0.1938, ctc_loss=0.1251, cr_loss=0.3436, over 2599728.27 frames. ], batch size: 39, lr: 2.98e-03, grad_scale: 32.0
2024-09-25 09:10:26,318 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=710476.6666666666, ans=0.125
2024-09-25 09:10:34,238 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=710523.3333333334, ans=0.0
2024-09-25 09:10:45,459 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=710523.3333333334, ans=0.1
2024-09-25 09:10:46,975 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-25 09:10:50,030 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=710570.0, ans=0.0
2024-09-25 09:10:55,531 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=5.42 vs. limit=15.0
2024-09-25 09:10:59,833 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=710570.0, ans=0.0
2024-09-25 09:11:01,371 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=710570.0, ans=0.1
2024-09-25 09:11:02,212 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.70 vs. limit=10.0
2024-09-25 09:11:19,132 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=710663.3333333334, ans=0.2
2024-09-25 09:11:33,332 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=710663.3333333334, ans=0.125
2024-09-25 09:11:36,232 INFO [train.py:1198] (3/4) Epoch 40, batch 350, loss[loss=0.1532, ctc_loss=0.0958, cr_loss=0.2872, over 16951.00 frames. ], tot_loss[loss=0.1916, ctc_loss=0.1235, cr_loss=0.3404, over 2769965.96 frames. ], batch size: 42, lr: 2.98e-03, grad_scale: 32.0
2024-09-25 09:11:41,454 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-25 09:11:57,294 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=4.45 vs. limit=10.0
2024-09-25 09:12:27,035 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.125e+02 1.301e+02 1.389e+02 1.518e+02 2.541e+02, threshold=2.778e+02, percent-clipped=0.0
2024-09-25 09:12:56,243 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=710896.6666666666, ans=0.2
2024-09-25 09:12:59,095 INFO [train.py:1198] (3/4) Epoch 40, batch 400, loss[loss=0.2187, ctc_loss=0.1425, cr_loss=0.3812, over 17015.00 frames. ], tot_loss[loss=0.1915, ctc_loss=0.1234, cr_loss=0.3405, over 2910708.37 frames. ], batch size: 52, lr: 2.98e-03, grad_scale: 32.0
2024-09-25 09:13:12,152 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=710943.3333333334, ans=0.125
2024-09-25 09:13:13,696 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=710990.0, ans=0.125
2024-09-25 09:13:15,106 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=710990.0, ans=0.125
2024-09-25 09:13:31,327 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=711036.6666666666, ans=0.125
2024-09-25 09:13:42,744 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=711036.6666666666, ans=0.2
2024-09-25 09:13:46,046 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=711083.3333333334, ans=0.0
2024-09-25 09:13:58,576 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.const_attention_rate, batch_count=711083.3333333334, ans=0.025
2024-09-25 09:14:11,196 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=711130.0, ans=0.125
2024-09-25 09:14:22,076 INFO [train.py:1198] (3/4) Epoch 40, batch 450, loss[loss=0.1889, ctc_loss=0.1214, cr_loss=0.3379, over 16994.00 frames. ], tot_loss[loss=0.1924, ctc_loss=0.1241, cr_loss=0.3415, over 3003833.43 frames. ], batch size: 56, lr: 2.98e-03, grad_scale: 32.0
2024-09-25 09:14:24,353 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=14.14 vs. limit=15.0
2024-09-25 09:14:30,525 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=711176.6666666666, ans=0.0
2024-09-25 09:15:01,848 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=711270.0, ans=0.2
2024-09-25 09:15:12,493 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.098e+02 1.283e+02 1.337e+02 1.424e+02 2.250e+02, threshold=2.674e+02, percent-clipped=0.0
2024-09-25 09:15:12,735 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=711316.6666666666, ans=0.125
2024-09-25 09:15:13,478 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=6.97 vs. limit=15.0
2024-09-25 09:15:19,179 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=711316.6666666666, ans=0.2
2024-09-25 09:15:26,238 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.83 vs. limit=10.0
2024-09-25 09:15:28,858 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=711363.3333333334, ans=0.0
2024-09-25 09:15:44,621 INFO [train.py:1198] (3/4) Epoch 40, batch 500, loss[loss=0.1792, ctc_loss=0.1114, cr_loss=0.3389, over 17065.00 frames. ], tot_loss[loss=0.192, ctc_loss=0.1237, cr_loss=0.3414, over 3085538.54 frames. ], batch size: 46, lr: 2.98e-03, grad_scale: 32.0
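(Across these records the headline loss is consistently ctc_loss plus one fifth of cr_loss -- for batch 500 above, 0.1114 + 0.2 x 0.3389 = 0.1792 -- i.e. loss = ctc_loss + cr_scale * cr_loss with cr_scale = 0.2. The sketch below shows one way to compute such a combination; the symmetric-KL form of the consistency term between two augmented views is a common choice and is an assumption here, not necessarily the training script's exact formulation.)

    # Sketch of how the logged "loss" plausibly combines "ctc_loss" and
    # "cr_loss": the records satisfy loss = ctc_loss + 0.2 * cr_loss.
    # The CR term is written as a symmetric KL between the frame posteriors
    # of two augmented views; treat that exact form as an assumption.
    import torch
    import torch.nn.functional as F

    def combined_loss(log_probs_a, log_probs_b, targets, input_lens,
                      target_lens, cr_scale: float = 0.2):
        # Plain CTC on the first view; log_probs_* are (T, N, C) log-softmax outputs.
        ctc = F.ctc_loss(log_probs_a, targets, input_lens, target_lens,
                         reduction="sum")
        # Consistency term: symmetric KL between the two views' posteriors.
        kl_ab = F.kl_div(log_probs_a, log_probs_b, log_target=True,
                         reduction="sum")
        kl_ba = F.kl_div(log_probs_b, log_probs_a, log_target=True,
                         reduction="sum")
        cr = 0.5 * (kl_ab + kl_ba)
        return ctc + cr_scale * cr, ctc, cr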
2024-09-25 09:15:45,026 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=711410.0, ans=0.0
2024-09-25 09:15:46,465 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.const_attention_rate, batch_count=711410.0, ans=0.025
2024-09-25 09:15:49,569 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=711410.0, ans=0.1
2024-09-25 09:16:20,001 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-25 09:16:46,824 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.const_attention_rate, batch_count=711550.0, ans=0.025
2024-09-25 09:16:51,730 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.min_positive, batch_count=711596.6666666666, ans=0.025
2024-09-25 09:17:00,275 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=11.63 vs. limit=22.5
2024-09-25 09:17:01,236 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=711596.6666666666, ans=0.125
2024-09-25 09:17:07,180 INFO [train.py:1198] (3/4) Epoch 40, batch 550, loss[loss=0.213, ctc_loss=0.1388, cr_loss=0.3709, over 15900.00 frames. ], tot_loss[loss=0.1911, ctc_loss=0.123, cr_loss=0.3403, over 3147041.93 frames. ], batch size: 74, lr: 2.98e-03, grad_scale: 32.0
2024-09-25 09:17:25,208 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=711690.0, ans=0.2
2024-09-25 09:17:36,477 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=711690.0, ans=0.1
2024-09-25 09:17:40,276 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.29 vs. limit=6.0
2024-09-25 09:17:56,757 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.72 vs. limit=6.0
2024-09-25 09:17:57,150 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.119e+02 1.279e+02 1.358e+02 1.490e+02 2.059e+02, threshold=2.716e+02, percent-clipped=0.0
2024-09-25 09:18:19,133 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.26 vs. limit=10.0
2024-09-25 09:18:28,031 INFO [train.py:1198] (3/4) Epoch 40, batch 600, loss[loss=0.2252, ctc_loss=0.1463, cr_loss=0.3945, over 17210.00 frames. ], tot_loss[loss=0.1905, ctc_loss=0.1226, cr_loss=0.3394, over 3201106.31 frames. ], batch size: 55, lr: 2.98e-03, grad_scale: 16.0
2024-09-25 09:18:37,910 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=711876.6666666666, ans=0.0
2024-09-25 09:19:20,120 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=712016.6666666666, ans=0.125
2024-09-25 09:19:30,260 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=5.57 vs. limit=12.0
2024-09-25 09:19:46,719 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=712063.3333333334, ans=0.125
2024-09-25 09:19:46,747 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=712063.3333333334, ans=0.0
2024-09-25 09:19:48,210 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=712063.3333333334, ans=0.0
2024-09-25 09:19:55,903 INFO [train.py:1198] (3/4) Epoch 40, batch 650, loss[loss=0.2014, ctc_loss=0.1309, cr_loss=0.3526, over 17036.00 frames. ], tot_loss[loss=0.191, ctc_loss=0.1231, cr_loss=0.3398, over 3233068.84 frames. ], batch size: 53, lr: 2.98e-03, grad_scale: 16.0
2024-09-25 09:20:01,119 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=712110.0, ans=0.0
2024-09-25 09:20:42,930 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=712250.0, ans=0.0
2024-09-25 09:20:45,846 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.118e+02 1.262e+02 1.364e+02 1.466e+02 1.763e+02, threshold=2.728e+02, percent-clipped=0.0
2024-09-25 09:20:54,275 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=712250.0, ans=0.125
2024-09-25 09:21:04,380 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=4.20 vs. limit=12.0
2024-09-25 09:21:16,167 INFO [train.py:1198] (3/4) Epoch 40, batch 700, loss[loss=0.2337, ctc_loss=0.1568, cr_loss=0.3844, over 11842.00 frames. ], tot_loss[loss=0.1918, ctc_loss=0.1235, cr_loss=0.3414, over 3254002.10 frames. ], batch size: 123, lr: 2.98e-03, grad_scale: 16.0
2024-09-25 09:21:27,765 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-25 09:21:28,196 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=4.90 vs. limit=10.0
2024-09-25 09:21:31,040 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=712390.0, ans=0.2
2024-09-25 09:21:52,581 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=712436.6666666666, ans=0.09899494936611666
2024-09-25 09:22:09,779 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=712483.3333333334, ans=0.125
2024-09-25 09:22:17,741 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=712483.3333333334, ans=0.125
2024-09-25 09:22:38,080 INFO [train.py:1198] (3/4) Epoch 40, batch 750, loss[loss=0.1694, ctc_loss=0.1075, cr_loss=0.3094, over 16643.00 frames. ], tot_loss[loss=0.1917, ctc_loss=0.1235, cr_loss=0.3411, over 3266194.36 frames. ], batch size: 37, lr: 2.98e-03, grad_scale: 16.0
2024-09-25 09:22:38,316 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=712576.6666666666, ans=0.0
2024-09-25 09:22:40,003 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=712576.6666666666, ans=0.1
2024-09-25 09:23:02,182 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=712623.3333333334, ans=0.1
2024-09-25 09:23:19,735 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=712670.0, ans=0.2
2024-09-25 09:23:27,402 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.065e+02 1.294e+02 1.361e+02 1.444e+02 2.754e+02, threshold=2.722e+02, percent-clipped=1.0
2024-09-25 09:23:27,692 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=712716.6666666666, ans=0.125
2024-09-25 09:24:03,445 INFO [train.py:1198] (3/4) Epoch 40, batch 800, loss[loss=0.1633, ctc_loss=0.1044, cr_loss=0.2948, over 15910.00 frames. ], tot_loss[loss=0.192, ctc_loss=0.1237, cr_loss=0.3412, over 3289591.05 frames. ], batch size: 35, lr: 2.98e-03, grad_scale: 32.0
2024-09-25 09:24:33,595 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=712856.6666666666, ans=0.125
2024-09-25 09:24:38,606 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=712903.3333333334, ans=0.0
2024-09-25 09:25:07,884 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=712950.0, ans=0.0
2024-09-25 09:25:11,017 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=712996.6666666666, ans=0.0
2024-09-25 09:25:11,067 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=712996.6666666666, ans=0.0
2024-09-25 09:25:22,649 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=4.41 vs. limit=15.0
2024-09-25 09:25:26,610 INFO [train.py:1198] (3/4) Epoch 40, batch 850, loss[loss=0.1619, ctc_loss=0.1002, cr_loss=0.3081, over 16965.00 frames. ], tot_loss[loss=0.1922, ctc_loss=0.1239, cr_loss=0.3416, over 3309432.30 frames. ], batch size: 42, lr: 2.98e-03, grad_scale: 32.0
2024-09-25 09:25:48,487 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=10.09 vs. limit=22.5
2024-09-25 09:25:58,901 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=713136.6666666666, ans=0.0
2024-09-25 09:26:16,238 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.145e+02 1.263e+02 1.352e+02 1.405e+02 2.259e+02, threshold=2.704e+02, percent-clipped=0.0
2024-09-25 09:26:19,799 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=713183.3333333334, ans=0.05
2024-09-25 09:26:25,964 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-09-25 09:26:36,693 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=713230.0, ans=0.0
2024-09-25 09:26:43,593 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=10.11 vs. limit=22.5
2024-09-25 09:26:49,142 INFO [train.py:1198] (3/4) Epoch 40, batch 900, loss[loss=0.1871, ctc_loss=0.1197, cr_loss=0.3374, over 17306.00 frames. ], tot_loss[loss=0.1923, ctc_loss=0.124, cr_loss=0.3417, over 3316779.37 frames. ], batch size: 51, lr: 2.98e-03, grad_scale: 32.0
2024-09-25 09:26:55,975 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=713276.6666666666, ans=0.0
2024-09-25 09:26:57,516 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=713276.6666666666, ans=0.125
2024-09-25 09:27:11,911 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=713323.3333333334, ans=0.0
2024-09-25 09:27:17,091 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=7.95 vs. limit=15.0
2024-09-25 09:27:37,827 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=713416.6666666666, ans=0.0
2024-09-25 09:27:54,998 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=713463.3333333334, ans=0.05
2024-09-25 09:27:55,149 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-25 09:28:09,206 INFO [train.py:1198] (3/4) Epoch 40, batch 950, loss[loss=0.2041, ctc_loss=0.1336, cr_loss=0.3525, over 17151.00 frames. ], tot_loss[loss=0.1932, ctc_loss=0.1246, cr_loss=0.3429, over 3330484.54 frames. ], batch size: 45, lr: 2.98e-03, grad_scale: 32.0
2024-09-25 09:28:12,831 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=713510.0, ans=10.0
2024-09-25 09:29:01,326 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=713650.0, ans=0.1
2024-09-25 09:29:04,047 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.111e+02 1.318e+02 1.382e+02 1.485e+02 2.096e+02, threshold=2.764e+02, percent-clipped=0.0
2024-09-25 09:29:07,416 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=713650.0, ans=0.2
2024-09-25 09:29:12,437 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.const_attention_rate, batch_count=713650.0, ans=0.025
2024-09-25 09:29:31,645 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.97 vs. limit=15.0
2024-09-25 09:29:37,544 INFO [train.py:1198] (3/4) Epoch 40, batch 1000, loss[loss=0.2008, ctc_loss=0.1294, cr_loss=0.357, over 16981.00 frames. ], tot_loss[loss=0.1935, ctc_loss=0.1248, cr_loss=0.3433, over 3334940.15 frames. ], batch size: 53, lr: 2.98e-03, grad_scale: 32.0
2024-09-25 09:29:39,333 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=713743.3333333334, ans=0.125
2024-09-25 09:29:47,268 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=713743.3333333334, ans=0.0
2024-09-25 09:29:55,829 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=8.82 vs. limit=15.0
2024-09-25 09:30:06,529 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=713790.0, ans=0.125
2024-09-25 09:30:30,749 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-25 09:30:32,262 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=713883.3333333334, ans=0.125
2024-09-25 09:30:50,192 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00
2024-09-25 09:30:51,735 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=713930.0, ans=0.125
2024-09-25 09:30:57,911 INFO [train.py:1198] (3/4) Epoch 40, batch 1050, loss[loss=0.1636, ctc_loss=0.1027, cr_loss=0.3044, over 17087.00 frames. ], tot_loss[loss=0.1919, ctc_loss=0.1237, cr_loss=0.341, over 3337615.44 frames. ], batch size: 40, lr: 2.98e-03, grad_scale: 32.0
2024-09-25 09:30:59,959 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=713976.6666666666, ans=0.125
2024-09-25 09:31:09,871 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=713976.6666666666, ans=0.0
2024-09-25 09:31:15,055 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=11.33 vs. limit=15.0
2024-09-25 09:31:27,549 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=714023.3333333334, ans=0.0
2024-09-25 09:31:36,366 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-09-25 09:31:44,391 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=714070.0, ans=0.1
2024-09-25 09:31:45,896 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=714070.0, ans=0.125
2024-09-25 09:31:47,425 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=714116.6666666666, ans=0.1
2024-09-25 09:31:51,954 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.124e+02 1.281e+02 1.362e+02 1.480e+02 1.897e+02, threshold=2.724e+02, percent-clipped=0.0
2024-09-25 09:32:20,609 INFO [train.py:1198] (3/4) Epoch 40, batch 1100, loss[loss=0.149, ctc_loss=0.09377, cr_loss=0.276, over 16944.00 frames. ], tot_loss[loss=0.1917, ctc_loss=0.1235, cr_loss=0.341, over 3342337.66 frames. ], batch size: 42, lr: 2.98e-03, grad_scale: 16.0
2024-09-25 09:32:35,269 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=714256.6666666666, ans=0.125
2024-09-25 09:33:06,412 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.whiten.whitening_limit, batch_count=714303.3333333334, ans=15.0
2024-09-25 09:33:21,791 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=714350.0, ans=0.2
2024-09-25 09:33:24,872 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=714396.6666666666, ans=0.125
2024-09-25 09:33:43,635 INFO [train.py:1198] (3/4) Epoch 40, batch 1150, loss[loss=0.2231, ctc_loss=0.1515, cr_loss=0.3577, over 17212.00 frames. ], tot_loss[loss=0.1907, ctc_loss=0.1228, cr_loss=0.3392, over 3352113.71 frames. ], batch size: 55, lr: 2.98e-03, grad_scale: 16.0
2024-09-25 09:34:40,142 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.129e+02 1.266e+02 1.369e+02 1.564e+02 2.152e+02, threshold=2.737e+02, percent-clipped=0.0
2024-09-25 09:34:51,554 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=714630.0, ans=0.0
2024-09-25 09:34:58,017 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=714630.0, ans=0.0
2024-09-25 09:35:08,859 INFO [train.py:1198] (3/4) Epoch 40, batch 1200, loss[loss=0.1638, ctc_loss=0.1031, cr_loss=0.3034, over 17123.00 frames. ], tot_loss[loss=0.1906, ctc_loss=0.1228, cr_loss=0.3391, over 3341759.94 frames. ], batch size: 40, lr: 2.98e-03, grad_scale: 32.0
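(The tot_loss fields move slowly from batch to batch and are reported over a cumulative frame count that takes fractional values and is not monotonic (e.g. 3352113.71 then 3341759.94 frames above), which suggests a decayed, frame-weighted running average rather than a plain sum. A minimal sketch under that assumption follows; the decay factor is hypothetical.)

    # Sketch of a frame-weighted running loss, consistent with tot_loss being
    # reported "over" a slowly growing, fractional cumulative frame count.
    # The decay factor is a guess; the real bookkeeping may differ.
    class RunningLoss:
        def __init__(self, decay: float = 0.999):
            self.decay = decay
            self.loss_sum = 0.0
            self.frames = 0.0

        def update(self, batch_loss_sum: float, batch_frames: float):
            # Decay old statistics, then add this batch's frame-weighted loss.
            self.loss_sum = self.decay * self.loss_sum + batch_loss_sum
            self.frames = self.decay * self.frames + batch_frames

        @property
        def value(self) -> float:
            return self.loss_sum / max(self.frames, 1.0)

    tracker = RunningLoss()
    tracker.update(batch_loss_sum=0.22 * 11196.0, batch_frames=11196.0)
    print(round(tracker.value, 4))  # 0.22 for the first batch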
2024-09-25 09:35:15,747 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.max_abs, batch_count=714676.6666666666, ans=10.0
2024-09-25 09:35:22,240 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=714676.6666666666, ans=0.0
2024-09-25 09:35:42,840 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=714770.0, ans=0.1
2024-09-25 09:35:44,652 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.const_attention_rate, batch_count=714770.0, ans=0.025
2024-09-25 09:35:51,155 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=13.28 vs. limit=22.5
2024-09-25 09:36:18,198 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=714863.3333333334, ans=0.09899494936611666
2024-09-25 09:36:19,839 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=714863.3333333334, ans=0.125
2024-09-25 09:36:19,868 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=714863.3333333334, ans=0.0
2024-09-25 09:36:29,059 INFO [train.py:1198] (3/4) Epoch 40, batch 1250, loss[loss=0.2298, ctc_loss=0.1536, cr_loss=0.381, over 16941.00 frames. ], tot_loss[loss=0.191, ctc_loss=0.1231, cr_loss=0.3396, over 3341117.54 frames. ], batch size: 58, lr: 2.98e-03, grad_scale: 32.0
2024-09-25 09:36:43,203 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=714910.0, ans=10.0
2024-09-25 09:36:55,393 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=5.90 vs. limit=15.0
2024-09-25 09:37:05,511 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=715003.3333333334, ans=0.125
2024-09-25 09:37:08,663 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=715003.3333333334, ans=0.2
2024-09-25 09:37:10,320 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=715003.3333333334, ans=0.125
2024-09-25 09:37:21,437 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=715050.0, ans=0.0
2024-09-25 09:37:22,756 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.149e+02 1.288e+02 1.344e+02 1.458e+02 2.122e+02, threshold=2.687e+02, percent-clipped=0.0
2024-09-25 09:37:27,976 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=715050.0, ans=0.125
2024-09-25 09:37:51,163 INFO [train.py:1198] (3/4) Epoch 40, batch 1300, loss[loss=0.1877, ctc_loss=0.1203, cr_loss=0.3374, over 17065.00 frames. ], tot_loss[loss=0.1923, ctc_loss=0.124, cr_loss=0.3416, over 3349878.02 frames. ], batch size: 46, lr: 2.97e-03, grad_scale: 32.0
2024-09-25 09:37:53,122 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-09-25 09:38:13,644 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.const_attention_rate, batch_count=715190.0, ans=0.025
2024-09-25 09:38:26,449 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=715236.6666666666, ans=0.0
2024-09-25 09:38:30,789 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=715236.6666666666, ans=0.2
2024-09-25 09:38:32,485 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=715236.6666666666, ans=0.09899494936611666
2024-09-25 09:38:51,513 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=3.69 vs. limit=10.0
2024-09-25 09:38:59,149 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=715330.0, ans=0.125
2024-09-25 09:39:00,779 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=715330.0, ans=0.0
2024-09-25 09:39:03,940 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=715330.0, ans=0.07
2024-09-25 09:39:12,057 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=715330.0, ans=0.125
2024-09-25 09:39:19,071 INFO [train.py:1198] (3/4) Epoch 40, batch 1350, loss[loss=0.2019, ctc_loss=0.1307, cr_loss=0.3557, over 17152.00 frames. ], tot_loss[loss=0.1922, ctc_loss=0.124, cr_loss=0.3411, over 3339477.22 frames. ], batch size: 48, lr: 2.97e-03, grad_scale: 32.0
2024-09-25 09:40:01,162 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=715470.0, ans=0.1
2024-09-25 09:40:10,085 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.078e+02 1.269e+02 1.368e+02 1.487e+02 1.942e+02, threshold=2.736e+02, percent-clipped=0.0
2024-09-25 09:40:16,653 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=715516.6666666666, ans=0.2
2024-09-25 09:40:39,123 INFO [train.py:1198] (3/4) Epoch 40, batch 1400, loss[loss=0.1878, ctc_loss=0.1228, cr_loss=0.325, over 17296.00 frames. ], tot_loss[loss=0.191, ctc_loss=0.1232, cr_loss=0.3394, over 3344575.01 frames. ], batch size: 49, lr: 2.97e-03, grad_scale: 32.0
2024-09-25 09:40:45,969 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=715610.0, ans=0.1
2024-09-25 09:41:21,230 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=715703.3333333334, ans=0.125
2024-09-25 09:41:31,851 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=715750.0, ans=0.125
2024-09-25 09:41:35,117 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=715750.0, ans=0.1
2024-09-25 09:41:54,632 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=715796.6666666666, ans=0.125
2024-09-25 09:42:02,343 INFO [train.py:1198] (3/4) Epoch 40, batch 1450, loss[loss=0.1524, ctc_loss=0.09558, cr_loss=0.2841, over 17092.00 frames. ], tot_loss[loss=0.1901, ctc_loss=0.1224, cr_loss=0.3385, over 3348118.25 frames. ], batch size: 40, lr: 2.97e-03, grad_scale: 32.0
2024-09-25 09:42:04,671 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=7.65 vs. limit=15.0
2024-09-25 09:42:04,742 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=11.71 vs. limit=22.5
2024-09-25 09:42:20,057 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=715890.0, ans=0.125
2024-09-25 09:42:23,900 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=7.59 vs. limit=15.0
2024-09-25 09:42:26,635 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=715890.0, ans=0.125
2024-09-25 09:42:32,863 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=715936.6666666666, ans=0.125
2024-09-25 09:42:53,418 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.129e+02 1.248e+02 1.311e+02 1.404e+02 2.777e+02, threshold=2.622e+02, percent-clipped=1.0
2024-09-25 09:43:09,560 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=716030.0, ans=0.125
2024-09-25 09:43:21,943 INFO [train.py:1198] (3/4) Epoch 40, batch 1500, loss[loss=0.1767, ctc_loss=0.1146, cr_loss=0.3104, over 17306.00 frames. ], tot_loss[loss=0.1896, ctc_loss=0.122, cr_loss=0.3378, over 3360952.58 frames. ], batch size: 46, lr: 2.97e-03, grad_scale: 32.0
2024-09-25 09:43:46,482 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=716123.3333333334, ans=0.125
2024-09-25 09:43:55,901 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=716123.3333333334, ans=0.125
2024-09-25 09:44:12,400 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=10.13 vs. limit=15.0
2024-09-25 09:44:36,065 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=10.31 vs. limit=22.5
2024-09-25 09:44:49,471 INFO [train.py:1198] (3/4) Epoch 40, batch 1550, loss[loss=0.1906, ctc_loss=0.1241, cr_loss=0.3326, over 17350.00 frames. ], tot_loss[loss=0.1893, ctc_loss=0.1219, cr_loss=0.3371, over 3364844.42 frames. ], batch size: 48, lr: 2.97e-03, grad_scale: 32.0
2024-09-25 09:44:51,520 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=716310.0, ans=0.125
2024-09-25 09:44:52,985 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=716310.0, ans=0.1
2024-09-25 09:45:10,818 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=716356.6666666666, ans=0.125
2024-09-25 09:45:17,207 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=716356.6666666666, ans=0.125
2024-09-25 09:45:36,228 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=716450.0, ans=0.2
2024-09-25 09:45:37,644 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=716450.0, ans=0.0
2024-09-25 09:45:39,361 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=716450.0, ans=0.125
2024-09-25 09:45:41,879 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=4.85 vs. limit=12.0
2024-09-25 09:45:42,231 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.120e+02 1.286e+02 1.352e+02 1.488e+02 2.645e+02, threshold=2.703e+02, percent-clipped=1.0
2024-09-25 09:45:58,589 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=716496.6666666666, ans=0.1
2024-09-25 09:46:09,316 INFO [train.py:1198] (3/4) Epoch 40, batch 1600, loss[loss=0.2088, ctc_loss=0.136, cr_loss=0.3641, over 17254.00 frames. ], tot_loss[loss=0.1906, ctc_loss=0.1228, cr_loss=0.3393, over 3359954.03 frames. ], batch size: 55, lr: 2.97e-03, grad_scale: 32.0
2024-09-25 09:46:30,059 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=716590.0, ans=0.125
2024-09-25 09:47:08,875 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=5.18 vs. limit=15.0
2024-09-25 09:47:24,293 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=716730.0, ans=0.125
2024-09-25 09:47:32,136 INFO [train.py:1198] (3/4) Epoch 40, batch 1650, loss[loss=0.1965, ctc_loss=0.1262, cr_loss=0.3516, over 17302.00 frames. ], tot_loss[loss=0.1897, ctc_loss=0.122, cr_loss=0.3384, over 3372241.14 frames. ], batch size: 49, lr: 2.97e-03, grad_scale: 32.0
2024-09-25 09:47:59,446 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=716823.3333333334, ans=0.1
2024-09-25 09:48:15,493 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=716870.0, ans=0.125
2024-09-25 09:48:27,242 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.137e+02 1.303e+02 1.369e+02 1.457e+02 1.993e+02, threshold=2.739e+02, percent-clipped=0.0
2024-09-25 09:48:56,878 INFO [train.py:1198] (3/4) Epoch 40, batch 1700, loss[loss=0.1805, ctc_loss=0.1159, cr_loss=0.323, over 17287.00 frames. ], tot_loss[loss=0.1902, ctc_loss=0.1224, cr_loss=0.3389, over 3375263.13 frames. ], batch size: 49, lr: 2.97e-03, grad_scale: 32.0
2024-09-25 09:48:57,173 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=717010.0, ans=0.0
2024-09-25 09:49:05,691 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.54 vs. limit=6.0
2024-09-25 09:49:15,514 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=717056.6666666666, ans=0.2
2024-09-25 09:49:20,301 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=717056.6666666666, ans=0.1
2024-09-25 09:49:36,432 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=717103.3333333334, ans=0.0
2024-09-25 09:49:55,211 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=717150.0, ans=0.2
2024-09-25 09:50:18,904 INFO [train.py:1198] (3/4) Epoch 40, batch 1750, loss[loss=0.1839, ctc_loss=0.1179, cr_loss=0.3301, over 17011.00 frames. ], tot_loss[loss=0.191, ctc_loss=0.123, cr_loss=0.3404, over 3369249.74 frames. ], batch size: 44, lr: 2.97e-03, grad_scale: 32.0
2024-09-25 09:50:36,923 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=717290.0, ans=0.0
2024-09-25 09:50:40,093 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=717290.0, ans=0.0
2024-09-25 09:50:43,292 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=717290.0, ans=0.1
2024-09-25 09:51:07,356 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=717383.3333333334, ans=0.07
2024-09-25 09:51:11,759 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.139e+02 1.261e+02 1.332e+02 1.446e+02 2.217e+02, threshold=2.663e+02, percent-clipped=0.0
2024-09-25 09:51:15,294 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=717383.3333333334, ans=0.1
2024-09-25 09:51:17,202 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=12.08 vs. limit=12.0
2024-09-25 09:51:18,513 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=717383.3333333334, ans=0.125
2024-09-25 09:51:20,194 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=717383.3333333334, ans=0.0
2024-09-25 09:51:20,205 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=717383.3333333334, ans=0.125
2024-09-25 09:51:27,652 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=717430.0, ans=0.125
2024-09-25 09:51:32,894 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=4.61 vs. limit=12.0
2024-09-25 09:51:40,528 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2024-09-25 09:51:41,754 INFO [train.py:1198] (3/4) Epoch 40, batch 1800, loss[loss=0.2574, ctc_loss=0.1757, cr_loss=0.4086, over 14938.00 frames. ], tot_loss[loss=0.1905, ctc_loss=0.1226, cr_loss=0.3396, over 3368833.52 frames. ], batch size: 88, lr: 2.97e-03, grad_scale: 32.0
2024-09-25 09:52:07,917 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=717523.3333333334, ans=0.0
2024-09-25 09:52:36,612 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=717616.6666666666, ans=0.125
2024-09-25 09:52:36,925 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=10.68 vs. limit=15.0
2024-09-25 09:53:01,967 INFO [train.py:1198] (3/4) Epoch 40, batch 1850, loss[loss=0.1562, ctc_loss=0.09842, cr_loss=0.2889, over 17045.00 frames. ], tot_loss[loss=0.1912, ctc_loss=0.1231, cr_loss=0.3407, over 3369443.28 frames. ], batch size: 39, lr: 2.97e-03, grad_scale: 32.0
2024-09-25 09:53:07,109 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=717710.0, ans=0.125
2024-09-25 09:53:08,679 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.const_attention_rate, batch_count=717710.0, ans=0.025
2024-09-25 09:53:13,605 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=717710.0, ans=0.2
2024-09-25 09:53:41,107 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00
2024-09-25 09:53:47,487 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=717803.3333333334, ans=0.2
2024-09-25 09:53:49,054 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=717803.3333333334, ans=0.0
2024-09-25 09:54:00,036 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.100e+02 1.246e+02 1.343e+02 1.431e+02 1.891e+02, threshold=2.686e+02, percent-clipped=0.0
2024-09-25 09:54:18,991 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=717896.6666666666, ans=0.125
2024-09-25 09:54:22,715 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=13.07 vs. limit=22.5
2024-09-25 09:54:25,244 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=717896.6666666666, ans=0.125
2024-09-25 09:54:29,719 INFO [train.py:1198] (3/4) Epoch 40, batch 1900, loss[loss=0.1634, ctc_loss=0.09816, cr_loss=0.3261, over 16224.00 frames. ], tot_loss[loss=0.191, ctc_loss=0.123, cr_loss=0.3401, over 3367364.34 frames. ], batch size: 36, lr: 2.97e-03, grad_scale: 32.0
2024-09-25 09:54:34,880 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=717943.3333333334, ans=0.125
2024-09-25 09:54:39,588 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=717943.3333333334, ans=0.125
2024-09-25 09:54:39,788 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=717943.3333333334, ans=0.0
2024-09-25 09:55:03,424 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=718036.6666666666, ans=0.125
2024-09-25 09:55:12,284 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.94 vs. limit=15.0
2024-09-25 09:55:16,497 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=718083.3333333334, ans=0.0
2024-09-25 09:55:32,444 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=718130.0, ans=0.125
2024-09-25 09:55:37,176 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=718130.0, ans=0.125
2024-09-25 09:55:49,153 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=11.51 vs. limit=15.0
2024-09-25 09:55:49,716 INFO [train.py:1198] (3/4) Epoch 40, batch 1950, loss[loss=0.2091, ctc_loss=0.139, cr_loss=0.3504, over 16899.00 frames. ], tot_loss[loss=0.1918, ctc_loss=0.1237, cr_loss=0.3405, over 3358904.39 frames. ], batch size: 58, lr: 2.97e-03, grad_scale: 32.0
2024-09-25 09:55:58,086 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=718176.6666666666, ans=0.2
2024-09-25 09:56:14,786 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=3.85 vs. limit=12.0
2024-09-25 09:56:24,983 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=718270.0, ans=0.1
2024-09-25 09:56:28,203 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=718270.0, ans=0.125
2024-09-25 09:56:39,433 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.const_attention_rate, batch_count=718316.6666666666, ans=0.025
2024-09-25 09:56:44,006 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=718316.6666666666, ans=0.2
2024-09-25 09:56:45,247 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.100e+02 1.303e+02 1.365e+02 1.463e+02 2.159e+02, threshold=2.730e+02, percent-clipped=0.0
2024-09-25 09:56:48,630 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=718316.6666666666, ans=0.125
2024-09-25 09:56:55,221 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=718363.3333333334, ans=0.07
2024-09-25 09:57:12,467 INFO [train.py:1198] (3/4) Epoch 40, batch 2000, loss[loss=0.2133, ctc_loss=0.1367, cr_loss=0.3828, over 16874.00 frames. ], tot_loss[loss=0.1911, ctc_loss=0.1231, cr_loss=0.3397, over 3361804.78 frames. ], batch size: 58, lr: 2.97e-03, grad_scale: 32.0
2024-09-25 09:57:17,602 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=718410.0, ans=0.0
2024-09-25 09:57:17,723 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=718410.0, ans=0.05
2024-09-25 09:57:28,881 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=718456.6666666666, ans=0.125
2024-09-25 09:57:41,517 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=718456.6666666666, ans=0.04949747468305833
2024-09-25 09:58:10,300 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=718550.0, ans=0.125
2024-09-25 09:58:16,828 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=718596.6666666666, ans=0.0
2024-09-25 09:58:17,486 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.08 vs. limit=10.0
2024-09-25 09:58:37,831 INFO [train.py:1198] (3/4) Epoch 40, batch 2050, loss[loss=0.1759, ctc_loss=0.1106, cr_loss=0.3262, over 17164.00 frames. ], tot_loss[loss=0.1914, ctc_loss=0.1233, cr_loss=0.3404, over 3363615.48 frames. ], batch size: 45, lr: 2.97e-03, grad_scale: 32.0
2024-09-25 09:58:57,949 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=4.50 vs. limit=10.0
2024-09-25 09:58:58,976 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=718690.0, ans=0.1
2024-09-25 09:58:59,026 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.const_attention_rate, batch_count=718690.0, ans=0.025
2024-09-25 09:59:20,464 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=718736.6666666666, ans=0.125
2024-09-25 09:59:30,242 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=718783.3333333334, ans=0.125
2024-09-25 09:59:33,096 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.134e+02 1.276e+02 1.355e+02 1.490e+02 2.224e+02, threshold=2.711e+02, percent-clipped=0.0
2024-09-25 09:59:35,592 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=4.75 vs. limit=15.0
2024-09-25 09:59:57,605 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=718830.0, ans=0.125
2024-09-25 10:00:00,330 INFO [train.py:1198] (3/4) Epoch 40, batch 2100, loss[loss=0.1435, ctc_loss=0.08971, cr_loss=0.2691, over 17060.00 frames. ], tot_loss[loss=0.1903, ctc_loss=0.1225, cr_loss=0.3394, over 3366524.81 frames.
], batch size: 39, lr: 2.97e-03, grad_scale: 32.0 2024-09-25 10:00:08,653 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.max_positive, batch_count=718876.6666666666, ans=0.95 2024-09-25 10:00:15,053 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=718923.3333333334, ans=0.1 2024-09-25 10:00:21,677 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=718923.3333333334, ans=0.0 2024-09-25 10:00:28,058 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=718923.3333333334, ans=0.0 2024-09-25 10:00:54,069 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=15.54 vs. limit=22.5 2024-09-25 10:01:08,166 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=719063.3333333334, ans=0.2 2024-09-25 10:01:11,287 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=719063.3333333334, ans=0.2 2024-09-25 10:01:20,397 INFO [train.py:1198] (3/4) Epoch 40, batch 2150, loss[loss=0.2087, ctc_loss=0.1348, cr_loss=0.3694, over 16988.00 frames. ], tot_loss[loss=0.1906, ctc_loss=0.1226, cr_loss=0.34, over 3367047.30 frames. ], batch size: 53, lr: 2.97e-03, grad_scale: 32.0 2024-09-25 10:01:28,215 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.38 vs. limit=15.0 2024-09-25 10:01:34,359 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=719110.0, ans=0.09899494936611666 2024-09-25 10:01:39,121 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=719156.6666666666, ans=0.125 2024-09-25 10:01:40,674 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=719156.6666666666, ans=0.2 2024-09-25 10:01:54,296 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=8.46 vs. limit=15.0 2024-09-25 10:01:59,076 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.39 vs. limit=15.0 2024-09-25 10:02:16,125 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.106e+02 1.284e+02 1.361e+02 1.483e+02 2.210e+02, threshold=2.722e+02, percent-clipped=0.0 2024-09-25 10:02:43,177 INFO [train.py:1198] (3/4) Epoch 40, batch 2200, loss[loss=0.1804, ctc_loss=0.1157, cr_loss=0.3236, over 17192.00 frames. ], tot_loss[loss=0.1909, ctc_loss=0.1228, cr_loss=0.3404, over 3358426.38 frames. ], batch size: 47, lr: 2.97e-03, grad_scale: 32.0 2024-09-25 10:03:03,157 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=11.49 vs. 
limit=22.5 2024-09-25 10:03:45,929 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=719483.3333333334, ans=0.0 2024-09-25 10:03:47,474 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=719483.3333333334, ans=0.0 2024-09-25 10:04:08,299 INFO [train.py:1198] (3/4) Epoch 40, batch 2250, loss[loss=0.2371, ctc_loss=0.1552, cr_loss=0.4093, over 14826.00 frames. ], tot_loss[loss=0.1908, ctc_loss=0.1229, cr_loss=0.3398, over 3355289.92 frames. ], batch size: 89, lr: 2.97e-03, grad_scale: 32.0 2024-09-25 10:04:12,485 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=9.83 vs. limit=15.0 2024-09-25 10:04:25,967 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=719623.3333333334, ans=0.2 2024-09-25 10:04:54,847 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=719670.0, ans=0.125 2024-09-25 10:05:04,026 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.128e+02 1.272e+02 1.370e+02 1.440e+02 2.779e+02, threshold=2.741e+02, percent-clipped=1.0 2024-09-25 10:05:20,394 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=719763.3333333334, ans=0.125 2024-09-25 10:05:28,752 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=3.83 vs. limit=10.0 2024-09-25 10:05:31,319 INFO [train.py:1198] (3/4) Epoch 40, batch 2300, loss[loss=0.1892, ctc_loss=0.1214, cr_loss=0.3387, over 17359.00 frames. ], tot_loss[loss=0.1913, ctc_loss=0.1233, cr_loss=0.34, over 3355778.61 frames. ], batch size: 48, lr: 2.96e-03, grad_scale: 32.0 2024-09-25 10:05:44,905 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=4.81 vs. limit=15.0 2024-09-25 10:05:54,213 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=719856.6666666666, ans=0.2 2024-09-25 10:05:54,249 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=719856.6666666666, ans=0.125 2024-09-25 10:06:51,731 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=5.86 vs. limit=15.0 2024-09-25 10:06:54,046 INFO [train.py:1198] (3/4) Epoch 40, batch 2350, loss[loss=0.2008, ctc_loss=0.1273, cr_loss=0.3675, over 17083.00 frames. ], tot_loss[loss=0.19, ctc_loss=0.1223, cr_loss=0.3383, over 3371055.35 frames. ], batch size: 49, lr: 2.96e-03, grad_scale: 32.0 2024-09-25 10:07:35,918 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=720136.6666666666, ans=0.2 2024-09-25 10:07:46,592 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.135e+02 1.272e+02 1.331e+02 1.456e+02 1.927e+02, threshold=2.662e+02, percent-clipped=0.0 2024-09-25 10:08:08,854 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.88 vs. 
limit=6.0 2024-09-25 10:08:16,524 INFO [train.py:1198] (3/4) Epoch 40, batch 2400, loss[loss=0.2095, ctc_loss=0.1352, cr_loss=0.3717, over 17301.00 frames. ], tot_loss[loss=0.1898, ctc_loss=0.1222, cr_loss=0.3382, over 3361403.81 frames. ], batch size: 49, lr: 2.96e-03, grad_scale: 32.0 2024-09-25 10:08:40,186 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=720323.3333333334, ans=0.1 2024-09-25 10:08:56,638 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=9.09 vs. limit=15.0 2024-09-25 10:09:06,886 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-25 10:09:13,317 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=720416.6666666666, ans=0.0 2024-09-25 10:09:21,093 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=720416.6666666666, ans=0.0 2024-09-25 10:09:41,565 INFO [train.py:1198] (3/4) Epoch 40, batch 2450, loss[loss=0.2212, ctc_loss=0.1422, cr_loss=0.3949, over 17007.00 frames. ], tot_loss[loss=0.1901, ctc_loss=0.1224, cr_loss=0.3384, over 3358350.15 frames. ], batch size: 56, lr: 2.96e-03, grad_scale: 32.0 2024-09-25 10:09:45,126 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=720510.0, ans=0.2 2024-09-25 10:09:48,253 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=720510.0, ans=0.125 2024-09-25 10:09:59,705 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-25 10:10:17,569 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=720603.3333333334, ans=0.125 2024-09-25 10:10:20,496 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=720603.3333333334, ans=0.125 2024-09-25 10:10:32,742 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=5.58 vs. limit=15.0 2024-09-25 10:10:34,966 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.126e+02 1.276e+02 1.362e+02 1.464e+02 1.936e+02, threshold=2.724e+02, percent-clipped=0.0 2024-09-25 10:10:47,959 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=720696.6666666666, ans=0.2 2024-09-25 10:10:49,496 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=720696.6666666666, ans=0.125 2024-09-25 10:10:56,062 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=720696.6666666666, ans=0.125 2024-09-25 10:11:02,048 INFO [train.py:1198] (3/4) Epoch 40, batch 2500, loss[loss=0.1876, ctc_loss=0.1195, cr_loss=0.3409, over 17303.00 frames. ], tot_loss[loss=0.1901, ctc_loss=0.1223, cr_loss=0.3388, over 3355017.53 frames. 
], batch size: 46, lr: 2.96e-03, grad_scale: 32.0 2024-09-25 10:11:28,803 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=720790.0, ans=0.125 2024-09-25 10:11:32,111 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=720790.0, ans=0.125 2024-09-25 10:12:13,707 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=720930.0, ans=0.035 2024-09-25 10:12:13,709 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=720930.0, ans=0.1 2024-09-25 10:12:21,198 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=6.29 vs. limit=15.0 2024-09-25 10:12:24,940 INFO [train.py:1198] (3/4) Epoch 40, batch 2550, loss[loss=0.2103, ctc_loss=0.1367, cr_loss=0.368, over 16977.00 frames. ], tot_loss[loss=0.1898, ctc_loss=0.122, cr_loss=0.3387, over 3358800.51 frames. ], batch size: 53, lr: 2.96e-03, grad_scale: 32.0 2024-09-25 10:12:39,639 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=721023.3333333334, ans=0.125 2024-09-25 10:12:41,393 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.24 vs. limit=10.0 2024-09-25 10:13:20,278 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.160e+02 1.334e+02 1.455e+02 1.593e+02 2.101e+02, threshold=2.910e+02, percent-clipped=0.0 2024-09-25 10:13:23,849 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=721116.6666666666, ans=0.125 2024-09-25 10:13:27,173 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=721116.6666666666, ans=10.0 2024-09-25 10:13:47,279 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=721163.3333333334, ans=0.2 2024-09-25 10:13:50,329 INFO [train.py:1198] (3/4) Epoch 40, batch 2600, loss[loss=0.2129, ctc_loss=0.1407, cr_loss=0.3608, over 16044.00 frames. ], tot_loss[loss=0.189, ctc_loss=0.1214, cr_loss=0.3381, over 3365595.13 frames. ], batch size: 74, lr: 2.96e-03, grad_scale: 32.0 2024-09-25 10:13:51,155 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=5.19 vs. limit=15.0 2024-09-25 10:13:52,328 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=721210.0, ans=0.07 2024-09-25 10:14:11,097 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=721256.6666666666, ans=0.125 2024-09-25 10:14:20,975 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=11.81 vs. limit=22.5 2024-09-25 10:14:27,510 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=3.31 vs. 
limit=10.0 2024-09-25 10:14:28,700 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=721303.3333333334, ans=0.0 2024-09-25 10:14:47,805 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=721350.0, ans=0.125 2024-09-25 10:14:59,899 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=4.48 vs. limit=10.0 2024-09-25 10:15:01,453 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=22.76 vs. limit=22.5 2024-09-25 10:15:02,743 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=5.93 vs. limit=12.0 2024-09-25 10:15:13,249 INFO [train.py:1198] (3/4) Epoch 40, batch 2650, loss[loss=0.193, ctc_loss=0.1248, cr_loss=0.3411, over 17296.00 frames. ], tot_loss[loss=0.1896, ctc_loss=0.1219, cr_loss=0.3384, over 3365302.67 frames. ], batch size: 46, lr: 2.96e-03, grad_scale: 32.0 2024-09-25 10:15:15,290 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=721443.3333333334, ans=0.2 2024-09-25 10:15:26,733 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=721443.3333333334, ans=0.2 2024-09-25 10:15:29,896 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=721490.0, ans=0.125 2024-09-25 10:15:49,381 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=721536.6666666666, ans=0.125 2024-09-25 10:16:00,614 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=721583.3333333334, ans=0.0 2024-09-25 10:16:06,673 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.132e+02 1.318e+02 1.393e+02 1.493e+02 1.834e+02, threshold=2.785e+02, percent-clipped=0.0 2024-09-25 10:16:07,328 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.50 vs. limit=15.0 2024-09-25 10:16:15,746 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=5.39 vs. limit=15.0 2024-09-25 10:16:36,436 INFO [train.py:1198] (3/4) Epoch 40, batch 2700, loss[loss=0.1908, ctc_loss=0.1223, cr_loss=0.3426, over 17219.00 frames. ], tot_loss[loss=0.1895, ctc_loss=0.1219, cr_loss=0.3382, over 3367848.90 frames. ], batch size: 47, lr: 2.96e-03, grad_scale: 32.0 2024-09-25 10:16:41,453 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=721676.6666666666, ans=0.125 2024-09-25 10:16:41,645 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-25 10:16:50,356 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.01 vs. 
limit=15.0 2024-09-25 10:17:36,679 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=8.80 vs. limit=15.0 2024-09-25 10:17:56,554 INFO [train.py:1198] (3/4) Epoch 40, batch 2750, loss[loss=0.1613, ctc_loss=0.1045, cr_loss=0.2843, over 16963.00 frames. ], tot_loss[loss=0.1899, ctc_loss=0.1221, cr_loss=0.3387, over 3359703.20 frames. ], batch size: 42, lr: 2.96e-03, grad_scale: 32.0 2024-09-25 10:18:00,236 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-25 10:18:18,513 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=721956.6666666666, ans=0.025 2024-09-25 10:18:28,130 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=721956.6666666666, ans=0.0 2024-09-25 10:18:38,812 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=722003.3333333334, ans=0.1 2024-09-25 10:18:43,814 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=3.77 vs. limit=10.0 2024-09-25 10:18:54,279 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.078e+02 1.293e+02 1.370e+02 1.533e+02 1.979e+02, threshold=2.741e+02, percent-clipped=0.0 2024-09-25 10:19:24,034 INFO [train.py:1198] (3/4) Epoch 40, batch 2800, loss[loss=0.1984, ctc_loss=0.1269, cr_loss=0.3575, over 17026.00 frames. ], tot_loss[loss=0.1907, ctc_loss=0.1227, cr_loss=0.3398, over 3359445.40 frames. ], batch size: 44, lr: 2.96e-03, grad_scale: 32.0 2024-09-25 10:19:53,732 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.01 vs. limit=15.0 2024-09-25 10:19:57,929 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.const_attention_rate, batch_count=722236.6666666666, ans=0.025 2024-09-25 10:20:04,343 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=722236.6666666666, ans=0.125 2024-09-25 10:20:41,390 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.const_attention_rate, batch_count=722330.0, ans=0.025 2024-09-25 10:20:44,375 INFO [train.py:1198] (3/4) Epoch 40, batch 2850, loss[loss=0.1728, ctc_loss=0.1119, cr_loss=0.3042, over 17254.00 frames. ], tot_loss[loss=0.1904, ctc_loss=0.1226, cr_loss=0.3389, over 3361473.43 frames. ], batch size: 42, lr: 2.96e-03, grad_scale: 32.0 2024-09-25 10:21:01,003 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=722423.3333333334, ans=0.125 2024-09-25 10:21:06,215 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=8.88 vs. 
limit=15.0 2024-09-25 10:21:17,149 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=722470.0, ans=0.125 2024-09-25 10:21:40,029 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.145e+02 1.304e+02 1.350e+02 1.486e+02 2.860e+02, threshold=2.699e+02, percent-clipped=1.0 2024-09-25 10:21:45,220 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=722516.6666666666, ans=0.125 2024-09-25 10:22:07,591 INFO [train.py:1198] (3/4) Epoch 40, batch 2900, loss[loss=0.2058, ctc_loss=0.1318, cr_loss=0.37, over 17219.00 frames. ], tot_loss[loss=0.1917, ctc_loss=0.1235, cr_loss=0.3408, over 3358409.45 frames. ], batch size: 55, lr: 2.96e-03, grad_scale: 32.0 2024-09-25 10:22:44,595 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=722703.3333333334, ans=0.1 2024-09-25 10:23:00,962 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=722750.0, ans=0.125 2024-09-25 10:23:09,911 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=722750.0, ans=0.07 2024-09-25 10:23:12,220 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=15.02 vs. limit=22.5 2024-09-25 10:23:28,273 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=722796.6666666666, ans=0.125 2024-09-25 10:23:32,770 INFO [train.py:1198] (3/4) Epoch 40, batch 2950, loss[loss=0.1928, ctc_loss=0.1258, cr_loss=0.3352, over 17042.00 frames. ], tot_loss[loss=0.1924, ctc_loss=0.124, cr_loss=0.3421, over 3357101.27 frames. ], batch size: 52, lr: 2.96e-03, grad_scale: 32.0 2024-09-25 10:23:33,886 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=3.90 vs. limit=12.0 2024-09-25 10:23:36,129 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=722843.3333333334, ans=0.1 2024-09-25 10:23:52,314 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=722890.0, ans=0.125 2024-09-25 10:23:58,191 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=722890.0, ans=0.025 2024-09-25 10:24:20,123 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=722936.6666666666, ans=0.125 2024-09-25 10:24:23,285 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=722983.3333333334, ans=0.0 2024-09-25 10:24:27,664 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.163e+02 1.302e+02 1.375e+02 1.472e+02 2.387e+02, threshold=2.750e+02, percent-clipped=0.0 2024-09-25 10:24:45,191 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=723030.0, ans=0.0 2024-09-25 10:24:53,998 INFO [train.py:1198] (3/4) Epoch 40, batch 3000, loss[loss=0.1988, ctc_loss=0.1274, cr_loss=0.3567, over 17021.00 frames. 
], tot_loss[loss=0.1925, ctc_loss=0.124, cr_loss=0.3424, over 3352785.49 frames. ], batch size: 56, lr: 2.96e-03, grad_scale: 32.0 2024-09-25 10:24:53,999 INFO [train.py:1221] (3/4) Computing validation loss 2024-09-25 10:25:09,254 INFO [train.py:1230] (3/4) Epoch 40, validation: loss=0.03571, ctc_loss=0.03571, cr_loss=9.785e-15, over 944034.00 frames. 2024-09-25 10:25:09,254 INFO [train.py:1231] (3/4) Maximum memory allocated so far is 21238MB 2024-09-25 10:25:22,633 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=12.52 vs. limit=22.5 2024-09-25 10:25:40,620 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=723170.0, ans=0.125 2024-09-25 10:26:10,878 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=9.24 vs. limit=15.0 2024-09-25 10:26:12,105 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=723263.3333333334, ans=0.2 2024-09-25 10:26:27,245 INFO [train.py:1198] (3/4) Epoch 40, batch 3050, loss[loss=0.17, ctc_loss=0.1076, cr_loss=0.3117, over 16232.00 frames. ], tot_loss[loss=0.1931, ctc_loss=0.1245, cr_loss=0.3431, over 3348572.03 frames. ], batch size: 36, lr: 2.96e-03, grad_scale: 16.0 2024-09-25 10:26:38,808 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=723310.0, ans=0.125 2024-09-25 10:26:41,890 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=723356.6666666666, ans=0.0 2024-09-25 10:27:19,309 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=723450.0, ans=0.1 2024-09-25 10:27:20,631 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.084e+02 1.265e+02 1.358e+02 1.442e+02 1.711e+02, threshold=2.717e+02, percent-clipped=0.0 2024-09-25 10:27:38,201 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=723496.6666666666, ans=0.1 2024-09-25 10:27:42,881 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=723496.6666666666, ans=0.0 2024-09-25 10:27:45,677 INFO [train.py:1198] (3/4) Epoch 40, batch 3100, loss[loss=0.2133, ctc_loss=0.1385, cr_loss=0.3741, over 17032.00 frames. ], tot_loss[loss=0.1931, ctc_loss=0.1246, cr_loss=0.3425, over 3345843.27 frames. ], batch size: 56, lr: 2.96e-03, grad_scale: 16.0 2024-09-25 10:28:38,249 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=723683.3333333334, ans=0.05 2024-09-25 10:29:06,038 INFO [train.py:1198] (3/4) Epoch 40, batch 3150, loss[loss=0.1863, ctc_loss=0.1173, cr_loss=0.3449, over 17152.00 frames. ], tot_loss[loss=0.1937, ctc_loss=0.1249, cr_loss=0.3436, over 3352016.64 frames. 
], batch size: 45, lr: 2.96e-03, grad_scale: 16.0 2024-09-25 10:29:12,575 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=723776.6666666666, ans=0.125 2024-09-25 10:29:28,119 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=723823.3333333334, ans=0.1 2024-09-25 10:29:48,622 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=723870.0, ans=0.1 2024-09-25 10:29:52,388 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten.whitening_limit, batch_count=723916.6666666666, ans=15.0 2024-09-25 10:30:00,972 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.137e+02 1.278e+02 1.373e+02 1.497e+02 1.845e+02, threshold=2.745e+02, percent-clipped=0.0 2024-09-25 10:30:02,889 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=723916.6666666666, ans=0.0 2024-09-25 10:30:24,064 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=4.85 vs. limit=12.0 2024-09-25 10:30:24,822 INFO [train.py:1198] (3/4) Epoch 40, batch 3200, loss[loss=0.2061, ctc_loss=0.1293, cr_loss=0.3841, over 17222.00 frames. ], tot_loss[loss=0.1938, ctc_loss=0.1251, cr_loss=0.3438, over 3363114.13 frames. ], batch size: 50, lr: 2.96e-03, grad_scale: 16.0 2024-09-25 10:30:28,749 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=5.84 vs. limit=15.0 2024-09-25 10:31:13,503 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=724150.0, ans=0.0 2024-09-25 10:31:22,848 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=724150.0, ans=0.0 2024-09-25 10:31:37,643 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=5.30 vs. limit=12.0 2024-09-25 10:31:43,140 INFO [train.py:1198] (3/4) Epoch 40, batch 3250, loss[loss=0.1522, ctc_loss=0.09471, cr_loss=0.2874, over 16350.00 frames. ], tot_loss[loss=0.1942, ctc_loss=0.1253, cr_loss=0.3443, over 3363396.33 frames. ], batch size: 36, lr: 2.96e-03, grad_scale: 16.0 2024-09-25 10:31:54,832 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=6.71 vs. limit=15.0 2024-09-25 10:32:06,954 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=724290.0, ans=0.125 2024-09-25 10:32:42,130 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.132e+02 1.319e+02 1.415e+02 1.534e+02 3.607e+02, threshold=2.830e+02, percent-clipped=1.0 2024-09-25 10:32:51,617 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=724430.0, ans=0.125 2024-09-25 10:33:05,493 INFO [train.py:1198] (3/4) Epoch 40, batch 3300, loss[loss=0.2042, ctc_loss=0.1328, cr_loss=0.3572, over 16800.00 frames. ], tot_loss[loss=0.194, ctc_loss=0.1253, cr_loss=0.3438, over 3358763.58 frames. 
], batch size: 61, lr: 2.96e-03, grad_scale: 16.0 2024-09-25 10:33:37,353 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=724570.0, ans=0.125 2024-09-25 10:33:52,709 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=724616.6666666666, ans=0.125 2024-09-25 10:34:06,764 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=724616.6666666666, ans=0.0 2024-09-25 10:34:08,281 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=724663.3333333334, ans=0.125 2024-09-25 10:34:11,410 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=724663.3333333334, ans=0.125 2024-09-25 10:34:21,061 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=724663.3333333334, ans=0.125 2024-09-25 10:34:24,194 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=724710.0, ans=0.125 2024-09-25 10:34:25,460 INFO [train.py:1198] (3/4) Epoch 40, batch 3350, loss[loss=0.2194, ctc_loss=0.143, cr_loss=0.3818, over 15038.00 frames. ], tot_loss[loss=0.1938, ctc_loss=0.1251, cr_loss=0.3437, over 3351583.54 frames. ], batch size: 89, lr: 2.95e-03, grad_scale: 16.0 2024-09-25 10:34:35,601 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=7.44 vs. limit=15.0 2024-09-25 10:34:46,058 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=724756.6666666666, ans=0.0 2024-09-25 10:34:49,220 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=724756.6666666666, ans=0.125 2024-09-25 10:34:57,103 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.92 vs. limit=15.0 2024-09-25 10:35:08,365 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=8.73 vs. limit=15.0 2024-09-25 10:35:19,910 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.125e+02 1.291e+02 1.364e+02 1.477e+02 1.997e+02, threshold=2.727e+02, percent-clipped=0.0 2024-09-25 10:35:27,188 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.82 vs. limit=6.0 2024-09-25 10:35:43,181 INFO [train.py:1198] (3/4) Epoch 40, batch 3400, loss[loss=0.1516, ctc_loss=0.09534, cr_loss=0.2812, over 17019.00 frames. ], tot_loss[loss=0.194, ctc_loss=0.1252, cr_loss=0.3439, over 3349226.75 frames. ], batch size: 39, lr: 2.95e-03, grad_scale: 16.0 2024-09-25 10:35:45,584 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=8.09 vs. 
limit=15.0 2024-09-25 10:35:59,267 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=724990.0, ans=0.125 2024-09-25 10:36:22,623 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=725036.6666666666, ans=0.0 2024-09-25 10:36:28,794 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=725083.3333333334, ans=0.125 2024-09-25 10:36:44,459 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=725130.0, ans=0.0 2024-09-25 10:36:47,574 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=725130.0, ans=0.125 2024-09-25 10:36:55,295 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=725130.0, ans=0.125 2024-09-25 10:37:01,221 INFO [train.py:1198] (3/4) Epoch 40, batch 3450, loss[loss=0.1711, ctc_loss=0.1084, cr_loss=0.3134, over 17260.00 frames. ], tot_loss[loss=0.1928, ctc_loss=0.1244, cr_loss=0.3423, over 3351735.58 frames. ], batch size: 42, lr: 2.95e-03, grad_scale: 16.0 2024-09-25 10:37:09,622 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=725176.6666666666, ans=0.09899494936611666 2024-09-25 10:37:11,174 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=725176.6666666666, ans=0.0 2024-09-25 10:37:33,286 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=725270.0, ans=0.125 2024-09-25 10:37:39,575 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=725270.0, ans=0.0 2024-09-25 10:37:45,930 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.const_attention_rate, batch_count=725270.0, ans=0.025 2024-09-25 10:37:56,650 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.124e+02 1.302e+02 1.377e+02 1.496e+02 2.659e+02, threshold=2.753e+02, percent-clipped=0.0 2024-09-25 10:38:14,876 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=3.39 vs. limit=12.0 2024-09-25 10:38:22,231 INFO [train.py:1198] (3/4) Epoch 40, batch 3500, loss[loss=0.1892, ctc_loss=0.1212, cr_loss=0.3398, over 17183.00 frames. ], tot_loss[loss=0.1921, ctc_loss=0.1239, cr_loss=0.3411, over 3351967.82 frames. 
], batch size: 45, lr: 2.95e-03, grad_scale: 16.0 2024-09-25 10:38:38,514 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten.whitening_limit, batch_count=725456.6666666666, ans=15.0 2024-09-25 10:39:01,266 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=725503.3333333334, ans=0.125 2024-09-25 10:39:15,211 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.const_attention_rate, batch_count=725550.0, ans=0.025 2024-09-25 10:39:16,885 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=725550.0, ans=0.2 2024-09-25 10:39:19,980 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=725550.0, ans=0.125 2024-09-25 10:39:39,942 INFO [train.py:1198] (3/4) Epoch 40, batch 3550, loss[loss=0.1771, ctc_loss=0.113, cr_loss=0.3206, over 17013.00 frames. ], tot_loss[loss=0.193, ctc_loss=0.1245, cr_loss=0.3425, over 3348134.32 frames. ], batch size: 44, lr: 2.95e-03, grad_scale: 16.0 2024-09-25 10:39:48,098 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.max_positive, batch_count=725643.3333333334, ans=0.95 2024-09-25 10:40:34,533 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.140e+02 1.284e+02 1.346e+02 1.448e+02 2.075e+02, threshold=2.692e+02, percent-clipped=0.0 2024-09-25 10:40:58,150 INFO [train.py:1198] (3/4) Epoch 40, batch 3600, loss[loss=0.1721, ctc_loss=0.1117, cr_loss=0.3019, over 17021.00 frames. ], tot_loss[loss=0.1908, ctc_loss=0.1228, cr_loss=0.3398, over 3348591.51 frames. ], batch size: 51, lr: 2.95e-03, grad_scale: 32.0 2024-09-25 10:41:02,974 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=725876.6666666666, ans=0.125 2024-09-25 10:41:04,659 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=725876.6666666666, ans=0.0 2024-09-25 10:42:17,428 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-25 10:42:18,948 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.min_abs, batch_count=726110.0, ans=0.5 2024-09-25 10:42:20,145 INFO [train.py:1198] (3/4) Epoch 40, batch 3650, loss[loss=0.2158, ctc_loss=0.1415, cr_loss=0.3711, over 16027.00 frames. ], tot_loss[loss=0.1904, ctc_loss=0.1225, cr_loss=0.3394, over 3345958.71 frames. ], batch size: 74, lr: 2.95e-03, grad_scale: 16.0 2024-09-25 10:42:39,459 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=726156.6666666666, ans=0.1 2024-09-25 10:42:59,366 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=726203.3333333334, ans=0.125 2024-09-25 10:43:19,107 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.099e+02 1.298e+02 1.359e+02 1.454e+02 1.962e+02, threshold=2.718e+02, percent-clipped=0.0 2024-09-25 10:43:28,859 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=726296.6666666666, ans=0.125 2024-09-25 10:43:41,337 INFO [train.py:1198] (3/4) Epoch 40, batch 3700, loss[loss=0.1754, ctc_loss=0.1088, cr_loss=0.3327, over 17011.00 frames. 
], tot_loss[loss=0.1905, ctc_loss=0.1226, cr_loss=0.3394, over 3349375.31 frames. ], batch size: 39, lr: 2.95e-03, grad_scale: 16.0 2024-09-25 10:43:43,315 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=726343.3333333334, ans=0.125 2024-09-25 10:44:13,574 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.85 vs. limit=6.0 2024-09-25 10:44:20,804 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=726436.6666666666, ans=0.2 2024-09-25 10:44:33,362 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=726483.3333333334, ans=0.125 2024-09-25 10:44:38,066 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=726483.3333333334, ans=0.125 2024-09-25 10:44:57,413 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=3.77 vs. limit=10.0 2024-09-25 10:44:59,750 INFO [train.py:1198] (3/4) Epoch 40, batch 3750, loss[loss=0.205, ctc_loss=0.1316, cr_loss=0.3675, over 16954.00 frames. ], tot_loss[loss=0.19, ctc_loss=0.1223, cr_loss=0.3387, over 3347658.10 frames. ], batch size: 42, lr: 2.95e-03, grad_scale: 16.0 2024-09-25 10:45:10,985 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=726576.6666666666, ans=0.125 2024-09-25 10:45:29,940 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=8.94 vs. limit=15.0 2024-09-25 10:45:35,611 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=726670.0, ans=0.95 2024-09-25 10:45:39,022 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.99 vs. limit=10.0 2024-09-25 10:45:53,843 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=726716.6666666666, ans=0.125 2024-09-25 10:45:55,149 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.071e+02 1.300e+02 1.354e+02 1.476e+02 2.861e+02, threshold=2.708e+02, percent-clipped=2.0 2024-09-25 10:46:04,559 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=726763.3333333334, ans=0.2 2024-09-25 10:46:16,511 INFO [train.py:1198] (3/4) Epoch 40, batch 3800, loss[loss=0.1593, ctc_loss=0.1023, cr_loss=0.2846, over 16950.00 frames. ], tot_loss[loss=0.1905, ctc_loss=0.1228, cr_loss=0.3387, over 3329738.35 frames. ], batch size: 42, lr: 2.95e-03, grad_scale: 16.0 2024-09-25 10:46:22,013 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=11.26 vs. limit=22.5 2024-09-25 10:46:45,429 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=12.44 vs. 
limit=22.5 2024-09-25 10:46:47,788 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-25 10:46:51,374 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.01 vs. limit=15.0 2024-09-25 10:46:52,303 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.const_attention_rate, batch_count=726903.3333333334, ans=0.025 2024-09-25 10:47:10,957 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=726950.0, ans=0.125 2024-09-25 10:47:18,549 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=726996.6666666666, ans=0.125 2024-09-25 10:47:25,041 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.const_attention_rate, batch_count=726996.6666666666, ans=0.025 2024-09-25 10:47:34,475 INFO [train.py:1198] (3/4) Epoch 40, batch 3850, loss[loss=0.1503, ctc_loss=0.0954, cr_loss=0.2744, over 16764.00 frames. ], tot_loss[loss=0.1902, ctc_loss=0.1226, cr_loss=0.338, over 3296391.39 frames. ], batch size: 37, lr: 2.95e-03, grad_scale: 16.0 2024-09-25 10:47:36,370 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=727043.3333333334, ans=0.125 2024-09-25 10:47:44,211 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=727043.3333333334, ans=0.125 2024-09-25 10:47:53,208 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=727090.0, ans=0.125 2024-09-25 10:48:08,844 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=727136.6666666666, ans=0.125 2024-09-25 10:48:30,296 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.106e+02 1.319e+02 1.427e+02 1.593e+02 2.504e+02, threshold=2.853e+02, percent-clipped=0.0 2024-09-25 10:48:33,485 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=727183.3333333334, ans=0.0 2024-09-25 10:48:35,593 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=3.10 vs. limit=15.0 2024-09-25 10:48:38,049 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=727230.0, ans=0.05 2024-09-25 10:49:36,298 INFO [train.py:1198] (3/4) Epoch 41, batch 0, loss[loss=0.1817, ctc_loss=0.112, cr_loss=0.3486, over 17366.00 frames. ], tot_loss[loss=0.1817, ctc_loss=0.112, cr_loss=0.3486, over 17366.00 frames. ], batch size: 48, lr: 2.91e-03, grad_scale: 32.0 2024-09-25 10:49:36,298 INFO [train.py:1221] (3/4) Computing validation loss 2024-09-25 10:49:51,731 INFO [train.py:1230] (3/4) Epoch 41, validation: loss=0.03537, ctc_loss=0.03537, cr_loss=1.035e-14, over 944034.00 frames. 2024-09-25 10:49:51,732 INFO [train.py:1231] (3/4) Maximum memory allocated so far is 21238MB 2024-09-25 10:50:00,636 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=13.51 vs. 
limit=15.0 2024-09-25 10:50:06,956 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=727304.6666666666, ans=0.0 2024-09-25 10:50:12,269 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=12.53 vs. limit=22.5 2024-09-25 10:51:15,278 INFO [train.py:1198] (3/4) Epoch 41, batch 50, loss[loss=0.1737, ctc_loss=0.1091, cr_loss=0.323, over 15797.00 frames. ], tot_loss[loss=0.1881, ctc_loss=0.121, cr_loss=0.3356, over 757772.40 frames. ], batch size: 35, lr: 2.91e-03, grad_scale: 32.0 2024-09-25 10:51:17,198 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=727491.3333333334, ans=0.0 2024-09-25 10:51:17,777 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=9.64 vs. limit=15.0 2024-09-25 10:51:34,648 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=727538.0, ans=0.1 2024-09-25 10:51:52,220 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=727584.6666666666, ans=0.0 2024-09-25 10:52:19,252 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.119e+02 1.297e+02 1.379e+02 1.480e+02 1.921e+02, threshold=2.758e+02, percent-clipped=0.0 2024-09-25 10:52:21,086 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=727678.0, ans=0.125 2024-09-25 10:52:24,958 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=4.98 vs. limit=12.0 2024-09-25 10:52:29,159 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=727678.0, ans=0.0 2024-09-25 10:52:35,288 INFO [train.py:1198] (3/4) Epoch 41, batch 100, loss[loss=0.2003, ctc_loss=0.1281, cr_loss=0.3608, over 17099.00 frames. ], tot_loss[loss=0.1921, ctc_loss=0.1238, cr_loss=0.3413, over 1329481.73 frames. ], batch size: 49, lr: 2.91e-03, grad_scale: 32.0 2024-09-25 10:52:37,143 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=727724.6666666666, ans=0.2 2024-09-25 10:52:49,758 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=727771.3333333334, ans=0.125 2024-09-25 10:53:14,236 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=15.72 vs. limit=22.5 2024-09-25 10:53:45,615 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=727911.3333333334, ans=0.07 2024-09-25 10:53:59,432 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=727958.0, ans=0.2 2024-09-25 10:54:00,662 INFO [train.py:1198] (3/4) Epoch 41, batch 150, loss[loss=0.1597, ctc_loss=0.09918, cr_loss=0.3026, over 17086.00 frames. ], tot_loss[loss=0.1906, ctc_loss=0.1226, cr_loss=0.3397, over 1789295.79 frames. 
], batch size: 40, lr: 2.91e-03, grad_scale: 32.0 2024-09-25 10:54:30,480 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=728004.6666666666, ans=0.0 2024-09-25 10:54:56,413 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=728098.0, ans=0.0 2024-09-25 10:55:01,305 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.min_positive, batch_count=728098.0, ans=0.05 2024-09-25 10:55:06,834 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=11.80 vs. limit=22.5 2024-09-25 10:55:10,424 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.169e+02 1.287e+02 1.355e+02 1.476e+02 1.968e+02, threshold=2.710e+02, percent-clipped=0.0 2024-09-25 10:55:15,057 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=728144.6666666666, ans=0.125 2024-09-25 10:55:27,696 INFO [train.py:1198] (3/4) Epoch 41, batch 200, loss[loss=0.2042, ctc_loss=0.1314, cr_loss=0.3641, over 17316.00 frames. ], tot_loss[loss=0.1909, ctc_loss=0.1228, cr_loss=0.3407, over 2149401.96 frames. ], batch size: 51, lr: 2.91e-03, grad_scale: 16.0 2024-09-25 10:55:40,874 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.const_attention_rate, batch_count=728191.3333333334, ans=0.025 2024-09-25 10:56:10,278 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=10.58 vs. limit=22.5 2024-09-25 10:56:25,899 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=728331.3333333334, ans=0.125 2024-09-25 10:56:47,737 INFO [train.py:1198] (3/4) Epoch 41, batch 250, loss[loss=0.1816, ctc_loss=0.1151, cr_loss=0.3324, over 17215.00 frames. ], tot_loss[loss=0.191, ctc_loss=0.1228, cr_loss=0.3409, over 2424452.99 frames. ], batch size: 50, lr: 2.91e-03, grad_scale: 16.0 2024-09-25 10:57:38,068 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=3.76 vs. limit=12.0 2024-09-25 10:57:45,341 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=728564.6666666666, ans=0.125 2024-09-25 10:57:53,022 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.120e+02 1.260e+02 1.343e+02 1.433e+02 1.845e+02, threshold=2.685e+02, percent-clipped=0.0 2024-09-25 10:57:53,314 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=728611.3333333334, ans=0.2 2024-09-25 10:57:54,993 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=728611.3333333334, ans=0.0 2024-09-25 10:57:58,160 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=728611.3333333334, ans=0.5 2024-09-25 10:58:07,362 INFO [train.py:1198] (3/4) Epoch 41, batch 300, loss[loss=0.1534, ctc_loss=0.09625, cr_loss=0.2855, over 17063.00 frames. ], tot_loss[loss=0.1893, ctc_loss=0.1217, cr_loss=0.338, over 2636022.95 frames. 
], batch size: 39, lr: 2.91e-03, grad_scale: 16.0 2024-09-25 10:58:12,802 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=5.81 vs. limit=15.0 2024-09-25 10:58:12,868 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=6.06 vs. limit=10.0 2024-09-25 10:58:21,982 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=728704.6666666666, ans=0.0 2024-09-25 10:58:23,427 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=728704.6666666666, ans=0.07 2024-09-25 10:58:36,142 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=728704.6666666666, ans=0.125 2024-09-25 10:58:50,263 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=728751.3333333334, ans=0.125 2024-09-25 10:58:56,833 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=728751.3333333334, ans=0.0 2024-09-25 10:59:03,330 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=728798.0, ans=0.125 2024-09-25 10:59:03,347 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=728798.0, ans=0.0 2024-09-25 10:59:36,175 INFO [train.py:1198] (3/4) Epoch 41, batch 350, loss[loss=0.2026, ctc_loss=0.1292, cr_loss=0.367, over 17019.00 frames. ], tot_loss[loss=0.1901, ctc_loss=0.1223, cr_loss=0.339, over 2789668.67 frames. ], batch size: 52, lr: 2.91e-03, grad_scale: 16.0 2024-09-25 10:59:36,450 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=728891.3333333334, ans=0.125 2024-09-25 10:59:59,208 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=728938.0, ans=0.1 2024-09-25 11:00:02,380 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=728938.0, ans=0.0 2024-09-25 11:00:21,248 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=728984.6666666666, ans=0.125 2024-09-25 11:00:31,221 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.24 vs. limit=6.0 2024-09-25 11:00:37,790 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.92 vs. limit=6.0 2024-09-25 11:00:44,743 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.120e+02 1.270e+02 1.343e+02 1.483e+02 2.666e+02, threshold=2.685e+02, percent-clipped=0.0 2024-09-25 11:00:46,761 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=729078.0, ans=0.125 2024-09-25 11:00:59,199 INFO [train.py:1198] (3/4) Epoch 41, batch 400, loss[loss=0.1954, ctc_loss=0.1243, cr_loss=0.3557, over 17212.00 frames. ], tot_loss[loss=0.1904, ctc_loss=0.1225, cr_loss=0.3394, over 2911787.77 frames. 
], batch size: 47, lr: 2.91e-03, grad_scale: 32.0 2024-09-25 11:01:11,514 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.84 vs. limit=10.0 2024-09-25 11:01:13,963 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.max_abs, batch_count=729171.3333333334, ans=10.0 2024-09-25 11:01:26,673 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.max_abs, batch_count=729171.3333333334, ans=10.0 2024-09-25 11:01:28,280 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=729171.3333333334, ans=0.0 2024-09-25 11:01:31,490 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=729218.0, ans=0.0 2024-09-25 11:01:41,181 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=729218.0, ans=0.0 2024-09-25 11:01:45,958 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=729264.6666666666, ans=0.0 2024-09-25 11:02:04,984 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=729311.3333333334, ans=0.2 2024-09-25 11:02:14,536 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=729311.3333333334, ans=0.125 2024-09-25 11:02:17,668 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=729358.0, ans=0.125 2024-09-25 11:02:18,862 INFO [train.py:1198] (3/4) Epoch 41, batch 450, loss[loss=0.1846, ctc_loss=0.1179, cr_loss=0.3336, over 17230.00 frames. ], tot_loss[loss=0.1912, ctc_loss=0.1231, cr_loss=0.3403, over 3012345.25 frames. ], batch size: 47, lr: 2.91e-03, grad_scale: 32.0 2024-09-25 11:02:39,164 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=7.68 vs. limit=15.0 2024-09-25 11:02:59,354 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=729451.3333333334, ans=0.0 2024-09-25 11:03:26,324 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.138e+02 1.254e+02 1.327e+02 1.463e+02 1.935e+02, threshold=2.654e+02, percent-clipped=0.0 2024-09-25 11:03:41,631 INFO [train.py:1198] (3/4) Epoch 41, batch 500, loss[loss=0.1915, ctc_loss=0.12, cr_loss=0.3572, over 17293.00 frames. ], tot_loss[loss=0.1906, ctc_loss=0.1227, cr_loss=0.3396, over 3099685.03 frames. ], batch size: 46, lr: 2.91e-03, grad_scale: 16.0 2024-09-25 11:04:43,572 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=729731.3333333334, ans=0.0 2024-09-25 11:04:48,325 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=729731.3333333334, ans=0.0 2024-09-25 11:05:09,616 INFO [train.py:1198] (3/4) Epoch 41, batch 550, loss[loss=0.1637, ctc_loss=0.1031, cr_loss=0.3029, over 17132.00 frames. ], tot_loss[loss=0.1915, ctc_loss=0.1234, cr_loss=0.3408, over 3148290.87 frames. 
], batch size: 48, lr: 2.91e-03, grad_scale: 16.0 2024-09-25 11:05:48,882 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=5.76 vs. limit=12.0 2024-09-25 11:06:18,378 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.102e+02 1.304e+02 1.377e+02 1.509e+02 2.527e+02, threshold=2.754e+02, percent-clipped=0.0 2024-09-25 11:06:22,006 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=730011.3333333334, ans=0.125 2024-09-25 11:06:25,408 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=730011.3333333334, ans=0.125 2024-09-25 11:06:29,792 INFO [train.py:1198] (3/4) Epoch 41, batch 600, loss[loss=0.1844, ctc_loss=0.1189, cr_loss=0.3274, over 17036.00 frames. ], tot_loss[loss=0.1924, ctc_loss=0.124, cr_loss=0.342, over 3203732.07 frames. ], batch size: 44, lr: 2.91e-03, grad_scale: 8.0 2024-09-25 11:06:33,371 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=730058.0, ans=0.0 2024-09-25 11:06:36,437 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=730058.0, ans=0.1 2024-09-25 11:06:38,134 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=730058.0, ans=0.0 2024-09-25 11:06:52,483 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=730104.6666666666, ans=0.2 2024-09-25 11:07:27,914 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=730198.0, ans=0.0 2024-09-25 11:07:31,501 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.41 vs. limit=10.0 2024-09-25 11:07:37,482 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=730244.6666666666, ans=0.0 2024-09-25 11:07:49,973 INFO [train.py:1198] (3/4) Epoch 41, batch 650, loss[loss=0.1746, ctc_loss=0.1085, cr_loss=0.3306, over 17042.00 frames. ], tot_loss[loss=0.1919, ctc_loss=0.1236, cr_loss=0.3416, over 3234912.28 frames. ], batch size: 39, lr: 2.91e-03, grad_scale: 8.0 2024-09-25 11:08:56,374 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=730431.3333333334, ans=0.125 2024-09-25 11:09:06,925 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.080e+02 1.289e+02 1.341e+02 1.440e+02 1.849e+02, threshold=2.683e+02, percent-clipped=0.0 2024-09-25 11:09:18,186 INFO [train.py:1198] (3/4) Epoch 41, batch 700, loss[loss=0.2051, ctc_loss=0.1361, cr_loss=0.345, over 17303.00 frames. ], tot_loss[loss=0.1924, ctc_loss=0.124, cr_loss=0.342, over 3254580.14 frames. ], batch size: 49, lr: 2.91e-03, grad_scale: 8.0 2024-09-25 11:09:33,170 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=8.76 vs. limit=15.0 2024-09-25 11:09:33,428 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=3.48 vs. 
limit=15.0 2024-09-25 11:09:38,222 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=9.37 vs. limit=15.0 2024-09-25 11:09:49,020 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=730618.0, ans=0.2 2024-09-25 11:10:03,344 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=730618.0, ans=0.1 2024-09-25 11:10:28,580 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=730711.3333333334, ans=0.2 2024-09-25 11:10:40,885 INFO [train.py:1198] (3/4) Epoch 41, batch 750, loss[loss=0.1795, ctc_loss=0.1148, cr_loss=0.3234, over 17267.00 frames. ], tot_loss[loss=0.1927, ctc_loss=0.1242, cr_loss=0.3425, over 3283407.54 frames. ], batch size: 44, lr: 2.91e-03, grad_scale: 8.0 2024-09-25 11:10:44,295 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=730758.0, ans=0.125 2024-09-25 11:10:47,435 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=730758.0, ans=0.125 2024-09-25 11:11:00,404 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=730804.6666666666, ans=0.125 2024-09-25 11:11:46,580 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=730944.6666666666, ans=0.1 2024-09-25 11:11:49,459 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.142e+02 1.291e+02 1.357e+02 1.455e+02 2.788e+02, threshold=2.714e+02, percent-clipped=1.0 2024-09-25 11:11:53,073 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=730944.6666666666, ans=0.125 2024-09-25 11:12:00,722 INFO [train.py:1198] (3/4) Epoch 41, batch 800, loss[loss=0.1751, ctc_loss=0.1121, cr_loss=0.3148, over 17317.00 frames. ], tot_loss[loss=0.1922, ctc_loss=0.1238, cr_loss=0.3421, over 3309788.21 frames. ], batch size: 46, lr: 2.91e-03, grad_scale: 16.0 2024-09-25 11:12:02,612 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=730991.3333333334, ans=0.125 2024-09-25 11:12:10,690 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=730991.3333333334, ans=0.125 2024-09-25 11:12:37,920 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=731084.6666666666, ans=0.2 2024-09-25 11:13:07,590 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.53 vs. limit=15.0 2024-09-25 11:13:18,016 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=731178.0, ans=0.1 2024-09-25 11:13:21,006 INFO [train.py:1198] (3/4) Epoch 41, batch 850, loss[loss=0.1849, ctc_loss=0.1179, cr_loss=0.3349, over 17286.00 frames. ], tot_loss[loss=0.1919, ctc_loss=0.1236, cr_loss=0.3417, over 3324325.13 frames. 
], batch size: 49, lr: 2.91e-03, grad_scale: 16.0 2024-09-25 11:13:50,830 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys.whitening_limit, batch_count=731271.3333333334, ans=6.0 2024-09-25 11:14:36,021 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=731411.3333333334, ans=0.1 2024-09-25 11:14:37,172 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.090e+02 1.274e+02 1.366e+02 1.473e+02 2.977e+02, threshold=2.732e+02, percent-clipped=1.0 2024-09-25 11:14:38,314 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=9.45 vs. limit=15.0 2024-09-25 11:14:39,059 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=731411.3333333334, ans=0.0 2024-09-25 11:14:39,093 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=731411.3333333334, ans=0.125 2024-09-25 11:14:48,459 INFO [train.py:1198] (3/4) Epoch 41, batch 900, loss[loss=0.1546, ctc_loss=0.09726, cr_loss=0.2869, over 17262.00 frames. ], tot_loss[loss=0.1922, ctc_loss=0.1238, cr_loss=0.342, over 3335768.05 frames. ], batch size: 42, lr: 2.90e-03, grad_scale: 16.0 2024-09-25 11:15:29,751 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=731551.3333333334, ans=0.125 2024-09-25 11:15:44,282 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=731598.0, ans=0.125 2024-09-25 11:15:47,536 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=731598.0, ans=0.125 2024-09-25 11:15:50,607 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=731598.0, ans=0.1 2024-09-25 11:16:10,880 INFO [train.py:1198] (3/4) Epoch 41, batch 950, loss[loss=0.1873, ctc_loss=0.1215, cr_loss=0.3292, over 17221.00 frames. ], tot_loss[loss=0.1922, ctc_loss=0.1237, cr_loss=0.3424, over 3338960.15 frames. ], batch size: 50, lr: 2.90e-03, grad_scale: 16.0 2024-09-25 11:16:29,000 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=731738.0, ans=0.125 2024-09-25 11:16:36,944 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.const_attention_rate, batch_count=731738.0, ans=0.025 2024-09-25 11:16:49,752 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=731784.6666666666, ans=0.5 2024-09-25 11:17:02,642 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=731831.3333333334, ans=0.125 2024-09-25 11:17:19,942 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.177e+02 1.310e+02 1.391e+02 1.460e+02 2.998e+02, threshold=2.782e+02, percent-clipped=1.0 2024-09-25 11:17:31,155 INFO [train.py:1198] (3/4) Epoch 41, batch 1000, loss[loss=0.1762, ctc_loss=0.1147, cr_loss=0.3076, over 17092.00 frames. ], tot_loss[loss=0.1925, ctc_loss=0.124, cr_loss=0.3426, over 3340116.50 frames. 
], batch size: 49, lr: 2.90e-03, grad_scale: 16.0 2024-09-25 11:17:50,611 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=731971.3333333334, ans=0.035 2024-09-25 11:18:10,104 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=11.54 vs. limit=22.5 2024-09-25 11:18:39,624 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=732111.3333333334, ans=0.125 2024-09-25 11:18:52,001 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=732111.3333333334, ans=0.2 2024-09-25 11:18:56,468 INFO [train.py:1198] (3/4) Epoch 41, batch 1050, loss[loss=0.2034, ctc_loss=0.1295, cr_loss=0.3695, over 17010.00 frames. ], tot_loss[loss=0.1919, ctc_loss=0.1234, cr_loss=0.3422, over 3337162.99 frames. ], batch size: 51, lr: 2.90e-03, grad_scale: 16.0 2024-09-25 11:18:59,929 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=732158.0, ans=0.125 2024-09-25 11:19:13,594 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=732204.6666666666, ans=0.0 2024-09-25 11:19:15,272 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=732204.6666666666, ans=0.0 2024-09-25 11:19:59,198 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=4.58 vs. limit=15.0 2024-09-25 11:20:10,107 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.151e+02 1.269e+02 1.332e+02 1.413e+02 1.822e+02, threshold=2.664e+02, percent-clipped=0.0 2024-09-25 11:20:21,422 INFO [train.py:1198] (3/4) Epoch 41, batch 1100, loss[loss=0.1621, ctc_loss=0.09963, cr_loss=0.3122, over 17032.00 frames. ], tot_loss[loss=0.1907, ctc_loss=0.1225, cr_loss=0.3406, over 3351118.01 frames. ], batch size: 39, lr: 2.90e-03, grad_scale: 16.0 2024-09-25 11:20:58,650 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=732484.6666666666, ans=0.0 2024-09-25 11:21:19,846 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=732531.3333333334, ans=0.09899494936611666 2024-09-25 11:21:37,304 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=732578.0, ans=0.0 2024-09-25 11:21:41,757 INFO [train.py:1198] (3/4) Epoch 41, batch 1150, loss[loss=0.2047, ctc_loss=0.1348, cr_loss=0.3493, over 16932.00 frames. ], tot_loss[loss=0.1905, ctc_loss=0.1226, cr_loss=0.3397, over 3353105.64 frames. ], batch size: 58, lr: 2.90e-03, grad_scale: 16.0 2024-09-25 11:21:43,738 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=732624.6666666666, ans=0.125 2024-09-25 11:21:46,741 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=732624.6666666666, ans=0.04949747468305833 2024-09-25 11:21:55,637 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.83 vs. 
limit=10.0 2024-09-25 11:22:03,032 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.min_positive, batch_count=732671.3333333334, ans=0.025 2024-09-25 11:22:07,969 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=732671.3333333334, ans=0.0 2024-09-25 11:22:25,183 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=732718.0, ans=0.125 2024-09-25 11:22:41,300 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=732764.6666666666, ans=0.125 2024-09-25 11:22:50,423 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.118e+02 1.312e+02 1.375e+02 1.453e+02 2.138e+02, threshold=2.750e+02, percent-clipped=0.0 2024-09-25 11:23:01,757 INFO [train.py:1198] (3/4) Epoch 41, batch 1200, loss[loss=0.1936, ctc_loss=0.1219, cr_loss=0.3584, over 17064.00 frames. ], tot_loss[loss=0.1907, ctc_loss=0.1227, cr_loss=0.34, over 3352859.88 frames. ], batch size: 46, lr: 2.90e-03, grad_scale: 32.0 2024-09-25 11:23:13,015 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=732858.0, ans=0.125 2024-09-25 11:23:25,945 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=732904.6666666666, ans=0.0 2024-09-25 11:23:44,163 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=732951.3333333334, ans=0.1 2024-09-25 11:23:49,169 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=12.88 vs. limit=12.0 2024-09-25 11:23:52,171 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=732951.3333333334, ans=0.0 2024-09-25 11:24:10,824 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=732998.0, ans=0.0 2024-09-25 11:24:18,628 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=733044.6666666666, ans=0.0 2024-09-25 11:24:25,549 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=3.74 vs. limit=10.0 2024-09-25 11:24:29,677 INFO [train.py:1198] (3/4) Epoch 41, batch 1250, loss[loss=0.2008, ctc_loss=0.1295, cr_loss=0.3563, over 17347.00 frames. ], tot_loss[loss=0.1918, ctc_loss=0.1235, cr_loss=0.3414, over 3364572.15 frames. ], batch size: 48, lr: 2.90e-03, grad_scale: 32.0 2024-09-25 11:25:07,706 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=733184.6666666666, ans=0.2 2024-09-25 11:25:12,990 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=7.21 vs. 
limit=15.0 2024-09-25 11:25:31,510 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=733231.3333333334, ans=0.125 2024-09-25 11:25:40,678 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.100e+02 1.293e+02 1.355e+02 1.454e+02 2.054e+02, threshold=2.709e+02, percent-clipped=0.0 2024-09-25 11:25:47,470 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=733278.0, ans=0.125 2024-09-25 11:25:52,137 INFO [train.py:1198] (3/4) Epoch 41, batch 1300, loss[loss=0.2327, ctc_loss=0.1558, cr_loss=0.3847, over 16042.00 frames. ], tot_loss[loss=0.1922, ctc_loss=0.1238, cr_loss=0.3421, over 3364451.54 frames. ], batch size: 74, lr: 2.90e-03, grad_scale: 32.0 2024-09-25 11:25:57,110 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=733324.6666666666, ans=0.2 2024-09-25 11:26:02,885 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=5.03 vs. limit=15.0 2024-09-25 11:26:05,283 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=733324.6666666666, ans=0.0 2024-09-25 11:26:06,959 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=733371.3333333334, ans=0.0 2024-09-25 11:26:21,321 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=733371.3333333334, ans=0.125 2024-09-25 11:26:30,787 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=733418.0, ans=0.125 2024-09-25 11:26:37,756 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=11.58 vs. limit=22.5 2024-09-25 11:26:42,063 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=733464.6666666666, ans=0.95 2024-09-25 11:26:43,676 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-25 11:27:12,372 INFO [train.py:1198] (3/4) Epoch 41, batch 1350, loss[loss=0.2014, ctc_loss=0.1327, cr_loss=0.3431, over 16780.00 frames. ], tot_loss[loss=0.1911, ctc_loss=0.1231, cr_loss=0.3402, over 3365521.63 frames. 
], batch size: 61, lr: 2.90e-03, grad_scale: 32.0 2024-09-25 11:27:12,669 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=733558.0, ans=0.07 2024-09-25 11:27:33,895 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=733604.6666666666, ans=0.0 2024-09-25 11:27:38,582 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=733604.6666666666, ans=0.07 2024-09-25 11:28:06,946 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=733698.0, ans=0.0 2024-09-25 11:28:20,788 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.169e+02 1.331e+02 1.402e+02 1.512e+02 1.864e+02, threshold=2.804e+02, percent-clipped=0.0 2024-09-25 11:28:21,678 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.14 vs. limit=15.0 2024-09-25 11:28:37,188 INFO [train.py:1198] (3/4) Epoch 41, batch 1400, loss[loss=0.1891, ctc_loss=0.1235, cr_loss=0.3278, over 15845.00 frames. ], tot_loss[loss=0.191, ctc_loss=0.1229, cr_loss=0.3406, over 3356506.34 frames. ], batch size: 74, lr: 2.90e-03, grad_scale: 32.0 2024-09-25 11:28:45,516 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=733791.3333333334, ans=0.125 2024-09-25 11:29:04,087 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=733838.0, ans=0.0 2024-09-25 11:29:44,946 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=733978.0, ans=0.125 2024-09-25 11:29:52,995 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=733978.0, ans=0.0 2024-09-25 11:30:01,595 INFO [train.py:1198] (3/4) Epoch 41, batch 1450, loss[loss=0.2136, ctc_loss=0.1376, cr_loss=0.3799, over 16481.00 frames. ], tot_loss[loss=0.1905, ctc_loss=0.1225, cr_loss=0.3397, over 3355592.69 frames. ], batch size: 66, lr: 2.90e-03, grad_scale: 32.0 2024-09-25 11:30:38,293 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=734118.0, ans=0.2 2024-09-25 11:30:41,506 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=734118.0, ans=0.125 2024-09-25 11:30:49,366 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=734164.6666666666, ans=0.125 2024-09-25 11:30:58,209 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=4.84 vs. limit=15.0 2024-09-25 11:31:00,703 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=734164.6666666666, ans=0.0 2024-09-25 11:31:10,082 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.115e+02 1.281e+02 1.374e+02 1.455e+02 2.816e+02, threshold=2.747e+02, percent-clipped=1.0 2024-09-25 11:31:21,398 INFO [train.py:1198] (3/4) Epoch 41, batch 1500, loss[loss=0.1663, ctc_loss=0.1037, cr_loss=0.3129, over 17063.00 frames. ], tot_loss[loss=0.1916, ctc_loss=0.1233, cr_loss=0.3411, over 3351959.71 frames. 
], batch size: 39, lr: 2.90e-03, grad_scale: 32.0 2024-09-25 11:31:39,404 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=734304.6666666666, ans=0.0 2024-09-25 11:31:55,321 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-25 11:32:07,991 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=734398.0, ans=0.125 2024-09-25 11:32:08,580 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=4.76 vs. limit=15.0 2024-09-25 11:32:20,466 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=734398.0, ans=0.04949747468305833 2024-09-25 11:32:20,581 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=734398.0, ans=0.125 2024-09-25 11:32:38,915 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=4.18 vs. limit=15.0 2024-09-25 11:32:39,789 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=734491.3333333334, ans=0.1 2024-09-25 11:32:41,151 INFO [train.py:1198] (3/4) Epoch 41, batch 1550, loss[loss=0.1894, ctc_loss=0.1227, cr_loss=0.3338, over 17344.00 frames. ], tot_loss[loss=0.1908, ctc_loss=0.1228, cr_loss=0.3398, over 3343122.63 frames. ], batch size: 48, lr: 2.90e-03, grad_scale: 16.0 2024-09-25 11:32:57,708 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=734538.0, ans=0.125 2024-09-25 11:33:14,290 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.49 vs. limit=6.0 2024-09-25 11:33:25,129 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-25 11:33:31,691 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=734584.6666666666, ans=0.0 2024-09-25 11:33:53,587 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=734678.0, ans=0.0 2024-09-25 11:33:58,221 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=734678.0, ans=0.125 2024-09-25 11:33:59,385 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.099e+02 1.278e+02 1.349e+02 1.424e+02 1.781e+02, threshold=2.698e+02, percent-clipped=0.0 2024-09-25 11:34:01,231 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=734678.0, ans=0.125 2024-09-25 11:34:03,049 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=734678.0, ans=0.125 2024-09-25 11:34:07,821 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=734724.6666666666, ans=0.0 2024-09-25 11:34:09,079 INFO [train.py:1198] (3/4) Epoch 41, batch 1600, loss[loss=0.1886, ctc_loss=0.1217, cr_loss=0.3345, over 17084.00 frames. 
], tot_loss[loss=0.1903, ctc_loss=0.1225, cr_loss=0.3391, over 3351504.15 frames. ], batch size: 49, lr: 2.90e-03, grad_scale: 32.0 2024-09-25 11:34:50,932 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=734818.0, ans=0.125 2024-09-25 11:35:12,774 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=734864.6666666666, ans=0.0 2024-09-25 11:35:31,975 INFO [train.py:1198] (3/4) Epoch 41, batch 1650, loss[loss=0.1941, ctc_loss=0.1249, cr_loss=0.3464, over 17020.00 frames. ], tot_loss[loss=0.1899, ctc_loss=0.1223, cr_loss=0.3382, over 3346306.09 frames. ], batch size: 51, lr: 2.90e-03, grad_scale: 32.0 2024-09-25 11:35:41,729 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=734958.0, ans=0.2 2024-09-25 11:36:42,043 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.158e+02 1.308e+02 1.358e+02 1.444e+02 2.176e+02, threshold=2.717e+02, percent-clipped=0.0 2024-09-25 11:36:42,347 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=735144.6666666666, ans=0.125 2024-09-25 11:36:51,846 INFO [train.py:1198] (3/4) Epoch 41, batch 1700, loss[loss=0.1852, ctc_loss=0.1183, cr_loss=0.3345, over 17044.00 frames. ], tot_loss[loss=0.19, ctc_loss=0.1223, cr_loss=0.3384, over 3351171.41 frames. ], batch size: 39, lr: 2.90e-03, grad_scale: 32.0 2024-09-25 11:37:19,583 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=735238.0, ans=0.125 2024-09-25 11:37:27,663 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=735284.6666666666, ans=0.0 2024-09-25 11:37:34,977 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=4.25 vs. limit=12.0 2024-09-25 11:38:04,666 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=735378.0, ans=0.0 2024-09-25 11:38:12,454 INFO [train.py:1198] (3/4) Epoch 41, batch 1750, loss[loss=0.2268, ctc_loss=0.1486, cr_loss=0.3911, over 15139.00 frames. ], tot_loss[loss=0.1906, ctc_loss=0.1228, cr_loss=0.3388, over 3348391.73 frames. ], batch size: 89, lr: 2.90e-03, grad_scale: 32.0 2024-09-25 11:38:28,843 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=735424.6666666666, ans=0.1 2024-09-25 11:38:30,577 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=735424.6666666666, ans=0.2 2024-09-25 11:38:43,523 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=735471.3333333334, ans=0.125 2024-09-25 11:39:10,017 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.37 vs. 
limit=15.0 2024-09-25 11:39:22,734 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=735611.3333333334, ans=0.09899494936611666 2024-09-25 11:39:30,212 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.135e+02 1.294e+02 1.365e+02 1.443e+02 2.325e+02, threshold=2.730e+02, percent-clipped=0.0 2024-09-25 11:39:35,478 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=735611.3333333334, ans=0.0 2024-09-25 11:39:39,720 INFO [train.py:1198] (3/4) Epoch 41, batch 1800, loss[loss=0.1889, ctc_loss=0.1189, cr_loss=0.3498, over 17010.00 frames. ], tot_loss[loss=0.1902, ctc_loss=0.1226, cr_loss=0.3382, over 3341607.33 frames. ], batch size: 44, lr: 2.90e-03, grad_scale: 32.0 2024-09-25 11:40:31,017 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=13.79 vs. limit=15.0 2024-09-25 11:40:32,446 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=735798.0, ans=0.125 2024-09-25 11:40:43,498 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=735798.0, ans=0.125 2024-09-25 11:41:02,241 INFO [train.py:1198] (3/4) Epoch 41, batch 1850, loss[loss=0.2013, ctc_loss=0.1319, cr_loss=0.3468, over 17036.00 frames. ], tot_loss[loss=0.1912, ctc_loss=0.1232, cr_loss=0.3401, over 3345284.64 frames. ], batch size: 56, lr: 2.90e-03, grad_scale: 32.0 2024-09-25 11:41:08,166 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.37 vs. limit=12.0 2024-09-25 11:41:24,814 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-25 11:41:42,453 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=735984.6666666666, ans=0.1 2024-09-25 11:41:45,423 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=735984.6666666666, ans=0.1 2024-09-25 11:42:13,780 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.137e+02 1.264e+02 1.364e+02 1.473e+02 1.819e+02, threshold=2.729e+02, percent-clipped=0.0 2024-09-25 11:42:21,753 INFO [train.py:1198] (3/4) Epoch 41, batch 1900, loss[loss=0.1878, ctc_loss=0.1196, cr_loss=0.3411, over 17150.00 frames. ], tot_loss[loss=0.1906, ctc_loss=0.1226, cr_loss=0.3396, over 3352342.66 frames. 
], batch size: 45, lr: 2.90e-03, grad_scale: 16.0 2024-09-25 11:42:25,328 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=736124.6666666666, ans=0.0 2024-09-25 11:42:59,406 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.min_positive, batch_count=736218.0, ans=0.05 2024-09-25 11:43:10,815 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=736264.6666666666, ans=0.07 2024-09-25 11:43:15,042 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=736264.6666666666, ans=0.125 2024-09-25 11:43:31,860 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=736311.3333333334, ans=0.0 2024-09-25 11:43:32,369 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=4.71 vs. limit=15.0 2024-09-25 11:43:50,148 INFO [train.py:1198] (3/4) Epoch 41, batch 1950, loss[loss=0.1982, ctc_loss=0.1248, cr_loss=0.3667, over 17062.00 frames. ], tot_loss[loss=0.1906, ctc_loss=0.1226, cr_loss=0.34, over 3349294.48 frames. ], batch size: 46, lr: 2.89e-03, grad_scale: 16.0 2024-09-25 11:43:56,889 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=736358.0, ans=0.1 2024-09-25 11:43:58,645 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-25 11:44:40,596 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=736498.0, ans=0.0 2024-09-25 11:44:56,014 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=736544.6666666666, ans=0.125 2024-09-25 11:45:05,169 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.168e+02 1.301e+02 1.403e+02 1.536e+02 5.268e+02, threshold=2.807e+02, percent-clipped=2.0 2024-09-25 11:45:13,265 INFO [train.py:1198] (3/4) Epoch 41, batch 2000, loss[loss=0.1722, ctc_loss=0.1071, cr_loss=0.3254, over 17209.00 frames. ], tot_loss[loss=0.1913, ctc_loss=0.123, cr_loss=0.3413, over 3354091.63 frames. ], batch size: 47, lr: 2.89e-03, grad_scale: 32.0 2024-09-25 11:45:36,066 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=736638.0, ans=0.125 2024-09-25 11:45:37,684 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=736638.0, ans=0.125 2024-09-25 11:45:56,988 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=736684.6666666666, ans=0.125 2024-09-25 11:45:58,755 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=736684.6666666666, ans=0.125 2024-09-25 11:46:11,847 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=736731.3333333334, ans=0.2 2024-09-25 11:46:34,058 INFO [train.py:1198] (3/4) Epoch 41, batch 2050, loss[loss=0.1578, ctc_loss=0.09954, cr_loss=0.2915, over 17256.00 frames. ], tot_loss[loss=0.1911, ctc_loss=0.1229, cr_loss=0.3411, over 3355063.20 frames. 
], batch size: 42, lr: 2.89e-03, grad_scale: 16.0 2024-09-25 11:46:34,485 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=736824.6666666666, ans=0.2 2024-09-25 11:46:39,360 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=736824.6666666666, ans=0.2 2024-09-25 11:46:53,615 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.const_attention_rate, batch_count=736871.3333333334, ans=0.025 2024-09-25 11:47:08,898 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=4.87 vs. limit=12.0 2024-09-25 11:47:11,257 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=736918.0, ans=0.1 2024-09-25 11:47:40,319 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=737011.3333333334, ans=0.0 2024-09-25 11:47:47,792 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.110e+02 1.289e+02 1.380e+02 1.485e+02 2.059e+02, threshold=2.761e+02, percent-clipped=0.0 2024-09-25 11:47:54,264 INFO [train.py:1198] (3/4) Epoch 41, batch 2100, loss[loss=0.2105, ctc_loss=0.1393, cr_loss=0.3562, over 15792.00 frames. ], tot_loss[loss=0.1915, ctc_loss=0.1232, cr_loss=0.3412, over 3349748.44 frames. ], batch size: 74, lr: 2.89e-03, grad_scale: 16.0 2024-09-25 11:47:59,353 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=737058.0, ans=0.125 2024-09-25 11:47:59,479 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=737058.0, ans=0.125 2024-09-25 11:49:09,176 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=737244.6666666666, ans=0.125 2024-09-25 11:49:21,660 INFO [train.py:1198] (3/4) Epoch 41, batch 2150, loss[loss=0.2276, ctc_loss=0.1555, cr_loss=0.3606, over 11588.00 frames. ], tot_loss[loss=0.1926, ctc_loss=0.1241, cr_loss=0.3424, over 3332399.18 frames. ], batch size: 123, lr: 2.89e-03, grad_scale: 16.0 2024-09-25 11:49:23,572 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.const_attention_rate, batch_count=737291.3333333334, ans=0.025 2024-09-25 11:49:33,137 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=737291.3333333334, ans=0.125 2024-09-25 11:49:36,391 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=737338.0, ans=0.125 2024-09-25 11:49:52,350 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.79 vs. 
limit=15.0 2024-09-25 11:50:10,880 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=737431.3333333334, ans=0.1 2024-09-25 11:50:20,255 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=737431.3333333334, ans=0.125 2024-09-25 11:50:20,417 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=737431.3333333334, ans=0.125 2024-09-25 11:50:20,892 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=8.33 vs. limit=15.0 2024-09-25 11:50:36,329 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=737478.0, ans=0.015 2024-09-25 11:50:37,767 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.160e+02 1.274e+02 1.356e+02 1.503e+02 3.384e+02, threshold=2.711e+02, percent-clipped=1.0 2024-09-25 11:50:44,083 INFO [train.py:1198] (3/4) Epoch 41, batch 2200, loss[loss=0.1762, ctc_loss=0.1124, cr_loss=0.319, over 17205.00 frames. ], tot_loss[loss=0.1911, ctc_loss=0.1231, cr_loss=0.3401, over 3343166.60 frames. ], batch size: 50, lr: 2.89e-03, grad_scale: 16.0 2024-09-25 11:51:48,986 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=737711.3333333334, ans=0.2 2024-09-25 11:52:04,269 INFO [train.py:1198] (3/4) Epoch 41, batch 2250, loss[loss=0.2091, ctc_loss=0.1346, cr_loss=0.3726, over 16588.00 frames. ], tot_loss[loss=0.191, ctc_loss=0.1231, cr_loss=0.3395, over 3348686.30 frames. ], batch size: 66, lr: 2.89e-03, grad_scale: 16.0 2024-09-25 11:52:32,005 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.22 vs. limit=6.0 2024-09-25 11:52:53,956 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=737898.0, ans=0.0 2024-09-25 11:52:59,385 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=3.90 vs. limit=12.0 2024-09-25 11:53:19,875 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=737944.6666666666, ans=0.125 2024-09-25 11:53:20,059 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=8.78 vs. limit=15.0 2024-09-25 11:53:22,628 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.150e+02 1.265e+02 1.339e+02 1.413e+02 1.733e+02, threshold=2.679e+02, percent-clipped=0.0 2024-09-25 11:53:29,038 INFO [train.py:1198] (3/4) Epoch 41, batch 2300, loss[loss=0.1577, ctc_loss=0.09894, cr_loss=0.2936, over 17258.00 frames. ], tot_loss[loss=0.191, ctc_loss=0.1231, cr_loss=0.3396, over 3355811.80 frames. ], batch size: 42, lr: 2.89e-03, grad_scale: 16.0 2024-09-25 11:53:46,984 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=7.96 vs. 
limit=15.0 2024-09-25 11:53:50,782 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=738038.0, ans=0.125 2024-09-25 11:53:59,551 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.74 vs. limit=12.0 2024-09-25 11:54:37,225 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=738178.0, ans=0.0 2024-09-25 11:54:44,879 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.01 vs. limit=10.0 2024-09-25 11:54:53,984 INFO [train.py:1198] (3/4) Epoch 41, batch 2350, loss[loss=0.2112, ctc_loss=0.1376, cr_loss=0.3676, over 15890.00 frames. ], tot_loss[loss=0.1914, ctc_loss=0.1233, cr_loss=0.3403, over 3352331.98 frames. ], batch size: 74, lr: 2.89e-03, grad_scale: 16.0 2024-09-25 11:54:54,393 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=738224.6666666666, ans=0.0 2024-09-25 11:55:02,355 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=738224.6666666666, ans=0.1 2024-09-25 11:55:11,245 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.79 vs. limit=15.0 2024-09-25 11:55:24,674 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=738318.0, ans=0.0 2024-09-25 11:55:26,472 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=738318.0, ans=0.2 2024-09-25 11:55:27,943 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=738318.0, ans=0.0 2024-09-25 11:55:28,041 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=738318.0, ans=0.04949747468305833 2024-09-25 11:55:45,332 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=738364.6666666666, ans=0.1 2024-09-25 11:55:50,208 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.min_abs, batch_count=738364.6666666666, ans=0.5 2024-09-25 11:55:53,419 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=738364.6666666666, ans=0.09899494936611666 2024-09-25 11:55:54,925 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=738364.6666666666, ans=0.125 2024-09-25 11:56:03,103 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=738411.3333333334, ans=0.1 2024-09-25 11:56:06,880 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=12.95 vs. 
limit=22.5 2024-09-25 11:56:07,456 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.141e+02 1.297e+02 1.370e+02 1.455e+02 1.687e+02, threshold=2.740e+02, percent-clipped=0.0 2024-09-25 11:56:13,957 INFO [train.py:1198] (3/4) Epoch 41, batch 2400, loss[loss=0.2176, ctc_loss=0.1413, cr_loss=0.3813, over 16763.00 frames. ], tot_loss[loss=0.1907, ctc_loss=0.1228, cr_loss=0.3396, over 3355271.89 frames. ], batch size: 61, lr: 2.89e-03, grad_scale: 32.0 2024-09-25 11:56:20,778 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=738458.0, ans=0.2 2024-09-25 11:56:31,878 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=738504.6666666666, ans=0.0 2024-09-25 11:56:33,596 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=738504.6666666666, ans=0.125 2024-09-25 11:56:54,683 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.45 vs. limit=22.5 2024-09-25 11:56:55,711 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=738551.3333333334, ans=0.125 2024-09-25 11:57:18,000 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=738644.6666666666, ans=0.125 2024-09-25 11:57:19,575 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.const_attention_rate, batch_count=738644.6666666666, ans=0.025 2024-09-25 11:57:27,921 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.71 vs. limit=6.0 2024-09-25 11:57:30,397 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=738644.6666666666, ans=0.2 2024-09-25 11:57:33,334 INFO [train.py:1198] (3/4) Epoch 41, batch 2450, loss[loss=0.2005, ctc_loss=0.1292, cr_loss=0.3565, over 17149.00 frames. ], tot_loss[loss=0.1911, ctc_loss=0.1232, cr_loss=0.3398, over 3345154.74 frames. ], batch size: 48, lr: 2.89e-03, grad_scale: 16.0 2024-09-25 11:57:41,379 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=738691.3333333334, ans=0.0 2024-09-25 11:57:41,406 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=738691.3333333334, ans=0.1 2024-09-25 11:57:53,374 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=5.85 vs. limit=15.0 2024-09-25 11:58:03,971 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=738738.0, ans=0.125 2024-09-25 11:58:34,123 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=8.29 vs. limit=15.0 2024-09-25 11:58:35,158 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=738831.3333333334, ans=0.0 2024-09-25 11:58:35,496 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=8.85 vs. 
limit=15.0 2024-09-25 11:58:38,494 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=738831.3333333334, ans=0.125 2024-09-25 11:58:51,192 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=738878.0, ans=0.1 2024-09-25 11:58:51,338 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=738878.0, ans=0.125 2024-09-25 11:58:55,646 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.072e+02 1.283e+02 1.406e+02 1.500e+02 1.911e+02, threshold=2.813e+02, percent-clipped=0.0 2024-09-25 11:59:00,467 INFO [train.py:1198] (3/4) Epoch 41, batch 2500, loss[loss=0.1661, ctc_loss=0.1073, cr_loss=0.2936, over 17299.00 frames. ], tot_loss[loss=0.1916, ctc_loss=0.1235, cr_loss=0.3406, over 3347734.78 frames. ], batch size: 49, lr: 2.89e-03, grad_scale: 16.0 2024-09-25 11:59:00,711 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=738924.6666666666, ans=0.1 2024-09-25 11:59:05,755 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=8.80 vs. limit=15.0 2024-09-25 11:59:15,438 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=738971.3333333334, ans=0.125 2024-09-25 11:59:21,704 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=738971.3333333334, ans=0.0 2024-09-25 12:00:10,364 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=739111.3333333334, ans=0.0 2024-09-25 12:00:23,050 INFO [train.py:1198] (3/4) Epoch 41, batch 2550, loss[loss=0.1595, ctc_loss=0.09907, cr_loss=0.302, over 16648.00 frames. ], tot_loss[loss=0.1912, ctc_loss=0.123, cr_loss=0.3406, over 3354178.23 frames. ], batch size: 37, lr: 2.89e-03, grad_scale: 16.0 2024-09-25 12:00:39,573 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=739204.6666666666, ans=0.125 2024-09-25 12:00:53,962 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=739251.3333333334, ans=0.125 2024-09-25 12:01:01,900 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=739251.3333333334, ans=0.0 2024-09-25 12:01:20,999 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=739298.0, ans=0.125 2024-09-25 12:01:38,423 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.126e+02 1.313e+02 1.392e+02 1.468e+02 1.882e+02, threshold=2.785e+02, percent-clipped=0.0 2024-09-25 12:01:43,158 INFO [train.py:1198] (3/4) Epoch 41, batch 2600, loss[loss=0.1461, ctc_loss=0.08959, cr_loss=0.2823, over 17040.00 frames. ], tot_loss[loss=0.1917, ctc_loss=0.1234, cr_loss=0.3416, over 3357947.04 frames. ], batch size: 39, lr: 2.89e-03, grad_scale: 16.0 2024-09-25 12:01:43,739 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=12.82 vs. 
limit=22.5 2024-09-25 12:01:46,582 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=739391.3333333334, ans=0.0 2024-09-25 12:01:49,740 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=739391.3333333334, ans=0.0 2024-09-25 12:02:11,967 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=739438.0, ans=0.125 2024-09-25 12:02:17,323 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=8.76 vs. limit=15.0 2024-09-25 12:02:18,780 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.81 vs. limit=15.0 2024-09-25 12:02:23,051 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=739484.6666666666, ans=0.0 2024-09-25 12:02:24,587 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=739484.6666666666, ans=0.125 2024-09-25 12:02:42,528 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.64 vs. limit=15.0 2024-09-25 12:02:53,500 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=739578.0, ans=0.0 2024-09-25 12:03:07,752 INFO [train.py:1198] (3/4) Epoch 41, batch 2650, loss[loss=0.1586, ctc_loss=0.09926, cr_loss=0.2966, over 17087.00 frames. ], tot_loss[loss=0.1908, ctc_loss=0.1228, cr_loss=0.3397, over 3358343.38 frames. ], batch size: 40, lr: 2.89e-03, grad_scale: 16.0 2024-09-25 12:03:07,966 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=739624.6666666666, ans=0.05 2024-09-25 12:03:20,137 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=9.34 vs. limit=15.0 2024-09-25 12:03:52,853 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=739718.0, ans=0.025 2024-09-25 12:03:54,523 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.min_positive, batch_count=739718.0, ans=0.05 2024-09-25 12:04:00,875 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=739764.6666666666, ans=0.125 2024-09-25 12:04:20,128 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-25 12:04:26,078 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.128e+02 1.313e+02 1.405e+02 1.499e+02 1.845e+02, threshold=2.809e+02, percent-clipped=0.0 2024-09-25 12:04:30,839 INFO [train.py:1198] (3/4) Epoch 41, batch 2700, loss[loss=0.1795, ctc_loss=0.1133, cr_loss=0.3306, over 17074.00 frames. ], tot_loss[loss=0.1908, ctc_loss=0.1229, cr_loss=0.3397, over 3354868.61 frames. 
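[Editor's note] The optim.py WARNING lines above report the quartiles of recently observed gradient norms, a clipping threshold, and the fraction of batches clipped since the last report. The logged numbers are consistent with the threshold being Clipping_scale (2.0) times the median quartile, e.g. 2.0 * 1.370e+02 = 2.740e+02 in the first warning of this section. A minimal sketch of that behaviour follows; the function and variable names are illustrative, not icefall's API.

```python
# Hedged sketch: clip the global grad norm at clipping_scale * median of
# recent norms, and report whether this batch was clipped (the statistic
# behind "percent-clipped"). Not the icefall implementation.
import torch

def clip_gradients(params, recent_norms, clipping_scale=2.0, window=200):
    grads = [p.grad for p in params if p.grad is not None]
    if not grads:
        return False
    total_norm = torch.norm(torch.stack([g.detach().norm() for g in grads])).item()
    recent_norms.append(total_norm)
    del recent_norms[:-window]                        # keep a sliding window
    q = torch.quantile(torch.tensor(recent_norms),
                       torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0]))
    threshold = clipping_scale * q[2].item()          # 2.0 x median, as logged
    clipped = total_norm > threshold
    if clipped:
        for g in grads:
            g.mul_(threshold / total_norm)            # scale grads down in place
    return clipped
```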
], batch size: 46, lr: 2.89e-03, grad_scale: 16.0 2024-09-25 12:04:35,354 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=739858.0, ans=0.125 2024-09-25 12:04:53,036 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=739904.6666666666, ans=0.2 2024-09-25 12:04:59,455 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.const_attention_rate, batch_count=739904.6666666666, ans=0.025 2024-09-25 12:05:02,491 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=739904.6666666666, ans=0.1 2024-09-25 12:05:16,648 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=739951.3333333334, ans=0.2 2024-09-25 12:05:24,456 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=739998.0, ans=0.2 2024-09-25 12:05:53,478 INFO [train.py:1198] (3/4) Epoch 41, batch 2750, loss[loss=0.203, ctc_loss=0.1336, cr_loss=0.3468, over 15751.00 frames. ], tot_loss[loss=0.1911, ctc_loss=0.1232, cr_loss=0.3395, over 3348230.26 frames. ], batch size: 74, lr: 2.89e-03, grad_scale: 16.0 2024-09-25 12:06:03,555 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=740091.3333333334, ans=0.125 2024-09-25 12:06:18,201 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=740138.0, ans=10.0 2024-09-25 12:06:22,926 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=740138.0, ans=0.0 2024-09-25 12:06:34,397 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=740184.6666666666, ans=0.0 2024-09-25 12:06:50,377 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=740231.3333333334, ans=0.125 2024-09-25 12:07:09,101 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.114e+02 1.274e+02 1.386e+02 1.486e+02 2.179e+02, threshold=2.772e+02, percent-clipped=0.0 2024-09-25 12:07:11,090 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=740278.0, ans=0.125 2024-09-25 12:07:12,804 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=740324.6666666666, ans=0.125 2024-09-25 12:07:14,041 INFO [train.py:1198] (3/4) Epoch 41, batch 2800, loss[loss=0.2023, ctc_loss=0.1265, cr_loss=0.3788, over 17346.00 frames. ], tot_loss[loss=0.191, ctc_loss=0.123, cr_loss=0.3401, over 3359201.13 frames. ], batch size: 48, lr: 2.89e-03, grad_scale: 32.0 2024-09-25 12:07:17,684 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=740324.6666666666, ans=0.1 2024-09-25 12:07:29,076 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=740371.3333333334, ans=0.0 2024-09-25 12:07:44,433 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=6.15 vs. 
limit=10.0 2024-09-25 12:08:42,544 INFO [train.py:1198] (3/4) Epoch 41, batch 2850, loss[loss=0.2021, ctc_loss=0.1308, cr_loss=0.3566, over 17308.00 frames. ], tot_loss[loss=0.1909, ctc_loss=0.1229, cr_loss=0.3401, over 3355172.32 frames. ], batch size: 49, lr: 2.89e-03, grad_scale: 32.0 2024-09-25 12:08:49,310 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=740558.0, ans=0.125 2024-09-25 12:08:49,311 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=740558.0, ans=0.0 2024-09-25 12:08:51,033 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.const_attention_rate, batch_count=740558.0, ans=0.025 2024-09-25 12:09:38,208 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=740698.0, ans=0.125 2024-09-25 12:10:00,303 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.136e+02 1.293e+02 1.358e+02 1.450e+02 1.925e+02, threshold=2.717e+02, percent-clipped=0.0 2024-09-25 12:10:05,284 INFO [train.py:1198] (3/4) Epoch 41, batch 2900, loss[loss=0.1897, ctc_loss=0.1264, cr_loss=0.3164, over 17294.00 frames. ], tot_loss[loss=0.1899, ctc_loss=0.1222, cr_loss=0.3386, over 3354207.35 frames. ], batch size: 51, lr: 2.89e-03, grad_scale: 32.0 2024-09-25 12:10:54,594 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=4.11 vs. limit=12.0 2024-09-25 12:11:08,292 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=740978.0, ans=0.0 2024-09-25 12:11:21,647 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=4.65 vs. limit=15.0 2024-09-25 12:11:25,601 INFO [train.py:1198] (3/4) Epoch 41, batch 2950, loss[loss=0.1963, ctc_loss=0.1252, cr_loss=0.3551, over 16677.00 frames. ], tot_loss[loss=0.1889, ctc_loss=0.1215, cr_loss=0.3369, over 3359713.57 frames. ], batch size: 61, lr: 2.89e-03, grad_scale: 32.0 2024-09-25 12:11:56,713 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=13.13 vs. 
limit=15.0 2024-09-25 12:12:07,574 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=741118.0, ans=0.07 2024-09-25 12:12:10,668 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=741118.0, ans=0.125 2024-09-25 12:12:13,844 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=741164.6666666666, ans=0.1 2024-09-25 12:12:21,835 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=741164.6666666666, ans=0.0 2024-09-25 12:12:23,314 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=741164.6666666666, ans=0.125 2024-09-25 12:12:24,945 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=741164.6666666666, ans=0.0 2024-09-25 12:12:26,705 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=741164.6666666666, ans=0.0 2024-09-25 12:12:36,241 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=741211.3333333334, ans=0.1 2024-09-25 12:12:40,612 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.116e+02 1.313e+02 1.388e+02 1.485e+02 2.724e+02, threshold=2.776e+02, percent-clipped=1.0 2024-09-25 12:12:45,293 INFO [train.py:1198] (3/4) Epoch 41, batch 3000, loss[loss=0.2333, ctc_loss=0.1564, cr_loss=0.3847, over 15226.00 frames. ], tot_loss[loss=0.1893, ctc_loss=0.1218, cr_loss=0.3375, over 3358648.43 frames. ], batch size: 89, lr: 2.89e-03, grad_scale: 32.0 2024-09-25 12:12:45,294 INFO [train.py:1221] (3/4) Computing validation loss 2024-09-25 12:13:00,792 INFO [train.py:1230] (3/4) Epoch 41, validation: loss=0.03575, ctc_loss=0.03575, cr_loss=9.81e-15, over 944034.00 frames. 2024-09-25 12:13:00,793 INFO [train.py:1231] (3/4) Maximum memory allocated so far is 21238MB 2024-09-25 12:13:12,053 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=741258.0, ans=0.125 2024-09-25 12:13:19,670 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=741304.6666666666, ans=0.1 2024-09-25 12:13:39,807 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.73 vs. limit=6.0 2024-09-25 12:13:54,468 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=741398.0, ans=0.0 2024-09-25 12:14:26,513 INFO [train.py:1198] (3/4) Epoch 41, batch 3050, loss[loss=0.1523, ctc_loss=0.0972, cr_loss=0.2753, over 17077.00 frames. ], tot_loss[loss=0.1892, ctc_loss=0.1218, cr_loss=0.3372, over 3353037.44 frames. 
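[Editor's note] Each train record above decomposes loss into ctc_loss and cr_loss, and the totals fit loss = ctc_loss + 0.2 * cr_loss (e.g. 0.1218 + 0.2 * 0.3372 = 0.1892 in the tot_loss just above). The validation cr_loss of ~1e-14 suggests the consistency term all but vanishes when the compared views coincide. A hedged sketch of a consistency-regularized CTC objective follows; the symmetric-KL consistency term is an assumption, and the exact term in train.py may differ.

```python
# Hedged sketch of a CR-CTC style objective: CTC on two augmented views of
# the same batch, plus a consistency (CR) term between their posteriors.
import torch
import torch.nn.functional as F

def cr_ctc_loss(log_probs_a, log_probs_b, targets, in_lens, tgt_lens,
                cr_scale=0.2):
    """log_probs_*: (T, N, C) log-posteriors from two augmented views;
    targets/in_lens/tgt_lens as expected by F.ctc_loss."""
    ctc_a = F.ctc_loss(log_probs_a, targets, in_lens, tgt_lens, zero_infinity=True)
    ctc_b = F.ctc_loss(log_probs_b, targets, in_lens, tgt_lens, zero_infinity=True)
    ctc = 0.5 * (ctc_a + ctc_b)
    # consistency: each view's posteriors should match the other's (detached)
    kl_ab = F.kl_div(log_probs_a, log_probs_b.detach(),
                     log_target=True, reduction="batchmean")
    kl_ba = F.kl_div(log_probs_b, log_probs_a.detach(),
                     log_target=True, reduction="batchmean")
    cr = 0.5 * (kl_ab + kl_ba)
    return ctc + cr_scale * cr, ctc, cr   # matches loss = ctc + 0.2 * cr
```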
], batch size: 43, lr: 2.88e-03, grad_scale: 32.0 2024-09-25 12:14:31,599 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=741491.3333333334, ans=0.125 2024-09-25 12:14:39,363 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-09-25 12:14:42,331 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=741538.0, ans=0.0 2024-09-25 12:15:04,089 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=741584.6666666666, ans=0.0 2024-09-25 12:15:14,830 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=741631.3333333334, ans=0.0 2024-09-25 12:15:25,715 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=741631.3333333334, ans=0.0 2024-09-25 12:15:39,715 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.127e+02 1.278e+02 1.352e+02 1.472e+02 2.246e+02, threshold=2.705e+02, percent-clipped=0.0 2024-09-25 12:15:44,429 INFO [train.py:1198] (3/4) Epoch 41, batch 3100, loss[loss=0.2124, ctc_loss=0.1421, cr_loss=0.3517, over 15202.00 frames. ], tot_loss[loss=0.1882, ctc_loss=0.1211, cr_loss=0.3358, over 3356337.91 frames. ], batch size: 89, lr: 2.88e-03, grad_scale: 32.0 2024-09-25 12:15:56,080 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.02 vs. limit=6.0 2024-09-25 12:15:59,160 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.03 vs. limit=6.0 2024-09-25 12:16:34,581 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=741864.6666666666, ans=0.125 2024-09-25 12:17:03,500 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=741958.0, ans=0.125 2024-09-25 12:17:04,756 INFO [train.py:1198] (3/4) Epoch 41, batch 3150, loss[loss=0.2148, ctc_loss=0.1393, cr_loss=0.3771, over 17031.00 frames. ], tot_loss[loss=0.1893, ctc_loss=0.1217, cr_loss=0.3381, over 3365680.66 frames. 
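[Editor's note] The scaling.py:214 ScheduledFloat records above show module hyper-parameters (skip rates, balancer probabilities, clamping limits) being looked up as a function of batch_count rather than held fixed. A minimal sketch of such a batch-count-indexed schedule follows, assuming piecewise-linear interpolation; the class name and interpolation rule are illustrative, not the scaling.py implementation.

```python
# Hedged sketch: a hyper-parameter scheduled on batch_count, evaluated by
# piecewise-linear interpolation between (batch_count, value) breakpoints.
import bisect

class PiecewiseLinearSchedule:
    def __init__(self, points):
        # points: list of (batch_count, value), sorted by batch_count
        self.xs = [x for x, _ in points]
        self.ys = [y for _, y in points]

    def __call__(self, batch_count: float) -> float:
        if batch_count <= self.xs[0]:
            return self.ys[0]
        if batch_count >= self.xs[-1]:
            return self.ys[-1]
        i = bisect.bisect_right(self.xs, batch_count) - 1
        x0, x1 = self.xs[i], self.xs[i + 1]
        y0, y1 = self.ys[i], self.ys[i + 1]
        return y0 + (y1 - y0) * (batch_count - x0) / (x1 - x0)

# e.g. a skip rate that decays early in training, then stays at its floor:
conv_skip_rate = PiecewiseLinearSchedule([(0, 0.2), (20000, 0.05), (50000, 0.0)])
print(conv_skip_rate(738504.6666666666))  # -> 0.0 this late in training
```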
], batch size: 52, lr: 2.88e-03, grad_scale: 32.0 2024-09-25 12:17:14,457 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=741958.0, ans=0.125 2024-09-25 12:17:27,279 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=742004.6666666666, ans=0.1 2024-09-25 12:17:55,682 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=742098.0, ans=0.0 2024-09-25 12:18:02,144 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=742098.0, ans=0.125 2024-09-25 12:18:03,755 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=742098.0, ans=0.2 2024-09-25 12:18:11,524 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=742144.6666666666, ans=0.125 2024-09-25 12:18:19,186 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.096e+02 1.276e+02 1.346e+02 1.473e+02 3.100e+02, threshold=2.691e+02, percent-clipped=1.0 2024-09-25 12:18:23,857 INFO [train.py:1198] (3/4) Epoch 41, batch 3200, loss[loss=0.1979, ctc_loss=0.1275, cr_loss=0.3516, over 17108.00 frames. ], tot_loss[loss=0.1885, ctc_loss=0.1211, cr_loss=0.3372, over 3367766.66 frames. ], batch size: 49, lr: 2.88e-03, grad_scale: 32.0 2024-09-25 12:18:36,703 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=742191.3333333334, ans=0.125 2024-09-25 12:19:30,003 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=742378.0, ans=0.125 2024-09-25 12:19:32,164 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.09 vs. limit=6.0 2024-09-25 12:19:33,125 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-25 12:19:42,199 INFO [train.py:1198] (3/4) Epoch 41, batch 3250, loss[loss=0.1811, ctc_loss=0.1168, cr_loss=0.3214, over 17222.00 frames. ], tot_loss[loss=0.1883, ctc_loss=0.1209, cr_loss=0.3369, over 3371634.14 frames. ], batch size: 50, lr: 2.88e-03, grad_scale: 8.0 2024-09-25 12:19:51,798 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=742424.6666666666, ans=0.0 2024-09-25 12:19:58,010 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=742471.3333333334, ans=0.0 2024-09-25 12:20:04,487 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=15.44 vs. 
limit=15.0 2024-09-25 12:20:08,831 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=742471.3333333334, ans=0.125 2024-09-25 12:20:10,462 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=742471.3333333334, ans=0.125 2024-09-25 12:20:16,773 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=742518.0, ans=0.0 2024-09-25 12:20:40,412 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=742564.6666666666, ans=0.125 2024-09-25 12:20:57,534 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=742611.3333333334, ans=0.0 2024-09-25 12:20:58,869 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.131e+02 1.297e+02 1.390e+02 1.461e+02 1.762e+02, threshold=2.780e+02, percent-clipped=0.0 2024-09-25 12:21:00,474 INFO [train.py:1198] (3/4) Epoch 41, batch 3300, loss[loss=0.1649, ctc_loss=0.104, cr_loss=0.3042, over 17257.00 frames. ], tot_loss[loss=0.1885, ctc_loss=0.1211, cr_loss=0.3369, over 3361064.02 frames. ], batch size: 42, lr: 2.88e-03, grad_scale: 8.0 2024-09-25 12:21:02,303 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.const_attention_rate, batch_count=742658.0, ans=0.025 2024-09-25 12:21:05,375 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=742658.0, ans=0.125 2024-09-25 12:21:08,387 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=742658.0, ans=0.1 2024-09-25 12:21:16,452 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=742704.6666666666, ans=0.0 2024-09-25 12:21:29,022 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=742704.6666666666, ans=0.2 2024-09-25 12:21:38,454 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=742751.3333333334, ans=0.2 2024-09-25 12:21:49,737 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.46 vs. limit=15.0 2024-09-25 12:21:54,146 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=742798.0, ans=0.1 2024-09-25 12:22:01,835 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=742844.6666666666, ans=0.1 2024-09-25 12:22:09,832 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=11.83 vs. limit=22.5 2024-09-25 12:22:15,793 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=742844.6666666666, ans=0.1 2024-09-25 12:22:18,679 INFO [train.py:1198] (3/4) Epoch 41, batch 3350, loss[loss=0.1959, ctc_loss=0.1245, cr_loss=0.3572, over 17254.00 frames. ], tot_loss[loss=0.1888, ctc_loss=0.1213, cr_loss=0.3376, over 3368254.34 frames. 
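[Editor's note] The scaling.py:1024 "Whitening" records above compare a metric against a limit: a measure of how far a module's output covariance is from a scalar multiple of the identity, with 1.0 meaning perfectly "white" features. One plausible metric is mean(eig^2) / mean(eig)^2 of the per-group covariance, computable via trace identities without an eigendecomposition; this is a hedged sketch, and the exact definition in scaling.py may differ.

```python
# Hedged sketch of a whitening metric: 1.0 iff the (per-group) feature
# covariance is a multiple of the identity; larger means more anisotropic.
import torch

def whitening_metric(x: torch.Tensor, num_groups: int = 1) -> float:
    """x: (num_frames, num_channels); channels split into num_groups."""
    n, c = x.shape
    x = x.reshape(n, num_groups, c // num_groups)
    x = x - x.mean(dim=0, keepdim=True)
    metrics = []
    for g in range(num_groups):
        cov = x[:, g, :].T @ x[:, g, :] / n          # per-group covariance
        d = cov.shape[0]
        # mean(eig^2)/mean(eig)^2 = d * trace(cov @ cov) / trace(cov)^2
        metrics.append(d * (cov * cov).sum() / cov.trace() ** 2)
    return torch.stack(metrics).mean().item()

x = torch.randn(1000, 512)
print(whitening_metric(x))  # somewhat above 1.0 due to finite-sample noise
```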
], batch size: 44, lr: 2.88e-03, grad_scale: 8.0 2024-09-25 12:22:23,593 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=742891.3333333334, ans=0.125 2024-09-25 12:22:31,400 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=742891.3333333334, ans=0.125 2024-09-25 12:23:03,398 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=7.91 vs. limit=15.0 2024-09-25 12:23:05,192 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=4.22 vs. limit=15.0 2024-09-25 12:23:24,500 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=743078.0, ans=0.0 2024-09-25 12:23:26,544 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=9.66 vs. limit=15.0 2024-09-25 12:23:35,162 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.142e+02 1.288e+02 1.390e+02 1.522e+02 3.340e+02, threshold=2.781e+02, percent-clipped=2.0 2024-09-25 12:23:36,768 INFO [train.py:1198] (3/4) Epoch 41, batch 3400, loss[loss=0.1897, ctc_loss=0.1215, cr_loss=0.3408, over 17105.00 frames. ], tot_loss[loss=0.1897, ctc_loss=0.1219, cr_loss=0.3392, over 3366110.31 frames. ], batch size: 49, lr: 2.88e-03, grad_scale: 8.0 2024-09-25 12:23:46,652 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=743124.6666666666, ans=0.125 2024-09-25 12:23:52,827 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=743171.3333333334, ans=0.2 2024-09-25 12:24:10,545 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=743218.0, ans=0.0 2024-09-25 12:24:15,366 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=743218.0, ans=0.1 2024-09-25 12:24:31,538 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=743264.6666666666, ans=0.2 2024-09-25 12:24:42,533 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-25 12:24:47,089 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=743311.3333333334, ans=0.035 2024-09-25 12:25:00,983 INFO [train.py:1198] (3/4) Epoch 41, batch 3450, loss[loss=0.2107, ctc_loss=0.1376, cr_loss=0.3657, over 15170.00 frames. ], tot_loss[loss=0.1905, ctc_loss=0.1225, cr_loss=0.3401, over 3361073.62 frames. ], batch size: 89, lr: 2.88e-03, grad_scale: 8.0 2024-09-25 12:25:13,491 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=743358.0, ans=0.1 2024-09-25 12:25:24,500 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=743404.6666666666, ans=0.0 2024-09-25 12:26:10,288 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=9.72 vs. 
limit=15.0 2024-09-25 12:26:17,204 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.108e+02 1.301e+02 1.363e+02 1.481e+02 3.003e+02, threshold=2.726e+02, percent-clipped=1.0 2024-09-25 12:26:18,734 INFO [train.py:1198] (3/4) Epoch 41, batch 3500, loss[loss=0.1859, ctc_loss=0.1175, cr_loss=0.3424, over 17128.00 frames. ], tot_loss[loss=0.1902, ctc_loss=0.1224, cr_loss=0.3391, over 3349876.73 frames. ], batch size: 48, lr: 2.88e-03, grad_scale: 8.0 2024-09-25 12:26:28,365 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=13.41 vs. limit=22.5 2024-09-25 12:26:29,051 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=743591.3333333334, ans=0.125 2024-09-25 12:26:51,562 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=5.12 vs. limit=15.0 2024-09-25 12:27:09,843 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-25 12:27:23,838 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=743778.0, ans=0.1 2024-09-25 12:27:31,594 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=743778.0, ans=0.125 2024-09-25 12:27:39,046 INFO [train.py:1198] (3/4) Epoch 41, batch 3550, loss[loss=0.1665, ctc_loss=0.1045, cr_loss=0.3099, over 17180.00 frames. ], tot_loss[loss=0.1895, ctc_loss=0.1219, cr_loss=0.3381, over 3349661.45 frames. ], batch size: 41, lr: 2.88e-03, grad_scale: 8.0 2024-09-25 12:27:40,904 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=743824.6666666666, ans=0.0 2024-09-25 12:27:41,585 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.74 vs. limit=22.5 2024-09-25 12:27:42,563 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=743824.6666666666, ans=0.125 2024-09-25 12:28:13,730 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=743918.0, ans=0.07 2024-09-25 12:28:15,264 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=743918.0, ans=0.125 2024-09-25 12:28:55,696 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.134e+02 1.288e+02 1.370e+02 1.460e+02 2.719e+02, threshold=2.741e+02, percent-clipped=0.0 2024-09-25 12:28:57,318 INFO [train.py:1198] (3/4) Epoch 41, batch 3600, loss[loss=0.2058, ctc_loss=0.1338, cr_loss=0.3596, over 16632.00 frames. ], tot_loss[loss=0.1895, ctc_loss=0.1219, cr_loss=0.3376, over 3338503.20 frames. 
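[Editor's note] The grad_scale field in the train records above moves between 32.0, 16.0 and 8.0: with fp16 AMP, the dynamic loss scaler halves its scale when gradients overflow and doubles it after a run of overflow-free steps. A minimal sketch of the corresponding training step with torch.cuda.amp follows; model, optimizer, loss_fn and the scaler constants are placeholders/assumptions.

```python
# Sketch of an fp16 step consistent with the logged grad_scale values:
# GradScaler halves its scale on inf/nan gradients and doubles it after
# growth_interval clean steps. init_scale/growth_interval are assumptions.
import torch

scaler = torch.cuda.amp.GradScaler(init_scale=2.0, growth_interval=2000)

def train_step(model, optimizer, feats, targets, loss_fn):
    optimizer.zero_grad()
    with torch.cuda.amp.autocast():           # fp16 forward pass
        loss = loss_fn(model(feats), targets)
    scaler.scale(loss).backward()             # backprop the scaled loss
    scaler.step(optimizer)                    # unscales; skips step on overflow
    scaler.update()                           # grow or shrink the scale
    return loss.detach(), scaler.get_scale()  # get_scale() ~ logged grad_scale
```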
], batch size: 61, lr: 2.88e-03, grad_scale: 16.0 2024-09-25 12:29:19,332 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=744104.6666666666, ans=0.125 2024-09-25 12:29:33,572 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=744151.3333333334, ans=0.125 2024-09-25 12:29:47,666 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=744198.0, ans=0.1 2024-09-25 12:30:04,739 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=744244.6666666666, ans=0.0 2024-09-25 12:30:14,250 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=744291.3333333334, ans=0.125 2024-09-25 12:30:15,425 INFO [train.py:1198] (3/4) Epoch 41, batch 3650, loss[loss=0.2343, ctc_loss=0.1537, cr_loss=0.403, over 17017.00 frames. ], tot_loss[loss=0.1896, ctc_loss=0.1221, cr_loss=0.3378, over 3340732.11 frames. ], batch size: 52, lr: 2.88e-03, grad_scale: 16.0 2024-09-25 12:30:23,332 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=744291.3333333334, ans=0.0 2024-09-25 12:30:56,758 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=744384.6666666666, ans=0.125 2024-09-25 12:31:32,489 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.167e+02 1.315e+02 1.407e+02 1.494e+02 1.743e+02, threshold=2.814e+02, percent-clipped=0.0 2024-09-25 12:31:32,937 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=744524.6666666666, ans=0.2 2024-09-25 12:31:34,122 INFO [train.py:1198] (3/4) Epoch 41, batch 3700, loss[loss=0.2178, ctc_loss=0.1414, cr_loss=0.3823, over 17040.00 frames. ], tot_loss[loss=0.1903, ctc_loss=0.1226, cr_loss=0.3384, over 3341353.52 frames. ], batch size: 51, lr: 2.88e-03, grad_scale: 16.0 2024-09-25 12:31:46,891 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=744524.6666666666, ans=0.2 2024-09-25 12:31:46,981 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=744524.6666666666, ans=0.1 2024-09-25 12:32:05,846 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.const_attention_rate, batch_count=744618.0, ans=0.025 2024-09-25 12:32:27,853 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=744664.6666666666, ans=0.07 2024-09-25 12:32:47,227 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.whiten.whitening_limit, batch_count=744711.3333333334, ans=12.0 2024-09-25 12:32:53,079 INFO [train.py:1198] (3/4) Epoch 41, batch 3750, loss[loss=0.203, ctc_loss=0.1299, cr_loss=0.3658, over 17030.00 frames. ], tot_loss[loss=0.1913, ctc_loss=0.1232, cr_loss=0.3405, over 3340013.86 frames. 
], batch size: 52, lr: 2.88e-03, grad_scale: 16.0 2024-09-25 12:33:05,755 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=744758.0, ans=0.125 2024-09-25 12:33:20,113 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=744804.6666666666, ans=0.125 2024-09-25 12:33:31,866 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=9.42 vs. limit=15.0 2024-09-25 12:33:50,046 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=3.29 vs. limit=15.0 2024-09-25 12:33:52,692 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=744898.0, ans=0.1 2024-09-25 12:34:11,725 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.165e+02 1.288e+02 1.369e+02 1.453e+02 1.957e+02, threshold=2.738e+02, percent-clipped=0.0 2024-09-25 12:34:11,967 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=744991.3333333334, ans=0.125 2024-09-25 12:34:14,055 INFO [train.py:1198] (3/4) Epoch 41, batch 3800, loss[loss=0.1724, ctc_loss=0.1098, cr_loss=0.313, over 17261.00 frames. ], tot_loss[loss=0.192, ctc_loss=0.1237, cr_loss=0.3414, over 3336622.37 frames. ], batch size: 44, lr: 2.88e-03, grad_scale: 16.0 2024-09-25 12:34:16,047 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=744991.3333333334, ans=0.0 2024-09-25 12:34:19,566 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=9.03 vs. limit=15.0 2024-09-25 12:34:22,082 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=744991.3333333334, ans=0.0 2024-09-25 12:34:25,395 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=3.40 vs. 
limit=10.0 2024-09-25 12:34:28,091 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=745038.0, ans=0.0 2024-09-25 12:34:34,563 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-25 12:34:34,586 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=745038.0, ans=0.125 2024-09-25 12:34:36,006 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.const_attention_rate, batch_count=745038.0, ans=0.025 2024-09-25 12:34:36,099 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=745038.0, ans=0.125 2024-09-25 12:34:36,110 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=745038.0, ans=0.0 2024-09-25 12:34:39,172 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=745038.0, ans=0.2 2024-09-25 12:34:46,948 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=745084.6666666666, ans=0.2 2024-09-25 12:35:01,007 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=745131.3333333334, ans=0.125 2024-09-25 12:35:01,015 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=745131.3333333334, ans=0.125 2024-09-25 12:35:12,187 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=745131.3333333334, ans=0.2 2024-09-25 12:35:18,437 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.const_attention_rate, batch_count=745178.0, ans=0.025 2024-09-25 12:35:32,599 INFO [train.py:1198] (3/4) Epoch 41, batch 3850, loss[loss=0.1568, ctc_loss=0.09927, cr_loss=0.2874, over 16270.00 frames. ], tot_loss[loss=0.1932, ctc_loss=0.1248, cr_loss=0.3425, over 3313172.44 frames. ], batch size: 36, lr: 2.88e-03, grad_scale: 16.0 2024-09-25 12:36:03,081 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=6.64 vs. limit=15.0 2024-09-25 12:36:35,868 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.47 vs. limit=15.0 2024-09-25 12:37:33,474 INFO [train.py:1198] (3/4) Epoch 42, batch 0, loss[loss=0.177, ctc_loss=0.1117, cr_loss=0.3265, over 16930.00 frames. ], tot_loss[loss=0.177, ctc_loss=0.1117, cr_loss=0.3265, over 16930.00 frames. ], batch size: 58, lr: 2.84e-03, grad_scale: 32.0 2024-09-25 12:37:33,474 INFO [train.py:1221] (3/4) Computing validation loss 2024-09-25 12:37:48,883 INFO [train.py:1230] (3/4) Epoch 42, validation: loss=0.03453, ctc_loss=0.03453, cr_loss=1.019e-14, over 944034.00 frames. 
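[Editor's note] Across the epoch 41 -> 42 boundary above, the logged lr steps from 2.88e-03 to 2.84e-03, and within an epoch it creeps down with batch count. That shape matches an Eden-style schedule with inverse quarter-power factors in both batch count and epoch; the sketch below is hedged, its constants are illustrative assumptions, and the exact formula and scaling used in this recipe may differ.

```python
# Hedged sketch of an Eden-style learning-rate schedule: slow power-law
# decay in batch count and epoch, giving the gentle lr drift seen above.
def eden_lr(base_lr: float, batch: float, epoch: float,
            lr_batches: float = 7500.0, lr_epochs: float = 3.5) -> float:
    batch_factor = ((batch ** 2 + lr_batches ** 2) / lr_batches ** 2) ** -0.25
    epoch_factor = ((epoch ** 2 + lr_epochs ** 2) / lr_epochs ** 2) ** -0.25
    return base_lr * batch_factor * epoch_factor

# late in training both factors change slowly, so lr decays only slightly
# from one epoch to the next:
print(eden_lr(0.045, 738_000, 41), eden_lr(0.045, 745_000, 42))
```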
2024-09-25 12:37:48,884 INFO [train.py:1231] (3/4) Maximum memory allocated so far is 21238MB 2024-09-25 12:37:50,883 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=745439.3333333334, ans=0.125 2024-09-25 12:37:53,635 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.118e+02 1.354e+02 1.491e+02 1.700e+02 3.066e+02, threshold=2.981e+02, percent-clipped=1.0 2024-09-25 12:37:55,623 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=745439.3333333334, ans=0.0 2024-09-25 12:38:05,319 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=745486.0, ans=0.07 2024-09-25 12:38:06,723 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=745486.0, ans=0.025 2024-09-25 12:38:06,741 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=745486.0, ans=0.125 2024-09-25 12:38:14,716 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=745486.0, ans=0.0 2024-09-25 12:38:39,124 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=4.31 vs. limit=12.0 2024-09-25 12:39:09,845 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=745672.6666666666, ans=0.2 2024-09-25 12:39:11,034 INFO [train.py:1198] (3/4) Epoch 42, batch 50, loss[loss=0.1827, ctc_loss=0.118, cr_loss=0.3235, over 17217.00 frames. ], tot_loss[loss=0.1912, ctc_loss=0.1234, cr_loss=0.3392, over 746451.79 frames. ], batch size: 47, lr: 2.84e-03, grad_scale: 32.0 2024-09-25 12:39:13,307 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.07 vs. limit=6.0 2024-09-25 12:40:31,155 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.81 vs. limit=15.0 2024-09-25 12:40:37,093 INFO [train.py:1198] (3/4) Epoch 42, batch 100, loss[loss=0.1815, ctc_loss=0.1126, cr_loss=0.3441, over 17031.00 frames. ], tot_loss[loss=0.1895, ctc_loss=0.122, cr_loss=0.3374, over 1325912.15 frames. ], batch size: 39, lr: 2.84e-03, grad_scale: 32.0 2024-09-25 12:40:40,832 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=6.57 vs. limit=15.0 2024-09-25 12:40:41,769 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.127e+02 1.296e+02 1.378e+02 1.504e+02 1.895e+02, threshold=2.756e+02, percent-clipped=0.0 2024-09-25 12:40:58,344 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=4.42 vs. 
limit=15.0 2024-09-25 12:40:59,434 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=745952.6666666666, ans=0.125 2024-09-25 12:41:15,242 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=745999.3333333334, ans=0.0 2024-09-25 12:41:16,886 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=745999.3333333334, ans=0.125 2024-09-25 12:41:29,388 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.max_abs, batch_count=746046.0, ans=10.0 2024-09-25 12:41:34,412 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=746046.0, ans=0.125 2024-09-25 12:41:36,055 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=746046.0, ans=0.1 2024-09-25 12:41:46,110 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.87 vs. limit=6.0 2024-09-25 12:41:46,487 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=4.99 vs. limit=15.0 2024-09-25 12:41:59,756 INFO [train.py:1198] (3/4) Epoch 42, batch 150, loss[loss=0.1576, ctc_loss=0.09903, cr_loss=0.293, over 17285.00 frames. ], tot_loss[loss=0.1898, ctc_loss=0.1221, cr_loss=0.3384, over 1770514.54 frames. ], batch size: 42, lr: 2.84e-03, grad_scale: 32.0 2024-09-25 12:43:14,221 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=11.12 vs. limit=22.5 2024-09-25 12:43:15,437 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=746326.0, ans=0.1 2024-09-25 12:43:19,866 INFO [train.py:1198] (3/4) Epoch 42, batch 200, loss[loss=0.2088, ctc_loss=0.1341, cr_loss=0.3735, over 17370.00 frames. ], tot_loss[loss=0.1909, ctc_loss=0.1228, cr_loss=0.3404, over 2130266.27 frames. ], batch size: 48, lr: 2.84e-03, grad_scale: 16.0 2024-09-25 12:43:20,135 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=746372.6666666666, ans=0.035 2024-09-25 12:43:25,466 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=5.27 vs. 
limit=15.0 2024-09-25 12:43:26,427 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.149e+02 1.308e+02 1.400e+02 1.491e+02 1.853e+02, threshold=2.800e+02, percent-clipped=0.0 2024-09-25 12:43:29,880 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.const_attention_rate, batch_count=746372.6666666666, ans=0.025 2024-09-25 12:43:55,446 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=746466.0, ans=0.0 2024-09-25 12:43:55,545 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=746466.0, ans=0.0 2024-09-25 12:44:02,010 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=746466.0, ans=0.0 2024-09-25 12:44:20,728 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=746512.6666666666, ans=0.0 2024-09-25 12:44:41,623 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=746559.3333333334, ans=0.125 2024-09-25 12:44:45,845 INFO [train.py:1198] (3/4) Epoch 42, batch 250, loss[loss=0.1685, ctc_loss=0.1052, cr_loss=0.3162, over 17041.00 frames. ], tot_loss[loss=0.1893, ctc_loss=0.1218, cr_loss=0.3378, over 2398940.01 frames. ], batch size: 39, lr: 2.84e-03, grad_scale: 16.0 2024-09-25 12:44:46,078 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=746606.0, ans=0.1 2024-09-25 12:45:55,087 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=746792.6666666666, ans=0.035 2024-09-25 12:46:13,672 INFO [train.py:1198] (3/4) Epoch 42, batch 300, loss[loss=0.2155, ctc_loss=0.1376, cr_loss=0.3895, over 17076.00 frames. ], tot_loss[loss=0.1903, ctc_loss=0.1223, cr_loss=0.3399, over 2618725.18 frames. ], batch size: 49, lr: 2.84e-03, grad_scale: 16.0 2024-09-25 12:46:17,190 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=746839.3333333334, ans=0.0 2024-09-25 12:46:20,041 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.150e+02 1.307e+02 1.390e+02 1.481e+02 2.059e+02, threshold=2.780e+02, percent-clipped=0.0 2024-09-25 12:46:51,679 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=6.23 vs. limit=15.0 2024-09-25 12:47:00,405 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=746979.3333333334, ans=0.125 2024-09-25 12:47:14,737 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.min_abs, batch_count=746979.3333333334, ans=0.5 2024-09-25 12:47:33,921 INFO [train.py:1198] (3/4) Epoch 42, batch 350, loss[loss=0.1772, ctc_loss=0.1117, cr_loss=0.3271, over 17022.00 frames. ], tot_loss[loss=0.1902, ctc_loss=0.1221, cr_loss=0.34, over 2787556.17 frames. 
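[Editor's note] Within epoch 42 the frame count under tot_loss grows from ~0.75M at batch 50 toward the ~3.3M plateau seen throughout epoch 41. That is consistent with an exponentially decayed, frame-weighted running sum reset at each epoch: with decay 1 - 1/200 and roughly 17k frames per batch, the steady state is about 200 * 17k ~ 3.4M frames. A minimal sketch under those assumptions follows; icefall's actual tracker may differ in detail.

```python
# Hedged sketch of the running statistics printed as "tot_loss[... over N
# frames.]": a frame-weighted sum with exponential forgetting.
class RunningLoss:
    def __init__(self, reset_interval: int = 200):
        self.decay = 1.0 - 1.0 / reset_interval
        self.loss_sum = 0.0   # decayed sum of per-frame losses
        self.frames = 0.0     # decayed effective frame count

    def update(self, batch_loss: float, batch_frames: float) -> float:
        self.loss_sum = self.loss_sum * self.decay + batch_loss * batch_frames
        self.frames = self.frames * self.decay + batch_frames
        return self.loss_sum / self.frames  # the value printed as tot_loss

tracker = RunningLoss()
for _ in range(50):
    tracker.update(0.19, 17000.0)
print(tracker.frames)  # ~0.75e6, like "over 746451.79 frames." at batch 50
```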
], batch size: 51, lr: 2.84e-03, grad_scale: 16.0 2024-09-25 12:47:56,793 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=747119.3333333334, ans=0.05 2024-09-25 12:48:06,229 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=747166.0, ans=0.125 2024-09-25 12:48:19,072 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=747166.0, ans=0.0 2024-09-25 12:48:22,266 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=747212.6666666666, ans=0.1 2024-09-25 12:48:33,529 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=3.73 vs. limit=10.0 2024-09-25 12:48:56,552 INFO [train.py:1198] (3/4) Epoch 42, batch 400, loss[loss=0.2016, ctc_loss=0.1279, cr_loss=0.3684, over 17075.00 frames. ], tot_loss[loss=0.1906, ctc_loss=0.1225, cr_loss=0.3403, over 2908286.15 frames. ], batch size: 46, lr: 2.84e-03, grad_scale: 32.0 2024-09-25 12:49:02,843 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.157e+02 1.303e+02 1.377e+02 1.468e+02 2.064e+02, threshold=2.754e+02, percent-clipped=0.0 2024-09-25 12:49:12,537 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=747352.6666666666, ans=0.125 2024-09-25 12:49:14,365 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=747352.6666666666, ans=0.0 2024-09-25 12:49:14,794 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=13.83 vs. limit=22.5 2024-09-25 12:49:25,008 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=747352.6666666666, ans=0.0 2024-09-25 12:49:48,689 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=747446.0, ans=0.1 2024-09-25 12:49:58,401 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=747446.0, ans=0.2 2024-09-25 12:50:06,934 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=4.72 vs. limit=10.0 2024-09-25 12:50:21,630 INFO [train.py:1198] (3/4) Epoch 42, batch 450, loss[loss=0.2163, ctc_loss=0.1441, cr_loss=0.3607, over 14621.00 frames. ], tot_loss[loss=0.1907, ctc_loss=0.1226, cr_loss=0.3405, over 3006854.78 frames. ], batch size: 89, lr: 2.84e-03, grad_scale: 32.0 2024-09-25 12:50:25,112 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=747539.3333333334, ans=0.0 2024-09-25 12:50:26,796 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=747539.3333333334, ans=0.0 2024-09-25 12:50:29,912 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=747539.3333333334, ans=0.0 2024-09-25 12:51:11,904 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=4.60 vs. 
limit=10.0 2024-09-25 12:51:36,257 INFO [scaling.py:1024] (3/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.76 vs. limit=5.0 2024-09-25 12:51:44,414 INFO [train.py:1198] (3/4) Epoch 42, batch 500, loss[loss=0.2066, ctc_loss=0.133, cr_loss=0.368, over 17140.00 frames. ], tot_loss[loss=0.1899, ctc_loss=0.1221, cr_loss=0.3389, over 3079181.25 frames. ], batch size: 48, lr: 2.84e-03, grad_scale: 32.0 2024-09-25 12:51:44,784 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=747772.6666666666, ans=0.125 2024-09-25 12:51:50,867 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.154e+02 1.301e+02 1.386e+02 1.488e+02 2.359e+02, threshold=2.773e+02, percent-clipped=0.0 2024-09-25 12:51:58,012 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=3.48 vs. limit=15.0 2024-09-25 12:52:13,343 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=747819.3333333334, ans=0.0 2024-09-25 12:52:16,770 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=747866.0, ans=0.1 2024-09-25 12:52:17,297 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=8.82 vs. limit=15.0 2024-09-25 12:52:24,635 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=747866.0, ans=0.125 2024-09-25 12:52:34,235 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=747912.6666666666, ans=0.0 2024-09-25 12:52:35,642 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=747912.6666666666, ans=0.0 2024-09-25 12:52:48,564 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-25 12:53:04,114 INFO [train.py:1198] (3/4) Epoch 42, batch 550, loss[loss=0.2149, ctc_loss=0.1363, cr_loss=0.3931, over 17153.00 frames. ], tot_loss[loss=0.1892, ctc_loss=0.1216, cr_loss=0.3383, over 3143184.65 frames. ], batch size: 48, lr: 2.84e-03, grad_scale: 32.0 2024-09-25 12:53:21,868 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=748052.6666666666, ans=0.1 2024-09-25 12:53:26,601 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=748052.6666666666, ans=0.125 2024-09-25 12:53:37,980 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-09-25 12:53:56,550 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=748146.0, ans=0.04949747468305833 2024-09-25 12:54:06,097 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=748146.0, ans=0.125 2024-09-25 12:54:29,016 INFO [train.py:1198] (3/4) Epoch 42, batch 600, loss[loss=0.2127, ctc_loss=0.1457, cr_loss=0.3354, over 11620.00 frames. ], tot_loss[loss=0.1893, ctc_loss=0.1217, cr_loss=0.3379, over 3187739.23 frames. 
], batch size: 123, lr: 2.84e-03, grad_scale: 32.0 2024-09-25 12:54:35,340 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.189e+02 1.284e+02 1.358e+02 1.453e+02 2.155e+02, threshold=2.716e+02, percent-clipped=0.0 2024-09-25 12:54:48,487 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=748286.0, ans=0.125 2024-09-25 12:54:55,014 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=748286.0, ans=0.1 2024-09-25 12:54:56,446 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=748286.0, ans=0.2 2024-09-25 12:55:04,522 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=748332.6666666666, ans=0.125 2024-09-25 12:55:09,321 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=748332.6666666666, ans=0.125 2024-09-25 12:55:51,815 INFO [train.py:1198] (3/4) Epoch 42, batch 650, loss[loss=0.1911, ctc_loss=0.1213, cr_loss=0.3487, over 17170.00 frames. ], tot_loss[loss=0.1895, ctc_loss=0.1219, cr_loss=0.3381, over 3231769.22 frames. ], batch size: 45, lr: 2.84e-03, grad_scale: 32.0 2024-09-25 12:56:14,192 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=748519.3333333334, ans=0.125 2024-09-25 12:56:25,384 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=748566.0, ans=0.125 2024-09-25 12:56:27,178 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=748566.0, ans=0.1 2024-09-25 12:56:41,487 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=748612.6666666666, ans=0.0 2024-09-25 12:56:52,801 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=748612.6666666666, ans=0.0 2024-09-25 12:57:07,039 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=748659.3333333334, ans=0.125 2024-09-25 12:57:14,805 INFO [train.py:1198] (3/4) Epoch 42, batch 700, loss[loss=0.1907, ctc_loss=0.1224, cr_loss=0.3416, over 16844.00 frames. ], tot_loss[loss=0.1903, ctc_loss=0.1224, cr_loss=0.3394, over 3266069.39 frames. 
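[Editor's note] Batch sizes in the records above vary inversely with utterance length (e.g. 123 cuts in the short-utterance batch just above versus 39-61 cuts elsewhere), which is what a duration-constrained bucketing sampler produces: each batch is filled until its total duration approaches a max_duration budget. The sketch below shows only that core constraint; it is illustrative and not lhotse's DynamicBucketingSampler.

```python
# Hedged sketch of duration-constrained batching: pack cuts (assumed
# pre-sorted by duration within a bucket) until adding another would
# exceed the max_duration budget. Short cuts -> many per batch.
from typing import Iterable, List, Tuple

def duration_batches(cuts: Iterable[Tuple[str, float]],
                     max_duration: float = 700.0) -> List[List[str]]:
    """cuts: (cut_id, duration_seconds) pairs."""
    batches, current, total = [], [], 0.0
    for cut_id, dur in cuts:
        if current and total + dur > max_duration:
            batches.append(current)       # budget reached: start a new batch
            current, total = [], 0.0
        current.append(cut_id)
        total += dur
    if current:
        batches.append(current)
    return batches

print(len(duration_batches((f"cut{i}", 6.0) for i in range(300))[0]))  # 116
```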
], batch size: 58, lr: 2.84e-03, grad_scale: 32.0 2024-09-25 12:57:18,389 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=748706.0, ans=0.0 2024-09-25 12:57:18,482 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=748706.0, ans=0.1 2024-09-25 12:57:21,243 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.114e+02 1.310e+02 1.390e+02 1.481e+02 1.937e+02, threshold=2.780e+02, percent-clipped=0.0 2024-09-25 12:57:31,196 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=748752.6666666666, ans=0.125 2024-09-25 12:57:37,661 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=748752.6666666666, ans=0.1 2024-09-25 12:57:39,229 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=748752.6666666666, ans=0.125 2024-09-25 12:57:45,529 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=748799.3333333334, ans=0.125 2024-09-25 12:57:48,808 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=748799.3333333334, ans=0.0 2024-09-25 12:58:06,322 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=748846.0, ans=0.125 2024-09-25 12:58:24,102 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=748892.6666666666, ans=0.2 2024-09-25 12:58:30,458 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.max_positive, batch_count=748892.6666666666, ans=0.95 2024-09-25 12:58:34,919 INFO [train.py:1198] (3/4) Epoch 42, batch 750, loss[loss=0.1616, ctc_loss=0.09994, cr_loss=0.3081, over 17112.00 frames. ], tot_loss[loss=0.1901, ctc_loss=0.1222, cr_loss=0.3393, over 3293812.03 frames. ], batch size: 40, lr: 2.84e-03, grad_scale: 32.0 2024-09-25 12:58:37,390 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=4.16 vs. 
limit=12.0 2024-09-25 12:59:06,899 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=748986.0, ans=0.0 2024-09-25 12:59:27,026 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=749079.3333333334, ans=0.125 2024-09-25 12:59:27,048 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=749079.3333333334, ans=0.125 2024-09-25 12:59:41,637 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=749079.3333333334, ans=0.125 2024-09-25 12:59:48,020 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.const_attention_rate, batch_count=749126.0, ans=0.025 2024-09-25 12:59:51,293 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=749126.0, ans=0.125 2024-09-25 12:59:53,255 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=10.26 vs. limit=15.0 2024-09-25 13:00:00,493 INFO [train.py:1198] (3/4) Epoch 42, batch 800, loss[loss=0.1886, ctc_loss=0.1229, cr_loss=0.3283, over 17211.00 frames. ], tot_loss[loss=0.1892, ctc_loss=0.1217, cr_loss=0.3377, over 3313144.09 frames. ], batch size: 47, lr: 2.84e-03, grad_scale: 32.0 2024-09-25 13:00:06,835 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.092e+02 1.303e+02 1.405e+02 1.532e+02 1.999e+02, threshold=2.810e+02, percent-clipped=0.0 2024-09-25 13:00:19,570 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=749219.3333333334, ans=0.125 2024-09-25 13:00:24,333 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-25 13:00:27,503 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=749219.3333333334, ans=0.2 2024-09-25 13:00:43,407 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=749266.0, ans=0.125 2024-09-25 13:00:43,409 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=749266.0, ans=0.125 2024-09-25 13:01:23,750 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=749359.3333333334, ans=0.1 2024-09-25 13:01:25,526 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=749406.0, ans=0.5 2024-09-25 13:01:26,618 INFO [train.py:1198] (3/4) Epoch 42, batch 850, loss[loss=0.2068, ctc_loss=0.1323, cr_loss=0.3727, over 17225.00 frames. ], tot_loss[loss=0.189, ctc_loss=0.1216, cr_loss=0.3373, over 3324623.60 frames. 
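As a consistency check on the totals being logged: the running tot_loss is the CTC loss plus 0.2 times the CR loss, consistent with a cr_loss_scale of 0.2 for this run (the CTC weight is assumed to be 1.0). Worked through for the batch-850 totals just above:

```python
# loss = ctc_loss_scale * ctc_loss + cr_loss_scale * cr_loss, assuming
# ctc_loss_scale = 1.0 and cr_loss_scale = 0.2 for this run.
ctc_loss, cr_loss = 0.1216, 0.3373        # tot_loss at epoch 42, batch 850
loss = 1.0 * ctc_loss + 0.2 * cr_loss
print(round(loss, 3))                     # 0.189, the logged tot_loss
```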
], batch size: 47, lr: 2.83e-03, grad_scale: 32.0 2024-09-25 13:01:26,912 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.const_attention_rate, batch_count=749406.0, ans=0.025 2024-09-25 13:01:38,223 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=749406.0, ans=0.125 2024-09-25 13:02:08,706 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=749499.3333333334, ans=0.125 2024-09-25 13:02:24,256 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=14.58 vs. limit=22.5 2024-09-25 13:02:29,135 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2.whitening_limit, batch_count=749546.0, ans=15.0 2024-09-25 13:02:31,684 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=749592.6666666666, ans=0.125 2024-09-25 13:02:33,399 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=749592.6666666666, ans=0.1 2024-09-25 13:02:39,731 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=749592.6666666666, ans=0.1 2024-09-25 13:02:46,332 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=749639.3333333334, ans=0.05 2024-09-25 13:02:47,637 INFO [train.py:1198] (3/4) Epoch 42, batch 900, loss[loss=0.2432, ctc_loss=0.1629, cr_loss=0.4013, over 15121.00 frames. ], tot_loss[loss=0.1888, ctc_loss=0.1214, cr_loss=0.3372, over 3335097.35 frames. ], batch size: 89, lr: 2.83e-03, grad_scale: 32.0 2024-09-25 13:02:53,991 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.140e+02 1.278e+02 1.357e+02 1.447e+02 3.889e+02, threshold=2.715e+02, percent-clipped=1.0 2024-09-25 13:02:56,082 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=749639.3333333334, ans=0.1 2024-09-25 13:02:58,047 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=9.85 vs. limit=22.5 2024-09-25 13:02:59,348 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=749639.3333333334, ans=0.125 2024-09-25 13:03:20,947 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=6.44 vs. limit=10.0 2024-09-25 13:03:59,407 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.01 vs. limit=6.0 2024-09-25 13:04:11,594 INFO [train.py:1198] (3/4) Epoch 42, batch 950, loss[loss=0.2197, ctc_loss=0.1416, cr_loss=0.3904, over 16745.00 frames. ], tot_loss[loss=0.1882, ctc_loss=0.1209, cr_loss=0.3362, over 3353944.53 frames. 
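The ScheduledFloat lines record module hyperparameters (dropout probabilities, balancer probs, skip rates) whose values are scheduled on batch_count; at this stage of training (batch_count around 7.5e5) the same names keep logging the same values, so the schedules have evidently reached their endpoints. A toy sketch of such a schedule, piecewise-linear in batch_count; the class below is illustrative only, not icefall's scaling.py:

```python
# Illustrative value schedule keyed on batch_count, in the spirit of the
# ScheduledFloat log entries above.
class ScheduledValue:
    def __init__(self, *points):
        # points: (batch_count, value) breakpoints, sorted by batch_count.
        self.points = sorted(points)

    def value(self, batch_count):
        x0, y0 = self.points[0]
        if batch_count <= x0:
            return y0
        for x1, y1 in self.points[1:]:
            if batch_count <= x1:
                # Linear interpolation between the two breakpoints.
                t = (batch_count - x0) / (x1 - x0)
                return y0 + t * (y1 - y0)
            x0, y0 = x1, y1
        return y0  # past the last breakpoint: hold the final value

dropout_p = ScheduledValue((0.0, 0.3), (20000.0, 0.1))
print(dropout_p.value(748286.0))  # 0.1: the schedule endpoint, as logged
```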
], batch size: 61, lr: 2.83e-03, grad_scale: 32.0 2024-09-25 13:05:01,345 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=750012.6666666666, ans=0.2 2024-09-25 13:05:34,605 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=750059.3333333334, ans=0.125 2024-09-25 13:05:37,407 INFO [train.py:1198] (3/4) Epoch 42, batch 1000, loss[loss=0.2264, ctc_loss=0.1482, cr_loss=0.3909, over 17292.00 frames. ], tot_loss[loss=0.1876, ctc_loss=0.1205, cr_loss=0.3357, over 3362374.48 frames. ], batch size: 51, lr: 2.83e-03, grad_scale: 32.0 2024-09-25 13:05:39,225 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.min_positive, batch_count=750106.0, ans=0.025 2024-09-25 13:05:43,609 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.041e+02 1.283e+02 1.358e+02 1.462e+02 1.840e+02, threshold=2.715e+02, percent-clipped=0.0 2024-09-25 13:05:50,772 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=8.28 vs. limit=15.0 2024-09-25 13:06:05,191 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=750152.6666666666, ans=0.0 2024-09-25 13:06:05,361 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=750152.6666666666, ans=0.1 2024-09-25 13:06:06,924 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=750152.6666666666, ans=0.125 2024-09-25 13:06:20,365 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=4.48 vs. limit=10.0 2024-09-25 13:06:29,753 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=3.80 vs. limit=10.0 2024-09-25 13:06:32,541 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=750246.0, ans=0.125 2024-09-25 13:06:59,678 INFO [train.py:1198] (3/4) Epoch 42, batch 1050, loss[loss=0.1752, ctc_loss=0.1117, cr_loss=0.3174, over 17297.00 frames. ], tot_loss[loss=0.1889, ctc_loss=0.1214, cr_loss=0.3375, over 3368307.13 frames. ], batch size: 46, lr: 2.83e-03, grad_scale: 32.0 2024-09-25 13:07:15,949 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=750386.0, ans=0.1 2024-09-25 13:07:19,508 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=11.13 vs. 
limit=22.5 2024-09-25 13:07:35,640 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=750432.6666666666, ans=0.125 2024-09-25 13:07:45,091 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=750432.6666666666, ans=0.0 2024-09-25 13:08:09,281 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.const_attention_rate, batch_count=750526.0, ans=0.025 2024-09-25 13:08:20,233 INFO [train.py:1198] (3/4) Epoch 42, batch 1100, loss[loss=0.1402, ctc_loss=0.08757, cr_loss=0.2632, over 17113.00 frames. ], tot_loss[loss=0.1885, ctc_loss=0.1212, cr_loss=0.3361, over 3359485.34 frames. ], batch size: 40, lr: 2.83e-03, grad_scale: 16.0 2024-09-25 13:08:23,699 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=750572.6666666666, ans=0.0 2024-09-25 13:08:28,292 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.170e+02 1.338e+02 1.422e+02 1.527e+02 1.799e+02, threshold=2.844e+02, percent-clipped=0.0 2024-09-25 13:08:31,739 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.const_attention_rate, batch_count=750572.6666666666, ans=0.025 2024-09-25 13:08:32,204 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.79 vs. limit=6.0 2024-09-25 13:08:49,329 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=7.14 vs. limit=15.0 2024-09-25 13:08:51,754 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=750619.3333333334, ans=0.0 2024-09-25 13:09:24,229 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=750712.6666666666, ans=0.125 2024-09-25 13:09:25,775 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=750712.6666666666, ans=0.125 2024-09-25 13:09:44,705 INFO [train.py:1198] (3/4) Epoch 42, batch 1150, loss[loss=0.1657, ctc_loss=0.1027, cr_loss=0.3149, over 16964.00 frames. ], tot_loss[loss=0.1879, ctc_loss=0.1208, cr_loss=0.3357, over 3365119.94 frames. ], batch size: 42, lr: 2.83e-03, grad_scale: 16.0 2024-09-25 13:09:54,667 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-25 13:10:01,025 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=750852.6666666666, ans=0.125 2024-09-25 13:10:01,174 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=750852.6666666666, ans=0.1 2024-09-25 13:10:23,415 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.54 vs. 
limit=15.0 2024-09-25 13:10:27,697 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=750899.3333333334, ans=0.2 2024-09-25 13:10:31,040 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=750899.3333333334, ans=0.125 2024-09-25 13:10:32,892 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten.whitening_limit, batch_count=750899.3333333334, ans=15.0 2024-09-25 13:10:36,165 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=4.93 vs. limit=15.0 2024-09-25 13:10:47,474 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=3.77 vs. limit=10.0 2024-09-25 13:11:05,131 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.const_attention_rate, batch_count=750992.6666666666, ans=0.025 2024-09-25 13:11:09,661 INFO [train.py:1198] (3/4) Epoch 42, batch 1200, loss[loss=0.1866, ctc_loss=0.1171, cr_loss=0.3472, over 17090.00 frames. ], tot_loss[loss=0.1885, ctc_loss=0.1212, cr_loss=0.3368, over 3357408.66 frames. ], batch size: 43, lr: 2.83e-03, grad_scale: 32.0 2024-09-25 13:11:16,290 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=751039.3333333334, ans=0.0 2024-09-25 13:11:17,515 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.163e+02 1.296e+02 1.388e+02 1.485e+02 1.813e+02, threshold=2.776e+02, percent-clipped=0.0 2024-09-25 13:11:57,712 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=751179.3333333334, ans=0.125 2024-09-25 13:12:03,507 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.78 vs. limit=6.0 2024-09-25 13:12:26,554 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=751226.0, ans=0.2 2024-09-25 13:12:29,348 INFO [train.py:1198] (3/4) Epoch 42, batch 1250, loss[loss=0.1671, ctc_loss=0.1089, cr_loss=0.2913, over 17102.00 frames. ], tot_loss[loss=0.1883, ctc_loss=0.1209, cr_loss=0.3366, over 3357644.70 frames. ], batch size: 40, lr: 2.83e-03, grad_scale: 32.0 2024-09-25 13:12:59,880 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=751366.0, ans=0.125 2024-09-25 13:13:12,618 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=751366.0, ans=0.125 2024-09-25 13:13:27,062 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=751412.6666666666, ans=0.125 2024-09-25 13:13:44,041 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=9.95 vs. limit=22.5 2024-09-25 13:13:49,771 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=751506.0, ans=0.125 2024-09-25 13:13:51,191 INFO [train.py:1198] (3/4) Epoch 42, batch 1300, loss[loss=0.1839, ctc_loss=0.1166, cr_loss=0.3364, over 17156.00 frames. 
], tot_loss[loss=0.1883, ctc_loss=0.1209, cr_loss=0.337, over 3359398.94 frames. ], batch size: 45, lr: 2.83e-03, grad_scale: 16.0 2024-09-25 13:14:00,856 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.100e+02 1.307e+02 1.370e+02 1.468e+02 2.127e+02, threshold=2.740e+02, percent-clipped=0.0 2024-09-25 13:14:02,800 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=751506.0, ans=0.125 2024-09-25 13:14:03,017 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=12.84 vs. limit=22.5 2024-09-25 13:14:31,282 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=751599.3333333334, ans=0.125 2024-09-25 13:14:32,698 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=751599.3333333334, ans=0.125 2024-09-25 13:14:32,739 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=751599.3333333334, ans=0.0 2024-09-25 13:14:34,563 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=751599.3333333334, ans=0.0 2024-09-25 13:14:39,224 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=751599.3333333334, ans=0.125 2024-09-25 13:15:01,774 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=751692.6666666666, ans=0.2 2024-09-25 13:15:16,745 INFO [train.py:1198] (3/4) Epoch 42, batch 1350, loss[loss=0.1886, ctc_loss=0.122, cr_loss=0.3329, over 17285.00 frames. ], tot_loss[loss=0.1892, ctc_loss=0.1215, cr_loss=0.3383, over 3366636.93 frames. ], batch size: 46, lr: 2.83e-03, grad_scale: 16.0 2024-09-25 13:15:26,554 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=751739.3333333334, ans=0.5 2024-09-25 13:15:56,067 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=751832.6666666666, ans=0.125 2024-09-25 13:16:18,315 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer_na.min_abs, batch_count=751879.3333333334, ans=0.02 2024-09-25 13:16:32,578 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=751926.0, ans=0.1 2024-09-25 13:16:38,721 INFO [train.py:1198] (3/4) Epoch 42, batch 1400, loss[loss=0.21, ctc_loss=0.1324, cr_loss=0.3882, over 17154.00 frames. ], tot_loss[loss=0.1899, ctc_loss=0.122, cr_loss=0.3396, over 3363708.26 frames. ], batch size: 48, lr: 2.83e-03, grad_scale: 16.0 2024-09-25 13:16:48,177 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.146e+02 1.272e+02 1.355e+02 1.432e+02 2.085e+02, threshold=2.710e+02, percent-clipped=0.0 2024-09-25 13:17:13,012 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.37 vs. limit=15.0 2024-09-25 13:17:58,209 INFO [train.py:1198] (3/4) Epoch 42, batch 1450, loss[loss=0.2179, ctc_loss=0.1414, cr_loss=0.3826, over 17085.00 frames. ], tot_loss[loss=0.1899, ctc_loss=0.122, cr_loss=0.3395, over 3355341.69 frames. 
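The Whitening lines compare a per-module metric against a limit (e.g. metric=10.26 vs. limit=15.0), apparently logging when the metric approaches or exceeds the limit. One plausible reading, offered here as an assumption rather than a statement about scaling.py, is a measure of how anisotropic the feature covariance is: 1.0 when all covariance eigenvalues are equal, and larger as the spectrum becomes lopsided. A sketch of such a metric, in spirit only:

```python
import torch

def whitening_metric(x: torch.Tensor) -> torch.Tensor:
    """Illustrative 'whiteness' measure for features x of shape (N, C):
    dim * trace(C^2) / trace(C)^2 for covariance C. Equals 1.0 when all
    eigenvalues are equal and grows as the spectrum becomes uneven. This
    mirrors the 'metric=... vs. limit=...' lines only in spirit; it is
    not a copy of icefall's scaling.py."""
    x = x - x.mean(dim=0, keepdim=True)
    cov = (x.t() @ x) / x.shape[0]
    dim = cov.shape[0]
    return dim * (cov @ cov).trace() / cov.trace() ** 2

x = torch.randn(1000, 384)                 # nearly white features
print(whitening_metric(x))                 # close to 1.0
print(whitening_metric(x * torch.linspace(0.1, 3.0, 384)))  # much larger
```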
], batch size: 49, lr: 2.83e-03, grad_scale: 16.0 2024-09-25 13:18:13,200 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=752252.6666666666, ans=0.05 2024-09-25 13:18:19,956 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=752252.6666666666, ans=0.1 2024-09-25 13:18:38,762 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=752299.3333333334, ans=0.125 2024-09-25 13:18:54,231 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=752346.0, ans=0.0 2024-09-25 13:18:55,873 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=752346.0, ans=0.0 2024-09-25 13:19:12,085 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=5.03 vs. limit=15.0 2024-09-25 13:19:23,693 INFO [train.py:1198] (3/4) Epoch 42, batch 1500, loss[loss=0.1824, ctc_loss=0.1172, cr_loss=0.3262, over 17295.00 frames. ], tot_loss[loss=0.1901, ctc_loss=0.1222, cr_loss=0.3396, over 3351541.51 frames. ], batch size: 49, lr: 2.83e-03, grad_scale: 16.0 2024-09-25 13:19:24,078 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=752439.3333333334, ans=0.125 2024-09-25 13:19:25,509 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=752439.3333333334, ans=0.125 2024-09-25 13:19:33,190 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.208e+02 1.278e+02 1.340e+02 1.432e+02 2.576e+02, threshold=2.680e+02, percent-clipped=0.0 2024-09-25 13:19:35,075 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=752439.3333333334, ans=0.0 2024-09-25 13:19:42,946 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=752486.0, ans=0.1 2024-09-25 13:19:51,102 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=752486.0, ans=0.125 2024-09-25 13:20:00,653 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=752532.6666666666, ans=0.125 2024-09-25 13:20:20,968 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=752579.3333333334, ans=0.2 2024-09-25 13:20:48,642 INFO [train.py:1198] (3/4) Epoch 42, batch 1550, loss[loss=0.2323, ctc_loss=0.154, cr_loss=0.3915, over 16398.00 frames. ], tot_loss[loss=0.1906, ctc_loss=0.1225, cr_loss=0.3405, over 3358274.28 frames. ], batch size: 66, lr: 2.83e-03, grad_scale: 16.0 2024-09-25 13:21:50,013 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=752812.6666666666, ans=0.0 2024-09-25 13:22:07,607 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.const_attention_rate, batch_count=752906.0, ans=0.025 2024-09-25 13:22:09,037 INFO [train.py:1198] (3/4) Epoch 42, batch 1600, loss[loss=0.1851, ctc_loss=0.1192, cr_loss=0.3292, over 17060.00 frames. 
], tot_loss[loss=0.1901, ctc_loss=0.1222, cr_loss=0.3394, over 3353332.81 frames. ], batch size: 46, lr: 2.83e-03, grad_scale: 16.0 2024-09-25 13:22:20,350 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.101e+02 1.308e+02 1.394e+02 1.520e+02 2.214e+02, threshold=2.789e+02, percent-clipped=0.0 2024-09-25 13:22:20,710 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=752906.0, ans=0.0 2024-09-25 13:22:30,571 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=11.20 vs. limit=12.0 2024-09-25 13:22:43,888 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.43 vs. limit=6.0 2024-09-25 13:22:52,991 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=752999.3333333334, ans=0.09899494936611666 2024-09-25 13:23:14,385 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=5.62 vs. limit=12.0 2024-09-25 13:23:31,132 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=10.07 vs. limit=22.5 2024-09-25 13:23:31,969 INFO [train.py:1198] (3/4) Epoch 42, batch 1650, loss[loss=0.1775, ctc_loss=0.1126, cr_loss=0.3244, over 17029.00 frames. ], tot_loss[loss=0.1897, ctc_loss=0.1218, cr_loss=0.3395, over 3363618.59 frames. ], batch size: 44, lr: 2.83e-03, grad_scale: 8.0 2024-09-25 13:23:48,990 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=12.05 vs. limit=22.5 2024-09-25 13:23:55,502 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=3.37 vs. limit=10.0 2024-09-25 13:24:07,277 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=753232.6666666666, ans=0.125 2024-09-25 13:24:16,195 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=8.43 vs. limit=15.0 2024-09-25 13:24:18,984 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=9.44 vs. limit=12.0 2024-09-25 13:24:20,659 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=11.21 vs. limit=15.0 2024-09-25 13:24:55,039 INFO [train.py:1198] (3/4) Epoch 42, batch 1700, loss[loss=0.1789, ctc_loss=0.1139, cr_loss=0.3251, over 17185.00 frames. ], tot_loss[loss=0.1892, ctc_loss=0.1214, cr_loss=0.3389, over 3372606.99 frames. ], batch size: 45, lr: 2.83e-03, grad_scale: 8.0 2024-09-25 13:25:10,263 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.072e+02 1.274e+02 1.365e+02 1.477e+02 2.615e+02, threshold=2.729e+02, percent-clipped=0.0 2024-09-25 13:25:26,364 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=753419.3333333334, ans=0.125 2024-09-25 13:26:19,506 INFO [train.py:1198] (3/4) Epoch 42, batch 1750, loss[loss=0.1564, ctc_loss=0.09737, cr_loss=0.2952, over 17057.00 frames. 
], tot_loss[loss=0.1894, ctc_loss=0.1216, cr_loss=0.3387, over 3357446.82 frames. ], batch size: 39, lr: 2.83e-03, grad_scale: 8.0 2024-09-25 13:26:19,721 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=753606.0, ans=0.0 2024-09-25 13:26:26,455 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=753606.0, ans=0.125 2024-09-25 13:26:35,921 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=753652.6666666666, ans=0.125 2024-09-25 13:26:41,854 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=6.50 vs. limit=15.0 2024-09-25 13:26:55,467 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=753699.3333333334, ans=0.125 2024-09-25 13:26:57,194 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=753699.3333333334, ans=0.0 2024-09-25 13:27:25,683 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=753792.6666666666, ans=0.04949747468305833 2024-09-25 13:27:40,004 INFO [train.py:1198] (3/4) Epoch 42, batch 1800, loss[loss=0.1716, ctc_loss=0.1096, cr_loss=0.3102, over 17193.00 frames. ], tot_loss[loss=0.1897, ctc_loss=0.1219, cr_loss=0.3391, over 3354900.54 frames. ], batch size: 41, lr: 2.83e-03, grad_scale: 8.0 2024-09-25 13:27:52,868 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.118e+02 1.293e+02 1.332e+02 1.454e+02 1.867e+02, threshold=2.665e+02, percent-clipped=0.0 2024-09-25 13:27:53,328 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=753839.3333333334, ans=0.0 2024-09-25 13:28:02,929 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=753886.0, ans=0.1 2024-09-25 13:28:41,867 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=7.42 vs. limit=15.0 2024-09-25 13:28:50,991 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=754026.0, ans=0.125 2024-09-25 13:29:05,784 INFO [train.py:1198] (3/4) Epoch 42, batch 1850, loss[loss=0.2255, ctc_loss=0.1499, cr_loss=0.3779, over 16994.00 frames. ], tot_loss[loss=0.1891, ctc_loss=0.1215, cr_loss=0.3382, over 3351162.42 frames. ], batch size: 58, lr: 2.83e-03, grad_scale: 8.0 2024-09-25 13:29:12,412 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=754072.6666666666, ans=0.1 2024-09-25 13:29:23,805 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=754119.3333333334, ans=0.2 2024-09-25 13:29:32,155 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=754119.3333333334, ans=0.1 2024-09-25 13:29:44,047 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=10.99 vs. 
limit=15.0 2024-09-25 13:29:58,506 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=8.92 vs. limit=15.0 2024-09-25 13:29:59,342 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=754212.6666666666, ans=0.125 2024-09-25 13:30:09,734 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.const_attention_rate, batch_count=754212.6666666666, ans=0.025 2024-09-25 13:30:31,299 INFO [train.py:1198] (3/4) Epoch 42, batch 1900, loss[loss=0.2062, ctc_loss=0.135, cr_loss=0.3556, over 17287.00 frames. ], tot_loss[loss=0.1901, ctc_loss=0.1222, cr_loss=0.3396, over 3358538.91 frames. ], batch size: 49, lr: 2.83e-03, grad_scale: 8.0 2024-09-25 13:30:44,084 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.145e+02 1.292e+02 1.376e+02 1.481e+02 2.312e+02, threshold=2.752e+02, percent-clipped=0.0 2024-09-25 13:30:44,381 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=754306.0, ans=0.125 2024-09-25 13:31:27,138 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=754446.0, ans=0.1 2024-09-25 13:31:44,733 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=754492.6666666666, ans=0.0 2024-09-25 13:31:50,230 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=5.00 vs. limit=15.0 2024-09-25 13:31:50,780 INFO [train.py:1198] (3/4) Epoch 42, batch 1950, loss[loss=0.1935, ctc_loss=0.1254, cr_loss=0.3407, over 17296.00 frames. ], tot_loss[loss=0.1903, ctc_loss=0.1223, cr_loss=0.34, over 3357058.09 frames. ], batch size: 49, lr: 2.83e-03, grad_scale: 8.0 2024-09-25 13:31:52,749 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=754539.3333333334, ans=0.125 2024-09-25 13:32:12,160 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.const_attention_rate, batch_count=754586.0, ans=0.025 2024-09-25 13:32:12,993 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.73 vs. limit=6.0 2024-09-25 13:32:17,341 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=3.72 vs. limit=10.0 2024-09-25 13:32:18,500 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=754586.0, ans=0.0 2024-09-25 13:32:35,987 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=754632.6666666666, ans=0.2 2024-09-25 13:32:47,277 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=754679.3333333334, ans=0.0 2024-09-25 13:32:59,060 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=4.58 vs. 
limit=15.0 2024-09-25 13:33:10,541 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=754726.0, ans=0.125 2024-09-25 13:33:13,476 INFO [train.py:1198] (3/4) Epoch 42, batch 2000, loss[loss=0.1858, ctc_loss=0.1209, cr_loss=0.3245, over 11966.00 frames. ], tot_loss[loss=0.1896, ctc_loss=0.1219, cr_loss=0.3388, over 3346697.05 frames. ], batch size: 123, lr: 2.82e-03, grad_scale: 16.0 2024-09-25 13:33:25,999 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.114e+02 1.295e+02 1.343e+02 1.422e+02 2.059e+02, threshold=2.687e+02, percent-clipped=0.0 2024-09-25 13:33:35,916 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=754819.3333333334, ans=0.1 2024-09-25 13:33:44,577 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=6.50 vs. limit=15.0 2024-09-25 13:34:36,179 INFO [train.py:1198] (3/4) Epoch 42, batch 2050, loss[loss=0.1755, ctc_loss=0.1117, cr_loss=0.3192, over 17032.00 frames. ], tot_loss[loss=0.1896, ctc_loss=0.1219, cr_loss=0.3388, over 3351027.55 frames. ], batch size: 44, lr: 2.82e-03, grad_scale: 16.0 2024-09-25 13:34:38,124 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=755006.0, ans=0.125 2024-09-25 13:34:41,323 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=755006.0, ans=0.125 2024-09-25 13:34:46,290 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=755006.0, ans=0.125 2024-09-25 13:34:50,963 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=755052.6666666666, ans=0.0 2024-09-25 13:35:08,044 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.const_attention_rate, batch_count=755052.6666666666, ans=0.025 2024-09-25 13:35:13,185 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=11.53 vs. limit=12.0 2024-09-25 13:35:16,093 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=755099.3333333334, ans=0.07 2024-09-25 13:36:01,518 INFO [train.py:1198] (3/4) Epoch 42, batch 2100, loss[loss=0.1781, ctc_loss=0.1106, cr_loss=0.3378, over 17302.00 frames. ], tot_loss[loss=0.1889, ctc_loss=0.1213, cr_loss=0.3379, over 3360987.04 frames. 
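The grad_scale field in the batch lines drifts between 8.0, 16.0 and 32.0 over this stretch. That is the signature of dynamic loss scaling in mixed-precision training: the scale is halved when scaled gradients overflow and grown back after a run of clean steps. A generic sketch with torch.cuda.amp, standard PyTorch usage rather than a transcript of this recipe's training loop:

```python
import torch

# Dynamic loss scaling behind the fluctuating grad_scale values
# (8.0 -> 16.0 -> 32.0 ...): halve on inf/nan, double after a streak
# of growth_interval successful steps.
scaler = torch.cuda.amp.GradScaler(init_scale=32.0,
                                   growth_factor=2.0,
                                   backoff_factor=0.5,
                                   growth_interval=2000)

def train_step(model, optimizer, batch, loss_fn):
    optimizer.zero_grad()
    with torch.cuda.amp.autocast(dtype=torch.float16):
        loss = loss_fn(model(batch))
    scaler.scale(loss).backward()
    scaler.step(optimizer)      # skipped if grads contain inf/nan
    scaler.update()             # adjusts the scale; see scaler.get_scale()
```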
], batch size: 49, lr: 2.82e-03, grad_scale: 16.0 2024-09-25 13:36:06,734 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=755239.3333333334, ans=0.04949747468305833 2024-09-25 13:36:13,179 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=755239.3333333334, ans=0.125 2024-09-25 13:36:14,533 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.101e+02 1.293e+02 1.359e+02 1.448e+02 2.316e+02, threshold=2.717e+02, percent-clipped=0.0 2024-09-25 13:36:37,498 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=755332.6666666666, ans=0.125 2024-09-25 13:37:21,953 INFO [train.py:1198] (3/4) Epoch 42, batch 2150, loss[loss=0.1712, ctc_loss=0.11, cr_loss=0.3059, over 17345.00 frames. ], tot_loss[loss=0.1894, ctc_loss=0.1217, cr_loss=0.3386, over 3353408.66 frames. ], batch size: 43, lr: 2.82e-03, grad_scale: 16.0 2024-09-25 13:37:28,697 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=755472.6666666666, ans=0.125 2024-09-25 13:38:17,532 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=755612.6666666666, ans=0.09899494936611666 2024-09-25 13:38:44,294 INFO [train.py:1198] (3/4) Epoch 42, batch 2200, loss[loss=0.2296, ctc_loss=0.1503, cr_loss=0.3969, over 16545.00 frames. ], tot_loss[loss=0.1904, ctc_loss=0.1223, cr_loss=0.3403, over 3352555.20 frames. ], batch size: 66, lr: 2.82e-03, grad_scale: 16.0 2024-09-25 13:38:56,031 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=4.82 vs. limit=10.0 2024-09-25 13:38:59,637 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.123e+02 1.309e+02 1.387e+02 1.496e+02 2.285e+02, threshold=2.773e+02, percent-clipped=0.0 2024-09-25 13:39:03,126 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=755752.6666666666, ans=0.1 2024-09-25 13:39:20,493 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=755799.3333333334, ans=0.0 2024-09-25 13:39:41,997 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1.whitening_limit, batch_count=755846.0, ans=10.0 2024-09-25 13:40:06,618 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=755892.6666666666, ans=0.1 2024-09-25 13:40:09,780 INFO [train.py:1198] (3/4) Epoch 42, batch 2250, loss[loss=0.2206, ctc_loss=0.1436, cr_loss=0.3854, over 16035.00 frames. ], tot_loss[loss=0.1905, ctc_loss=0.1224, cr_loss=0.3408, over 3356921.74 frames. ], batch size: 74, lr: 2.82e-03, grad_scale: 16.0 2024-09-25 13:40:10,151 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=755939.3333333334, ans=0.125 2024-09-25 13:40:11,822 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=755939.3333333334, ans=0.1 2024-09-25 13:40:22,989 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=5.73 vs. 
limit=15.0 2024-09-25 13:40:40,400 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=755986.0, ans=0.125 2024-09-25 13:40:51,330 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=756032.6666666666, ans=0.125 2024-09-25 13:41:05,979 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=756079.3333333334, ans=0.1 2024-09-25 13:41:14,065 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.const_attention_rate, batch_count=756079.3333333334, ans=0.025 2024-09-25 13:41:23,625 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=756126.0, ans=0.125 2024-09-25 13:41:25,328 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=756126.0, ans=0.0 2024-09-25 13:41:33,096 INFO [train.py:1198] (3/4) Epoch 42, batch 2300, loss[loss=0.227, ctc_loss=0.1519, cr_loss=0.3754, over 12012.00 frames. ], tot_loss[loss=0.1918, ctc_loss=0.1233, cr_loss=0.3424, over 3353663.82 frames. ], batch size: 124, lr: 2.82e-03, grad_scale: 16.0 2024-09-25 13:41:45,867 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.118e+02 1.265e+02 1.352e+02 1.460e+02 1.967e+02, threshold=2.704e+02, percent-clipped=0.0 2024-09-25 13:41:46,263 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=756172.6666666666, ans=0.0 2024-09-25 13:41:52,977 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=14.50 vs. limit=15.0 2024-09-25 13:41:54,421 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=756219.3333333334, ans=0.2 2024-09-25 13:41:57,606 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=756219.3333333334, ans=0.0 2024-09-25 13:42:15,658 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.84 vs. limit=15.0 2024-09-25 13:42:40,943 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=756359.3333333334, ans=0.07 2024-09-25 13:42:41,640 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=9.15 vs. limit=22.5 2024-09-25 13:42:53,054 INFO [train.py:1198] (3/4) Epoch 42, batch 2350, loss[loss=0.1828, ctc_loss=0.1173, cr_loss=0.3276, over 17134.00 frames. ], tot_loss[loss=0.1917, ctc_loss=0.1233, cr_loss=0.3421, over 3362770.28 frames. 
], batch size: 48, lr: 2.82e-03, grad_scale: 16.0 2024-09-25 13:43:07,185 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=756406.0, ans=0.125 2024-09-25 13:43:26,287 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=756499.3333333334, ans=0.0 2024-09-25 13:43:31,239 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=756499.3333333334, ans=0.0 2024-09-25 13:43:34,280 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=756499.3333333334, ans=0.05 2024-09-25 13:44:04,227 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=756592.6666666666, ans=0.2 2024-09-25 13:44:04,969 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.74 vs. limit=6.0 2024-09-25 13:44:13,775 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=756592.6666666666, ans=0.125 2024-09-25 13:44:18,218 INFO [train.py:1198] (3/4) Epoch 42, batch 2400, loss[loss=0.1924, ctc_loss=0.1236, cr_loss=0.344, over 17024.00 frames. ], tot_loss[loss=0.191, ctc_loss=0.1228, cr_loss=0.3413, over 3363277.94 frames. ], batch size: 44, lr: 2.82e-03, grad_scale: 32.0 2024-09-25 13:44:29,559 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=756639.3333333334, ans=0.1 2024-09-25 13:44:30,874 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.119e+02 1.304e+02 1.412e+02 1.522e+02 4.391e+02, threshold=2.825e+02, percent-clipped=1.0 2024-09-25 13:44:38,131 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=5.19 vs. limit=12.0 2024-09-25 13:44:39,185 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=756686.0, ans=0.0 2024-09-25 13:45:10,919 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=756779.3333333334, ans=0.0 2024-09-25 13:45:43,399 INFO [train.py:1198] (3/4) Epoch 42, batch 2450, loss[loss=0.215, ctc_loss=0.1417, cr_loss=0.3667, over 17309.00 frames. ], tot_loss[loss=0.1903, ctc_loss=0.1224, cr_loss=0.3397, over 3349437.29 frames. ], batch size: 49, lr: 2.82e-03, grad_scale: 32.0 2024-09-25 13:45:45,476 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=756872.6666666666, ans=0.2 2024-09-25 13:46:09,835 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.const_attention_rate, batch_count=756919.3333333334, ans=0.025 2024-09-25 13:46:21,092 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=756966.0, ans=0.125 2024-09-25 13:46:22,529 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=756966.0, ans=0.0 2024-09-25 13:46:32,552 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=10.29 vs. 
limit=22.5 2024-09-25 13:47:03,853 INFO [train.py:1198] (3/4) Epoch 42, batch 2500, loss[loss=0.1914, ctc_loss=0.1203, cr_loss=0.3553, over 17104.00 frames. ], tot_loss[loss=0.1899, ctc_loss=0.1221, cr_loss=0.339, over 3352289.48 frames. ], batch size: 49, lr: 2.82e-03, grad_scale: 16.0 2024-09-25 13:47:18,221 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.143e+02 1.315e+02 1.378e+02 1.471e+02 2.327e+02, threshold=2.756e+02, percent-clipped=0.0 2024-09-25 13:47:42,679 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=757199.3333333334, ans=0.125 2024-09-25 13:48:01,490 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=757246.0, ans=0.125 2024-09-25 13:48:17,267 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=757292.6666666666, ans=0.1 2024-09-25 13:48:26,434 INFO [train.py:1198] (3/4) Epoch 42, batch 2550, loss[loss=0.1581, ctc_loss=0.09725, cr_loss=0.3041, over 17276.00 frames. ], tot_loss[loss=0.1902, ctc_loss=0.1223, cr_loss=0.3392, over 3356124.26 frames. ], batch size: 42, lr: 2.82e-03, grad_scale: 16.0 2024-09-25 13:48:30,033 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=757339.3333333334, ans=0.125 2024-09-25 13:48:36,999 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=8.08 vs. limit=15.0 2024-09-25 13:49:07,445 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=757432.6666666666, ans=0.125 2024-09-25 13:49:24,876 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=757479.3333333334, ans=0.0 2024-09-25 13:49:44,497 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=757526.0, ans=0.0 2024-09-25 13:49:48,983 INFO [train.py:1198] (3/4) Epoch 42, batch 2600, loss[loss=0.144, ctc_loss=0.08758, cr_loss=0.282, over 17278.00 frames. ], tot_loss[loss=0.1898, ctc_loss=0.122, cr_loss=0.3392, over 3360781.96 frames. ], batch size: 42, lr: 2.82e-03, grad_scale: 16.0 2024-09-25 13:50:05,737 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.054e+02 1.272e+02 1.357e+02 1.427e+02 2.038e+02, threshold=2.713e+02, percent-clipped=0.0 2024-09-25 13:50:09,149 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=757619.3333333334, ans=0.0 2024-09-25 13:50:35,684 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=757666.0, ans=0.1 2024-09-25 13:50:40,620 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=757712.6666666666, ans=0.125 2024-09-25 13:51:13,651 INFO [train.py:1198] (3/4) Epoch 42, batch 2650, loss[loss=0.2295, ctc_loss=0.1496, cr_loss=0.3994, over 15064.00 frames. ], tot_loss[loss=0.1895, ctc_loss=0.1218, cr_loss=0.3389, over 3360479.85 frames. 
], batch size: 89, lr: 2.82e-03, grad_scale: 16.0 2024-09-25 13:51:19,031 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=757806.0, ans=0.1 2024-09-25 13:51:37,304 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.97 vs. limit=15.0 2024-09-25 13:52:11,763 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=757946.0, ans=0.125 2024-09-25 13:52:11,891 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=757946.0, ans=0.0 2024-09-25 13:52:15,207 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.03 vs. limit=15.0 2024-09-25 13:52:26,472 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=757992.6666666666, ans=0.125 2024-09-25 13:52:34,236 INFO [train.py:1198] (3/4) Epoch 42, batch 2700, loss[loss=0.173, ctc_loss=0.1093, cr_loss=0.3185, over 17172.00 frames. ], tot_loss[loss=0.1901, ctc_loss=0.1222, cr_loss=0.3396, over 3359439.40 frames. ], batch size: 41, lr: 2.82e-03, grad_scale: 8.0 2024-09-25 13:52:42,395 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=758039.3333333334, ans=0.125 2024-09-25 13:52:50,044 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.123e+02 1.301e+02 1.411e+02 1.546e+02 2.916e+02, threshold=2.822e+02, percent-clipped=1.0 2024-09-25 13:53:08,990 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=758132.6666666666, ans=0.0 2024-09-25 13:53:09,467 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=4.49 vs. limit=15.0 2024-09-25 13:53:12,346 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.const_attention_rate, batch_count=758132.6666666666, ans=0.025 2024-09-25 13:53:20,180 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=758132.6666666666, ans=0.125 2024-09-25 13:53:26,426 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=758179.3333333334, ans=0.125 2024-09-25 13:53:59,183 INFO [train.py:1198] (3/4) Epoch 42, batch 2750, loss[loss=0.1869, ctc_loss=0.1209, cr_loss=0.33, over 17066.00 frames. ], tot_loss[loss=0.1903, ctc_loss=0.1223, cr_loss=0.3398, over 3365234.61 frames. ], batch size: 46, lr: 2.82e-03, grad_scale: 8.0 2024-09-25 13:54:40,803 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=758366.0, ans=0.2 2024-09-25 13:54:53,629 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=758412.6666666666, ans=0.125 2024-09-25 13:55:23,621 INFO [train.py:1198] (3/4) Epoch 42, batch 2800, loss[loss=0.2017, ctc_loss=0.1305, cr_loss=0.3559, over 17149.00 frames. ], tot_loss[loss=0.1904, ctc_loss=0.1225, cr_loss=0.3398, over 3358804.61 frames. 
], batch size: 48, lr: 2.82e-03, grad_scale: 16.0 2024-09-25 13:55:39,670 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.160e+02 1.295e+02 1.378e+02 1.487e+02 2.331e+02, threshold=2.756e+02, percent-clipped=0.0 2024-09-25 13:55:44,734 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=758552.6666666666, ans=0.2 2024-09-25 13:55:57,605 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=758599.3333333334, ans=0.5 2024-09-25 13:56:02,353 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=758599.3333333334, ans=0.2 2024-09-25 13:56:43,684 INFO [train.py:1198] (3/4) Epoch 42, batch 2850, loss[loss=0.2054, ctc_loss=0.1312, cr_loss=0.3712, over 16832.00 frames. ], tot_loss[loss=0.1907, ctc_loss=0.1226, cr_loss=0.3406, over 3358048.52 frames. ], batch size: 61, lr: 2.82e-03, grad_scale: 16.0 2024-09-25 13:57:38,724 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=758879.3333333334, ans=0.125 2024-09-25 13:57:41,945 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=758879.3333333334, ans=0.5 2024-09-25 13:58:06,779 INFO [train.py:1198] (3/4) Epoch 42, batch 2900, loss[loss=0.2204, ctc_loss=0.145, cr_loss=0.3771, over 17213.00 frames. ], tot_loss[loss=0.1912, ctc_loss=0.123, cr_loss=0.3412, over 3356974.13 frames. ], batch size: 47, lr: 2.82e-03, grad_scale: 16.0 2024-09-25 13:58:08,663 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=758972.6666666666, ans=0.125 2024-09-25 13:58:19,640 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=758972.6666666666, ans=0.125 2024-09-25 13:58:22,533 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.147e+02 1.263e+02 1.349e+02 1.466e+02 2.283e+02, threshold=2.697e+02, percent-clipped=0.0 2024-09-25 13:58:46,577 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=759066.0, ans=0.1 2024-09-25 13:59:04,609 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=11.24 vs. limit=22.5 2024-09-25 13:59:29,386 INFO [train.py:1198] (3/4) Epoch 42, batch 2950, loss[loss=0.2216, ctc_loss=0.1464, cr_loss=0.3758, over 15167.00 frames. ], tot_loss[loss=0.1912, ctc_loss=0.123, cr_loss=0.3411, over 3355017.77 frames. ], batch size: 89, lr: 2.82e-03, grad_scale: 16.0 2024-09-25 13:59:29,676 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=759206.0, ans=0.05 2024-09-25 13:59:41,059 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=759206.0, ans=0.0 2024-09-25 13:59:47,498 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=759252.6666666666, ans=0.125 2024-09-25 14:00:24,625 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=6.09 vs. 
limit=15.0 2024-09-25 14:00:32,042 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=759346.0, ans=0.125 2024-09-25 14:00:48,922 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=759392.6666666666, ans=0.125 2024-09-25 14:00:53,414 INFO [train.py:1198] (3/4) Epoch 42, batch 3000, loss[loss=0.1675, ctc_loss=0.1078, cr_loss=0.2983, over 17275.00 frames. ], tot_loss[loss=0.1919, ctc_loss=0.1235, cr_loss=0.342, over 3353399.26 frames. ], batch size: 42, lr: 2.82e-03, grad_scale: 16.0 2024-09-25 14:00:53,414 INFO [train.py:1221] (3/4) Computing validation loss 2024-09-25 14:01:08,867 INFO [train.py:1230] (3/4) Epoch 42, validation: loss=0.03543, ctc_loss=0.03543, cr_loss=1.019e-14, over 944034.00 frames. 2024-09-25 14:01:08,868 INFO [train.py:1231] (3/4) Maximum memory allocated so far is 21238MB 2024-09-25 14:01:24,613 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.139e+02 1.302e+02 1.378e+02 1.459e+02 2.338e+02, threshold=2.756e+02, percent-clipped=0.0 2024-09-25 14:01:44,905 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=759532.6666666666, ans=0.125 2024-09-25 14:02:26,582 INFO [train.py:1198] (3/4) Epoch 42, batch 3050, loss[loss=0.2109, ctc_loss=0.1368, cr_loss=0.3705, over 16991.00 frames. ], tot_loss[loss=0.1908, ctc_loss=0.1227, cr_loss=0.3405, over 3356339.56 frames. ], batch size: 53, lr: 2.82e-03, grad_scale: 16.0 2024-09-25 14:02:26,957 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=759672.6666666666, ans=0.125 2024-09-25 14:02:36,467 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=759672.6666666666, ans=0.125 2024-09-25 14:03:31,669 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=759859.3333333334, ans=0.125 2024-09-25 14:03:45,317 INFO [train.py:1198] (3/4) Epoch 42, batch 3100, loss[loss=0.2178, ctc_loss=0.141, cr_loss=0.3841, over 16654.00 frames. ], tot_loss[loss=0.1896, ctc_loss=0.1219, cr_loss=0.3386, over 3360581.18 frames. 
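Note the validation entry above: cr_loss collapses to 1.019e-14 while it hovers around 0.34 during training. That is what one would expect if the CR term measures agreement between two forward passes of the same utterance under different time-masking: with augmentation disabled at validation, the two branches coincide and the consistency loss vanishes up to floating-point noise. A hedged sketch of such a symmetric consistency loss (the idea only, not the recipe's exact implementation):

```python
import torch
import torch.nn.functional as F

def cr_loss(logits_a: torch.Tensor, logits_b: torch.Tensor) -> torch.Tensor:
    """Symmetric KL between the posteriors of two forward passes of the
    same utterance under different masking; illustrative only."""
    log_p_a = F.log_softmax(logits_a, dim=-1)
    log_p_b = F.log_softmax(logits_b, dim=-1)
    kl_ab = F.kl_div(log_p_b, log_p_a, log_target=True, reduction="batchmean")
    kl_ba = F.kl_div(log_p_a, log_p_b, log_target=True, reduction="batchmean")
    return 0.5 * (kl_ab + kl_ba)

x = torch.randn(4, 100, 500)   # (batch, frames, vocab)
print(cr_loss(x, x))           # identical branches, as at validation: 0.0
```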
], batch size: 66, lr: 2.82e-03, grad_scale: 16.0 2024-09-25 14:04:00,900 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.153e+02 1.293e+02 1.378e+02 1.463e+02 2.447e+02, threshold=2.756e+02, percent-clipped=0.0 2024-09-25 14:04:21,492 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=759999.3333333334, ans=0.125 2024-09-25 14:04:26,136 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=759999.3333333334, ans=0.1 2024-09-25 14:04:27,831 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-25 14:04:35,409 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=760046.0, ans=0.125 2024-09-25 14:04:41,661 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=760046.0, ans=0.0 2024-09-25 14:04:47,997 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=760092.6666666666, ans=0.2 2024-09-25 14:04:51,182 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-25 14:05:02,005 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.const_attention_rate, batch_count=760139.3333333334, ans=0.025 2024-09-25 14:05:02,547 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=5.25 vs. limit=15.0 2024-09-25 14:05:03,356 INFO [train.py:1198] (3/4) Epoch 42, batch 3150, loss[loss=0.169, ctc_loss=0.1063, cr_loss=0.3133, over 17273.00 frames. ], tot_loss[loss=0.1901, ctc_loss=0.1223, cr_loss=0.3392, over 3351243.13 frames. ], batch size: 42, lr: 2.81e-03, grad_scale: 16.0 2024-09-25 14:05:19,130 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=760186.0, ans=0.5 2024-09-25 14:06:05,106 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=760279.3333333334, ans=0.0 2024-09-25 14:06:08,456 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-25 14:06:19,139 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=760326.0, ans=0.125 2024-09-25 14:06:23,523 INFO [train.py:1198] (3/4) Epoch 42, batch 3200, loss[loss=0.2007, ctc_loss=0.1313, cr_loss=0.3468, over 17008.00 frames. ], tot_loss[loss=0.1898, ctc_loss=0.1221, cr_loss=0.3388, over 3338491.34 frames. 
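
On the recurring "grad-norm quartiles" warnings: in every instance the reported threshold equals Clipping_scale times the logged median, e.g. 2.0 * 1.378e+02 = 2.756e+02 above, and 2.0 * 1.364e+02 = 2.728e+02 in the warning that follows, where the max of 3.566e+02 exceeds the threshold and percent-clipped rises to 1.0. A hedged sketch of that bookkeeping (the optimizer keeps its own history of recent gradient norms; this only shows the statistics being reported):

import numpy as np

def clipping_stats(grad_norms, clipping_scale=2.0):
    """Quartiles of recent gradient norms, a clipping threshold at
    clipping_scale * median, and the percentage of norms above it."""
    norms = np.asarray(grad_norms, dtype=np.float64)
    quartiles = np.percentile(norms, [0, 25, 50, 75, 100])
    threshold = clipping_scale * quartiles[2]          # 2.0 * median
    percent_clipped = 100.0 * float(np.mean(norms > threshold))
    return quartiles, threshold, percent_clipped
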
], batch size: 44, lr: 2.81e-03, grad_scale: 32.0 2024-09-25 14:06:28,555 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=760372.6666666666, ans=0.125 2024-09-25 14:06:39,251 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.120e+02 1.277e+02 1.364e+02 1.425e+02 3.566e+02, threshold=2.728e+02, percent-clipped=1.0 2024-09-25 14:06:41,233 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=760419.3333333334, ans=0.125 2024-09-25 14:07:21,598 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=4.70 vs. limit=15.0 2024-09-25 14:07:37,058 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=12.70 vs. limit=22.5 2024-09-25 14:07:43,981 INFO [train.py:1198] (3/4) Epoch 42, batch 3250, loss[loss=0.1809, ctc_loss=0.1149, cr_loss=0.3301, over 17336.00 frames. ], tot_loss[loss=0.1895, ctc_loss=0.1217, cr_loss=0.3386, over 3350019.98 frames. ], batch size: 48, lr: 2.81e-03, grad_scale: 32.0 2024-09-25 14:07:47,409 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=760606.0, ans=0.5 2024-09-25 14:07:50,476 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=760606.0, ans=0.1 2024-09-25 14:07:58,205 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=760652.6666666666, ans=0.125 2024-09-25 14:08:24,937 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=760699.3333333334, ans=0.0 2024-09-25 14:08:26,408 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=760699.3333333334, ans=0.125 2024-09-25 14:09:02,229 INFO [train.py:1198] (3/4) Epoch 42, batch 3300, loss[loss=0.208, ctc_loss=0.1337, cr_loss=0.3713, over 17312.00 frames. ], tot_loss[loss=0.1881, ctc_loss=0.1208, cr_loss=0.337, over 3357495.75 frames. ], batch size: 51, lr: 2.81e-03, grad_scale: 16.0 2024-09-25 14:09:04,064 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=760839.3333333334, ans=0.125 2024-09-25 14:09:04,175 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=760839.3333333334, ans=0.09899494936611666 2024-09-25 14:09:19,409 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.117e+02 1.292e+02 1.397e+02 1.503e+02 2.274e+02, threshold=2.794e+02, percent-clipped=0.0 2024-09-25 14:09:34,556 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=8.76 vs. 
limit=15.0 2024-09-25 14:09:39,911 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=760932.6666666666, ans=0.2 2024-09-25 14:09:44,616 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=760932.6666666666, ans=0.125 2024-09-25 14:09:53,912 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=760979.3333333334, ans=0.125 2024-09-25 14:10:02,147 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=5.03 vs. limit=15.0 2024-09-25 14:10:09,592 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=761026.0, ans=0.125 2024-09-25 14:10:22,242 INFO [train.py:1198] (3/4) Epoch 42, batch 3350, loss[loss=0.1857, ctc_loss=0.1179, cr_loss=0.3389, over 17223.00 frames. ], tot_loss[loss=0.1891, ctc_loss=0.1215, cr_loss=0.3382, over 3353735.91 frames. ], batch size: 50, lr: 2.81e-03, grad_scale: 16.0 2024-09-25 14:10:22,448 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=761072.6666666666, ans=0.0 2024-09-25 14:10:32,883 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=9.28 vs. limit=15.0 2024-09-25 14:10:47,449 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten.whitening_limit, batch_count=761119.3333333334, ans=22.5 2024-09-25 14:10:56,299 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=761166.0, ans=0.125 2024-09-25 14:10:57,795 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=761166.0, ans=0.125 2024-09-25 14:11:01,519 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=7.67 vs. limit=15.0 2024-09-25 14:11:11,894 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=761212.6666666666, ans=0.2 2024-09-25 14:11:16,916 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=8.74 vs. limit=15.0 2024-09-25 14:11:42,879 INFO [train.py:1198] (3/4) Epoch 42, batch 3400, loss[loss=0.1758, ctc_loss=0.1135, cr_loss=0.3114, over 17061.00 frames. ], tot_loss[loss=0.1883, ctc_loss=0.121, cr_loss=0.3368, over 3355858.26 frames. 
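
The ScheduledFloat entries report the current value ("ans") of hyperparameters that are scheduled on batch_count: dropout probabilities, skip rates, bypass scales, even whitening limits (note the whitening_limit entry above with ans=22.5). A minimal sketch of such a schedule, assuming piecewise-linear interpolation between breakpoints (the real scaling.py implementation carries more machinery than this):

import bisect

class PiecewiseLinearSchedule:
    """Value is interpolated linearly between (batch_count, value)
    breakpoints and clamped at the ends; the ScheduledFloat log lines
    appear to print the interpolated value as `ans`."""
    def __init__(self, *points):
        self.xs = [p[0] for p in points]
        self.ys = [p[1] for p in points]

    def __call__(self, batch_count: float) -> float:
        if batch_count <= self.xs[0]:
            return self.ys[0]
        if batch_count >= self.xs[-1]:
            return self.ys[-1]
        i = bisect.bisect_right(self.xs, batch_count)
        x0, x1 = self.xs[i - 1], self.xs[i]
        y0, y1 = self.ys[i - 1], self.ys[i]
        return y0 + (y1 - y0) * (batch_count - x0) / (x1 - x0)

# e.g. a skip rate annealed from 0.5 to 0.0 over the first 20k batches
skip_rate = PiecewiseLinearSchedule((0.0, 0.5), (20000.0, 0.0))
print(skip_rate(10000.0))   # 0.25
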
], batch size: 46, lr: 2.81e-03, grad_scale: 16.0 2024-09-25 14:11:49,455 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=761306.0, ans=0.125 2024-09-25 14:11:49,500 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.max_positive, batch_count=761306.0, ans=0.95 2024-09-25 14:12:00,053 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.177e+02 1.312e+02 1.394e+02 1.492e+02 2.078e+02, threshold=2.788e+02, percent-clipped=0.0 2024-09-25 14:12:00,886 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.31 vs. limit=15.0 2024-09-25 14:12:14,572 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=761399.3333333334, ans=0.2 2024-09-25 14:12:14,632 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=761399.3333333334, ans=0.125 2024-09-25 14:12:19,335 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.const_attention_rate, batch_count=761399.3333333334, ans=0.025 2024-09-25 14:12:43,582 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=8.56 vs. limit=15.0 2024-09-25 14:13:01,389 INFO [train.py:1198] (3/4) Epoch 42, batch 3450, loss[loss=0.1475, ctc_loss=0.09382, cr_loss=0.2683, over 16325.00 frames. ], tot_loss[loss=0.1875, ctc_loss=0.1203, cr_loss=0.3359, over 3362227.86 frames. ], batch size: 36, lr: 2.81e-03, grad_scale: 16.0 2024-09-25 14:13:09,333 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.const_attention_rate, batch_count=761539.3333333334, ans=0.025 2024-09-25 14:13:28,394 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=761586.0, ans=0.125 2024-09-25 14:13:38,330 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=6.58 vs. limit=15.0 2024-09-25 14:13:45,582 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=761632.6666666666, ans=0.125 2024-09-25 14:13:51,974 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=761679.3333333334, ans=0.95 2024-09-25 14:13:52,034 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=761679.3333333334, ans=0.125 2024-09-25 14:14:19,997 INFO [train.py:1198] (3/4) Epoch 42, batch 3500, loss[loss=0.2166, ctc_loss=0.1469, cr_loss=0.3484, over 12405.00 frames. ], tot_loss[loss=0.1873, ctc_loss=0.1201, cr_loss=0.3359, over 3364449.43 frames. 
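
The Whitening lines compare a per-module anisotropy "metric" against a limit (15.0, 22.5, ...): such a metric is 1.0 when the module's feature covariance is a multiple of the identity and grows as a few directions start to dominate, and the module only nudges gradients toward whiter features once the limit is exceeded. One plausible form of the metric (an assumption; consult scaling.py for the exact definition):

import torch

def whitening_metric(x: torch.Tensor) -> torch.Tensor:
    """Anisotropy of features x with shape (N, C): 1.0 for a covariance
    proportional to the identity, larger when it is far from white."""
    x = x - x.mean(dim=0, keepdim=True)
    cov = (x.T @ x) / x.shape[0]                  # (C, C) covariance
    c = cov.shape[0]
    # trace(cov @ cov) * C / trace(cov)**2 == C * sum(eig^2) / sum(eig)^2
    return (cov * cov).sum() * c / cov.diag().sum() ** 2

print(float(whitening_metric(torch.randn(100000, 64))))   # ~1.0 for white noise
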
], batch size: 123, lr: 2.81e-03, grad_scale: 16.0 2024-09-25 14:14:37,159 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.158e+02 1.281e+02 1.384e+02 1.487e+02 1.767e+02, threshold=2.768e+02, percent-clipped=0.0 2024-09-25 14:14:39,020 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=761819.3333333334, ans=0.0 2024-09-25 14:14:47,070 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=761819.3333333334, ans=0.0 2024-09-25 14:14:47,093 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=761819.3333333334, ans=0.0 2024-09-25 14:15:40,265 INFO [train.py:1198] (3/4) Epoch 42, batch 3550, loss[loss=0.1874, ctc_loss=0.1196, cr_loss=0.3393, over 17059.00 frames. ], tot_loss[loss=0.1871, ctc_loss=0.12, cr_loss=0.3357, over 3363552.28 frames. ], batch size: 46, lr: 2.81e-03, grad_scale: 16.0 2024-09-25 14:15:42,276 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-25 14:15:49,917 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=762006.0, ans=0.0 2024-09-25 14:16:10,128 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=762099.3333333334, ans=0.125 2024-09-25 14:16:35,930 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=3.88 vs. limit=12.0 2024-09-25 14:16:58,305 INFO [train.py:1198] (3/4) Epoch 42, batch 3600, loss[loss=0.1992, ctc_loss=0.1278, cr_loss=0.3573, over 17063.00 frames. ], tot_loss[loss=0.1869, ctc_loss=0.1199, cr_loss=0.3352, over 3356030.37 frames. ], batch size: 56, lr: 2.81e-03, grad_scale: 32.0 2024-09-25 14:17:09,694 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=3.56 vs. limit=15.0 2024-09-25 14:17:10,880 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.min_abs, batch_count=762239.3333333334, ans=0.5 2024-09-25 14:17:15,186 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.108e+02 1.282e+02 1.377e+02 1.493e+02 1.761e+02, threshold=2.753e+02, percent-clipped=0.0 2024-09-25 14:17:50,876 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=3.09 vs. limit=15.0 2024-09-25 14:18:10,387 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=762426.0, ans=0.0 2024-09-25 14:18:17,995 INFO [train.py:1198] (3/4) Epoch 42, batch 3650, loss[loss=0.2044, ctc_loss=0.1329, cr_loss=0.3576, over 17222.00 frames. ], tot_loss[loss=0.1874, ctc_loss=0.1203, cr_loss=0.3358, over 3360729.05 frames. 
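
The grad_scale field in the batch records is the AMP loss-scaling factor, and it moves in powers of two (16.0 and 32.0 through this stretch): the scaler doubles the scale after a long enough run of overflow-free steps and halves it whenever non-finite gradients appear. A schematic of that loop with torch.cuda.amp (illustrative wiring only, not the training script's own optimizer setup):

import torch

model = torch.nn.Linear(80, 500)
opt = torch.optim.SGD(model.parameters(), lr=1e-3)
scaler = torch.cuda.amp.GradScaler(init_scale=16.0, growth_interval=2000)

def train_step(x, y):
    opt.zero_grad()
    with torch.cuda.amp.autocast(enabled=torch.cuda.is_available()):
        loss = torch.nn.functional.mse_loss(model(x), y)
    scaler.scale(loss).backward()   # backward on the scaled loss
    scaler.step(opt)                # skips the update if grads overflowed
    scaler.update()                 # grows or shrinks the scale
    return scaler.get_scale()       # the value logged as grad_scale

train_step(torch.randn(8, 80), torch.randn(8, 500))
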
], batch size: 50, lr: 2.81e-03, grad_scale: 32.0 2024-09-25 14:18:27,747 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=762472.6666666666, ans=0.125 2024-09-25 14:19:04,729 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=762612.6666666666, ans=0.125 2024-09-25 14:19:07,886 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=762612.6666666666, ans=0.0 2024-09-25 14:19:26,129 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=762659.3333333334, ans=0.2 2024-09-25 14:19:38,302 INFO [train.py:1198] (3/4) Epoch 42, batch 3700, loss[loss=0.1734, ctc_loss=0.109, cr_loss=0.3223, over 17023.00 frames. ], tot_loss[loss=0.1871, ctc_loss=0.12, cr_loss=0.3356, over 3360985.19 frames. ], batch size: 44, lr: 2.81e-03, grad_scale: 32.0 2024-09-25 14:19:43,339 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=762706.0, ans=0.07 2024-09-25 14:19:44,872 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=762706.0, ans=0.125 2024-09-25 14:19:51,059 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=762706.0, ans=0.2 2024-09-25 14:19:55,540 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.181e+02 1.273e+02 1.387e+02 1.507e+02 1.911e+02, threshold=2.774e+02, percent-clipped=0.0 2024-09-25 14:20:28,012 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=762846.0, ans=0.125 2024-09-25 14:20:39,154 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=2.598e-03 2024-09-25 14:20:41,871 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=762892.6666666666, ans=0.125 2024-09-25 14:20:57,615 INFO [train.py:1198] (3/4) Epoch 42, batch 3750, loss[loss=0.1478, ctc_loss=0.092, cr_loss=0.2789, over 17175.00 frames. ], tot_loss[loss=0.1885, ctc_loss=0.121, cr_loss=0.3376, over 3359253.47 frames. ], batch size: 41, lr: 2.81e-03, grad_scale: 32.0 2024-09-25 14:21:05,458 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=762939.3333333334, ans=0.0 2024-09-25 14:21:16,521 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=762986.0, ans=0.125 2024-09-25 14:21:17,948 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=762986.0, ans=0.2 2024-09-25 14:21:33,450 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=763032.6666666666, ans=0.0 2024-09-25 14:21:48,647 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=13.05 vs. 
limit=22.5 2024-09-25 14:21:54,051 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=763079.3333333334, ans=0.1 2024-09-25 14:22:09,716 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=763126.0, ans=0.0 2024-09-25 14:22:15,990 INFO [train.py:1198] (3/4) Epoch 42, batch 3800, loss[loss=0.2153, ctc_loss=0.1416, cr_loss=0.3685, over 14917.00 frames. ], tot_loss[loss=0.1887, ctc_loss=0.1212, cr_loss=0.3376, over 3353525.95 frames. ], batch size: 89, lr: 2.81e-03, grad_scale: 32.0 2024-09-25 14:22:21,870 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=5.03 vs. limit=10.0 2024-09-25 14:22:33,279 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.117e+02 1.315e+02 1.408e+02 1.487e+02 2.777e+02, threshold=2.816e+02, percent-clipped=1.0 2024-09-25 14:22:42,957 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=763219.3333333334, ans=0.04949747468305833 2024-09-25 14:23:34,532 INFO [train.py:1198] (3/4) Epoch 42, batch 3850, loss[loss=0.1678, ctc_loss=0.1031, cr_loss=0.3234, over 17022.00 frames. ], tot_loss[loss=0.1876, ctc_loss=0.1204, cr_loss=0.3356, over 3315216.37 frames. ], batch size: 39, lr: 2.81e-03, grad_scale: 32.0 2024-09-25 14:24:31,078 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.00 vs. limit=15.0 2024-09-25 14:25:29,631 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=5.74 vs. limit=12.0 2024-09-25 14:25:36,253 INFO [train.py:1198] (3/4) Epoch 43, batch 0, loss[loss=0.1569, ctc_loss=0.1001, cr_loss=0.284, over 17103.00 frames. ], tot_loss[loss=0.1569, ctc_loss=0.1001, cr_loss=0.284, over 17103.00 frames. ], batch size: 43, lr: 2.77e-03, grad_scale: 32.0 2024-09-25 14:25:36,254 INFO [train.py:1221] (3/4) Computing validation loss 2024-09-25 14:25:51,487 INFO [train.py:1230] (3/4) Epoch 43, validation: loss=0.03486, ctc_loss=0.03486, cr_loss=1.051e-14, over 944034.00 frames. 2024-09-25 14:25:51,488 INFO [train.py:1231] (3/4) Maximum memory allocated so far is 21238MB 2024-09-25 14:25:51,865 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=763620.6666666666, ans=0.1 2024-09-25 14:25:58,186 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=763620.6666666666, ans=0.125 2024-09-25 14:26:02,791 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=763620.6666666666, ans=0.0 2024-09-25 14:26:15,238 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.147e+02 1.334e+02 1.498e+02 1.673e+02 2.107e+02, threshold=2.995e+02, percent-clipped=0.0 2024-09-25 14:26:42,602 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=763760.6666666666, ans=0.2 2024-09-25 14:27:10,607 INFO [train.py:1198] (3/4) Epoch 43, batch 50, loss[loss=0.1746, ctc_loss=0.1092, cr_loss=0.327, over 17043.00 frames. ], tot_loss[loss=0.1836, ctc_loss=0.1173, cr_loss=0.3318, over 762146.99 frames. 
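
Note how cr_loss collapses at validation time (1.051e-14 above, and similarly tiny in every validation record) while it sits near 0.34 during training. That is what consistency regularization should do: the CR term compares frame-level posteriors from two differently-augmented copies of each utterance, and with augmentation disabled at validation the two copies coincide, leaving only floating-point noise. A hedged sketch of such a term as a symmetric KL divergence (an assumed form; names are illustrative):

import torch
import torch.nn.functional as F

def consistency_loss(logits_a: torch.Tensor, logits_b: torch.Tensor) -> torch.Tensor:
    """Symmetric KL between per-frame output distributions (N, T, V) of
    two views of the same batch; identical views give exactly 0."""
    log_p = F.log_softmax(logits_a, dim=-1)
    log_q = F.log_softmax(logits_b, dim=-1)
    kl_pq = F.kl_div(log_q, log_p, log_target=True, reduction="batchmean")
    kl_qp = F.kl_div(log_p, log_q, log_target=True, reduction="batchmean")
    return 0.5 * (kl_pq + kl_qp)

x = torch.randn(4, 100, 500)
print(float(consistency_loss(x, x)))                              # 0, as at validation
print(float(consistency_loss(x, x + 0.1 * torch.randn_like(x))))  # > 0, as in training
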
], batch size: 52, lr: 2.77e-03, grad_scale: 32.0 2024-09-25 14:27:28,717 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.max_abs, batch_count=763900.6666666666, ans=10.0 2024-09-25 14:27:32,393 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.56 vs. limit=15.0 2024-09-25 14:27:33,545 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.max_abs, batch_count=763900.6666666666, ans=10.0 2024-09-25 14:27:57,313 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=763994.0, ans=0.0 2024-09-25 14:28:27,441 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-25 14:28:30,180 INFO [train.py:1198] (3/4) Epoch 43, batch 100, loss[loss=0.1915, ctc_loss=0.1242, cr_loss=0.3367, over 17112.00 frames. ], tot_loss[loss=0.1859, ctc_loss=0.1192, cr_loss=0.3333, over 1328457.79 frames. ], batch size: 49, lr: 2.77e-03, grad_scale: 32.0 2024-09-25 14:28:43,744 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=4.95 vs. limit=12.0 2024-09-25 14:28:54,052 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.158e+02 1.307e+02 1.401e+02 1.473e+02 2.012e+02, threshold=2.802e+02, percent-clipped=0.0 2024-09-25 14:28:59,264 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=764134.0, ans=0.125 2024-09-25 14:29:05,737 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=764180.6666666666, ans=0.0 2024-09-25 14:29:13,450 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer_na.min_abs, batch_count=764180.6666666666, ans=0.02 2024-09-25 14:29:54,511 INFO [train.py:1198] (3/4) Epoch 43, batch 150, loss[loss=0.1575, ctc_loss=0.09834, cr_loss=0.2959, over 17108.00 frames. ], tot_loss[loss=0.1881, ctc_loss=0.121, cr_loss=0.3353, over 1773174.11 frames. ], batch size: 40, lr: 2.77e-03, grad_scale: 32.0 2024-09-25 14:30:12,963 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=7.24 vs. 
limit=15.0 2024-09-25 14:30:15,737 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=764367.3333333334, ans=0.0 2024-09-25 14:30:18,779 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=764367.3333333334, ans=0.1 2024-09-25 14:30:25,619 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=764414.0, ans=0.125 2024-09-25 14:30:25,776 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=764414.0, ans=0.125 2024-09-25 14:30:27,419 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=764414.0, ans=0.2 2024-09-25 14:30:31,982 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=764414.0, ans=0.0 2024-09-25 14:31:02,782 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=764507.3333333334, ans=0.95 2024-09-25 14:31:07,623 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=764507.3333333334, ans=0.04949747468305833 2024-09-25 14:31:19,819 INFO [train.py:1198] (3/4) Epoch 43, batch 200, loss[loss=0.2341, ctc_loss=0.159, cr_loss=0.3758, over 11877.00 frames. ], tot_loss[loss=0.1891, ctc_loss=0.1216, cr_loss=0.3377, over 2123894.71 frames. ], batch size: 123, lr: 2.77e-03, grad_scale: 32.0 2024-09-25 14:31:28,049 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=764554.0, ans=0.125 2024-09-25 14:31:32,772 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=764554.0, ans=0.2 2024-09-25 14:31:42,527 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=764600.6666666666, ans=0.125 2024-09-25 14:31:43,640 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.159e+02 1.287e+02 1.384e+02 1.479e+02 1.740e+02, threshold=2.769e+02, percent-clipped=0.0 2024-09-25 14:31:50,265 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=764647.3333333334, ans=0.125 2024-09-25 14:32:33,212 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=764740.6666666666, ans=0.125 2024-09-25 14:32:39,328 INFO [train.py:1198] (3/4) Epoch 43, batch 250, loss[loss=0.1858, ctc_loss=0.1166, cr_loss=0.346, over 17297.00 frames. ], tot_loss[loss=0.1882, ctc_loss=0.1209, cr_loss=0.3366, over 2399752.48 frames. ], batch size: 49, lr: 2.77e-03, grad_scale: 32.0 2024-09-25 14:32:39,616 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=764787.3333333334, ans=0.07 2024-09-25 14:33:10,642 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=4.18 vs. 
limit=12.0 2024-09-25 14:33:21,594 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.const_attention_rate, batch_count=764880.6666666666, ans=0.025 2024-09-25 14:33:29,736 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=764927.3333333334, ans=0.0 2024-09-25 14:33:37,627 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=764927.3333333334, ans=0.0 2024-09-25 14:33:59,385 INFO [train.py:1198] (3/4) Epoch 43, batch 300, loss[loss=0.2395, ctc_loss=0.1572, cr_loss=0.4117, over 14852.00 frames. ], tot_loss[loss=0.1893, ctc_loss=0.1217, cr_loss=0.3383, over 2614425.41 frames. ], batch size: 89, lr: 2.77e-03, grad_scale: 32.0 2024-09-25 14:34:16,682 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=765067.3333333334, ans=0.0 2024-09-25 14:34:25,901 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.161e+02 1.299e+02 1.370e+02 1.451e+02 2.001e+02, threshold=2.741e+02, percent-clipped=0.0 2024-09-25 14:34:57,817 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=765160.6666666666, ans=0.125 2024-09-25 14:35:17,863 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=8.48 vs. limit=15.0 2024-09-25 14:35:23,745 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=9.61 vs. limit=15.0 2024-09-25 14:35:24,627 INFO [train.py:1198] (3/4) Epoch 43, batch 350, loss[loss=0.189, ctc_loss=0.1202, cr_loss=0.3438, over 15406.00 frames. ], tot_loss[loss=0.1884, ctc_loss=0.1211, cr_loss=0.3367, over 2778153.91 frames. ], batch size: 89, lr: 2.77e-03, grad_scale: 32.0 2024-09-25 14:35:34,527 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.max_abs, batch_count=765254.0, ans=10.0 2024-09-25 14:36:12,475 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-25 14:36:14,040 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=765347.3333333334, ans=0.1 2024-09-25 14:36:47,449 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=765440.6666666666, ans=0.0 2024-09-25 14:36:51,771 INFO [train.py:1198] (3/4) Epoch 43, batch 400, loss[loss=0.2199, ctc_loss=0.1398, cr_loss=0.4007, over 17226.00 frames. ], tot_loss[loss=0.1893, ctc_loss=0.1217, cr_loss=0.3377, over 2899497.77 frames. 
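
The lr field decays continuously with batch and epoch count: it reads 2.82e-03 early in epoch 42, 2.81e-03 from batch 3150 onward, and 2.77e-03 once epoch 43 begins; the apparent steps are only display rounding. A sketch of an Eden-style schedule of the kind icefall uses, where the rate falls off as inverse fourth roots of batch and epoch counts (the exact form and constants are assumptions and may not reproduce the logged values exactly):

def eden_lr(base_lr: float, batch: int, epoch: int,
            lr_batches: float = 7500.0, lr_epochs: float = 3.5) -> float:
    # Decay factors approach (batch/lr_batches)**-0.5 and
    # (epoch/lr_epochs)**-0.5 once counts are large.
    batch_factor = ((batch ** 2 + lr_batches ** 2) / lr_batches ** 2) ** -0.25
    epoch_factor = ((epoch ** 2 + lr_epochs ** 2) / lr_epochs ** 2) ** -0.25
    return base_lr * batch_factor * epoch_factor
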
], batch size: 55, lr: 2.77e-03, grad_scale: 32.0 2024-09-25 14:37:15,438 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.144e+02 1.295e+02 1.341e+02 1.449e+02 2.336e+02, threshold=2.681e+02, percent-clipped=0.0 2024-09-25 14:37:25,357 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=765580.6666666666, ans=0.07 2024-09-25 14:37:28,478 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=765580.6666666666, ans=0.125 2024-09-25 14:37:31,710 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=765580.6666666666, ans=0.125 2024-09-25 14:37:39,986 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=765627.3333333334, ans=0.1 2024-09-25 14:37:55,629 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=765674.0, ans=0.0 2024-09-25 14:38:11,329 INFO [train.py:1198] (3/4) Epoch 43, batch 450, loss[loss=0.1918, ctc_loss=0.1227, cr_loss=0.3455, over 17317.00 frames. ], tot_loss[loss=0.188, ctc_loss=0.1208, cr_loss=0.336, over 3007472.28 frames. ], batch size: 51, lr: 2.77e-03, grad_scale: 32.0 2024-09-25 14:38:24,704 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.98 vs. limit=6.0 2024-09-25 14:38:27,658 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=765767.3333333334, ans=0.2 2024-09-25 14:38:40,402 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=765767.3333333334, ans=0.1 2024-09-25 14:38:56,212 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=765814.0, ans=0.125 2024-09-25 14:38:56,237 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=765814.0, ans=0.125 2024-09-25 14:39:00,873 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=765860.6666666666, ans=0.0 2024-09-25 14:39:05,706 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=765860.6666666666, ans=0.09899494936611666 2024-09-25 14:39:30,669 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.const_attention_rate, batch_count=765907.3333333334, ans=0.025 2024-09-25 14:39:33,566 INFO [train.py:1198] (3/4) Epoch 43, batch 500, loss[loss=0.1548, ctc_loss=0.09449, cr_loss=0.3014, over 17294.00 frames. ], tot_loss[loss=0.1891, ctc_loss=0.1215, cr_loss=0.3381, over 3084583.58 frames. ], batch size: 42, lr: 2.77e-03, grad_scale: 32.0 2024-09-25 14:40:00,059 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.174e+02 1.354e+02 1.449e+02 1.544e+02 3.228e+02, threshold=2.898e+02, percent-clipped=1.0 2024-09-25 14:40:06,705 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=766047.3333333334, ans=0.125 2024-09-25 14:40:17,274 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=8.65 vs. 
limit=15.0 2024-09-25 14:40:38,556 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.const_attention_rate, batch_count=766140.6666666666, ans=0.025 2024-09-25 14:40:59,452 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=766187.3333333334, ans=0.125 2024-09-25 14:41:00,790 INFO [train.py:1198] (3/4) Epoch 43, batch 550, loss[loss=0.1678, ctc_loss=0.1068, cr_loss=0.3049, over 17260.00 frames. ], tot_loss[loss=0.1893, ctc_loss=0.1217, cr_loss=0.3379, over 3146841.49 frames. ], batch size: 44, lr: 2.77e-03, grad_scale: 32.0 2024-09-25 14:41:04,363 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=766187.3333333334, ans=0.2 2024-09-25 14:41:07,357 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=766187.3333333334, ans=0.125 2024-09-25 14:41:07,389 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=766187.3333333334, ans=0.125 2024-09-25 14:41:24,848 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=766234.0, ans=0.0 2024-09-25 14:42:14,312 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=766374.0, ans=0.125 2024-09-25 14:42:20,464 INFO [train.py:1198] (3/4) Epoch 43, batch 600, loss[loss=0.2217, ctc_loss=0.144, cr_loss=0.3885, over 16992.00 frames. ], tot_loss[loss=0.1893, ctc_loss=0.1217, cr_loss=0.3382, over 3200380.07 frames. ], batch size: 53, lr: 2.77e-03, grad_scale: 32.0 2024-09-25 14:42:33,874 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=7.30 vs. limit=15.0 2024-09-25 14:42:44,284 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.128e+02 1.302e+02 1.423e+02 1.505e+02 3.409e+02, threshold=2.845e+02, percent-clipped=1.0 2024-09-25 14:43:02,207 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=766514.0, ans=0.125 2024-09-25 14:43:40,699 INFO [train.py:1198] (3/4) Epoch 43, batch 650, loss[loss=0.2159, ctc_loss=0.1462, cr_loss=0.3487, over 11694.00 frames. ], tot_loss[loss=0.1882, ctc_loss=0.1209, cr_loss=0.3362, over 3237764.97 frames. ], batch size: 123, lr: 2.77e-03, grad_scale: 32.0 2024-09-25 14:43:40,995 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=766654.0, ans=0.2 2024-09-25 14:44:05,173 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=766700.6666666666, ans=0.0 2024-09-25 14:44:27,065 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=766747.3333333334, ans=0.0 2024-09-25 14:44:35,289 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=766794.0, ans=0.125 2024-09-25 14:44:37,350 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=3.36 vs. 
limit=12.0 2024-09-25 14:44:50,684 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=766840.6666666666, ans=0.0 2024-09-25 14:44:54,038 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=766840.6666666666, ans=0.125 2024-09-25 14:45:03,566 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=766840.6666666666, ans=0.0 2024-09-25 14:45:06,414 INFO [train.py:1198] (3/4) Epoch 43, batch 700, loss[loss=0.1861, ctc_loss=0.1226, cr_loss=0.3174, over 17308.00 frames. ], tot_loss[loss=0.1888, ctc_loss=0.1213, cr_loss=0.3373, over 3274550.82 frames. ], batch size: 51, lr: 2.77e-03, grad_scale: 32.0 2024-09-25 14:45:16,347 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.const_attention_rate, batch_count=766887.3333333334, ans=0.025 2024-09-25 14:45:30,416 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.142e+02 1.308e+02 1.372e+02 1.490e+02 1.932e+02, threshold=2.743e+02, percent-clipped=0.0 2024-09-25 14:45:40,563 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=766980.6666666666, ans=0.0 2024-09-25 14:46:00,652 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=767027.3333333334, ans=0.0 2024-09-25 14:46:20,578 INFO [scaling.py:1024] (3/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=7.20 vs. limit=8.0 2024-09-25 14:46:24,560 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=767074.0, ans=0.0 2024-09-25 14:46:29,263 INFO [train.py:1198] (3/4) Epoch 43, batch 750, loss[loss=0.1928, ctc_loss=0.1251, cr_loss=0.3387, over 16009.00 frames. ], tot_loss[loss=0.1889, ctc_loss=0.1213, cr_loss=0.3379, over 3302440.35 frames. ], batch size: 74, lr: 2.77e-03, grad_scale: 32.0 2024-09-25 14:46:58,339 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=767167.3333333334, ans=0.0 2024-09-25 14:47:00,700 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten.whitening_limit, batch_count=767214.0, ans=15.0 2024-09-25 14:47:17,650 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=767260.6666666666, ans=0.2 2024-09-25 14:47:32,473 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=4.18 vs. limit=15.0 2024-09-25 14:47:35,940 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=5.92 vs. limit=15.0 2024-09-25 14:47:37,283 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=7.94 vs. limit=15.0 2024-09-25 14:47:48,228 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=10.52 vs. limit=15.0 2024-09-25 14:47:49,079 INFO [train.py:1198] (3/4) Epoch 43, batch 800, loss[loss=0.1822, ctc_loss=0.1165, cr_loss=0.3284, over 17099.00 frames. 
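
The fluctuating "batch size" (39 to 123 in this section) counts utterances rather than frames: the sampler packs variable-length cuts under a fixed total-duration budget, so a batch of short cuts holds many of them (batch size 123 alongside roughly 12k frames) while a batch of long cuts holds few. A toy illustration of duration-capped packing (not lhotse's actual DynamicBucketingSampler API; the 700-second budget is an example value):

def pack_by_duration(durations, max_duration: float = 700.0):
    """Greedily pack cut durations (seconds) into batches whose total
    stays under max_duration; short cuts give large batch sizes."""
    batches, cur, cur_dur = [], [], 0.0
    for d in durations:
        if cur and cur_dur + d > max_duration:
            batches.append(cur)
            cur, cur_dur = [], 0.0
        cur.append(d)
        cur_dur += d
    if cur:
        batches.append(cur)
    return batches

print(len(pack_by_duration([5.0] * 300)[0]))    # many short cuts per batch
print(len(pack_by_duration([20.0] * 300)[0]))   # few long cuts per batch
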
], tot_loss[loss=0.1889, ctc_loss=0.1212, cr_loss=0.3382, over 3322173.34 frames. ], batch size: 49, lr: 2.77e-03, grad_scale: 32.0 2024-09-25 14:48:12,578 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.117e+02 1.303e+02 1.387e+02 1.475e+02 2.331e+02, threshold=2.774e+02, percent-clipped=0.0 2024-09-25 14:48:20,787 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=767447.3333333334, ans=0.125 2024-09-25 14:48:27,205 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=767447.3333333334, ans=0.1 2024-09-25 14:48:30,935 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=2.44 vs. limit=6.0 2024-09-25 14:48:36,937 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=767494.0, ans=0.0 2024-09-25 14:48:45,188 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.07 vs. limit=6.0 2024-09-25 14:49:08,216 INFO [train.py:1198] (3/4) Epoch 43, batch 850, loss[loss=0.1687, ctc_loss=0.1066, cr_loss=0.3105, over 16328.00 frames. ], tot_loss[loss=0.1893, ctc_loss=0.1216, cr_loss=0.3385, over 3330986.44 frames. ], batch size: 36, lr: 2.77e-03, grad_scale: 32.0 2024-09-25 14:49:09,019 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.47 vs. limit=15.0 2024-09-25 14:49:26,354 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.34 vs. limit=6.0 2024-09-25 14:49:49,221 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.min_positive, batch_count=767680.6666666666, ans=0.025 2024-09-25 14:49:56,008 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=5.74 vs. limit=15.0 2024-09-25 14:50:00,395 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.const_attention_rate, batch_count=767727.3333333334, ans=0.025 2024-09-25 14:50:08,450 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=767727.3333333334, ans=0.125 2024-09-25 14:50:08,504 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=767727.3333333334, ans=0.2 2024-09-25 14:50:23,024 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=767774.0, ans=0.125 2024-09-25 14:50:32,327 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=767820.6666666666, ans=0.1 2024-09-25 14:50:33,650 INFO [train.py:1198] (3/4) Epoch 43, batch 900, loss[loss=0.1435, ctc_loss=0.08901, cr_loss=0.2726, over 17196.00 frames. ], tot_loss[loss=0.1889, ctc_loss=0.1213, cr_loss=0.3383, over 3343881.01 frames. 
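
tot_loss is not an epoch total: its frame count hovers around 3.35M while single batches contribute roughly 17k frames, which is consistent with an exponentially-decayed running sum whose steady state is about (frames per batch) / (1 - decay); with a decay of 1 - 1/200 that gives 200 * 17k, close to 3.4M. A sketch of such a tracker (the decay constant is an assumption inferred from the logged frame counts):

class RunningLoss:
    """Exponentially-decayed sums of loss and frame counts;
    the printed tot_loss would be loss_sum / frames."""
    def __init__(self, decay: float = 1.0 - 1.0 / 200):
        self.decay = decay
        self.loss_sum = 0.0
        self.frames = 0.0

    def update(self, batch_loss: float, batch_frames: float) -> float:
        self.loss_sum = self.decay * self.loss_sum + batch_loss * batch_frames
        self.frames = self.decay * self.frames + batch_frames
        return self.loss_sum / self.frames

tracker = RunningLoss()
for _ in range(2000):
    tracker.update(0.19, 17000.0)
print(tracker.frames)   # ~3.4e6, matching the logged "over 3.35e6 frames"
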
], batch size: 41, lr: 2.77e-03, grad_scale: 16.0 2024-09-25 14:50:55,187 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=767867.3333333334, ans=0.125 2024-09-25 14:51:04,268 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.180e+02 1.318e+02 1.401e+02 1.527e+02 2.167e+02, threshold=2.803e+02, percent-clipped=0.0 2024-09-25 14:51:06,154 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=767867.3333333334, ans=0.125 2024-09-25 14:51:15,680 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=767914.0, ans=0.125 2024-09-25 14:51:31,760 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=767960.6666666666, ans=0.0 2024-09-25 14:51:36,545 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=767960.6666666666, ans=0.125 2024-09-25 14:51:47,690 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=768007.3333333334, ans=0.2 2024-09-25 14:51:55,533 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=768007.3333333334, ans=0.0 2024-09-25 14:51:58,511 INFO [train.py:1198] (3/4) Epoch 43, batch 950, loss[loss=0.1501, ctc_loss=0.09683, cr_loss=0.2662, over 17195.00 frames. ], tot_loss[loss=0.1886, ctc_loss=0.1211, cr_loss=0.3373, over 3348865.44 frames. ], batch size: 41, lr: 2.77e-03, grad_scale: 16.0 2024-09-25 14:52:35,978 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-25 14:52:58,836 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=10.51 vs. limit=15.0 2024-09-25 14:53:18,264 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.22 vs. limit=10.0 2024-09-25 14:53:18,818 INFO [train.py:1198] (3/4) Epoch 43, batch 1000, loss[loss=0.1916, ctc_loss=0.1225, cr_loss=0.3456, over 17209.00 frames. ], tot_loss[loss=0.1883, ctc_loss=0.1208, cr_loss=0.3375, over 3361516.83 frames. ], batch size: 55, lr: 2.77e-03, grad_scale: 16.0 2024-09-25 14:53:30,375 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=768287.3333333334, ans=0.0 2024-09-25 14:53:33,448 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=768334.0, ans=0.0 2024-09-25 14:53:44,360 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.100e+02 1.294e+02 1.416e+02 1.489e+02 2.363e+02, threshold=2.832e+02, percent-clipped=0.0 2024-09-25 14:53:54,740 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=4.27 vs. 
limit=15.0 2024-09-25 14:54:05,441 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=768427.3333333334, ans=0.0 2024-09-25 14:54:44,014 INFO [train.py:1198] (3/4) Epoch 43, batch 1050, loss[loss=0.2127, ctc_loss=0.1381, cr_loss=0.373, over 16990.00 frames. ], tot_loss[loss=0.1878, ctc_loss=0.1205, cr_loss=0.337, over 3362305.39 frames. ], batch size: 53, lr: 2.77e-03, grad_scale: 16.0 2024-09-25 14:54:45,904 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=768520.6666666666, ans=0.125 2024-09-25 14:54:52,514 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.02 vs. limit=10.0 2024-09-25 14:55:21,101 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=768614.0, ans=0.0 2024-09-25 14:56:08,919 INFO [train.py:1198] (3/4) Epoch 43, batch 1100, loss[loss=0.1666, ctc_loss=0.1072, cr_loss=0.2972, over 17067.00 frames. ], tot_loss[loss=0.1887, ctc_loss=0.121, cr_loss=0.3386, over 3368764.35 frames. ], batch size: 43, lr: 2.77e-03, grad_scale: 16.0 2024-09-25 14:56:21,935 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=768754.0, ans=0.125 2024-09-25 14:56:33,257 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=768800.6666666666, ans=0.125 2024-09-25 14:56:34,424 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.137e+02 1.288e+02 1.354e+02 1.479e+02 2.223e+02, threshold=2.708e+02, percent-clipped=0.0 2024-09-25 14:56:36,722 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.74 vs. limit=6.0 2024-09-25 14:56:54,013 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=768847.3333333334, ans=0.1 2024-09-25 14:56:55,529 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=768894.0, ans=10.0 2024-09-25 14:57:03,361 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=768894.0, ans=0.1 2024-09-25 14:57:15,954 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=768940.6666666666, ans=0.125 2024-09-25 14:57:28,426 INFO [train.py:1198] (3/4) Epoch 43, batch 1150, loss[loss=0.1587, ctc_loss=0.1022, cr_loss=0.2826, over 17026.00 frames. ], tot_loss[loss=0.1879, ctc_loss=0.1203, cr_loss=0.3377, over 3373004.52 frames. 
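
The WithLoss lines report an auxiliary sum attached to particular attention-weight tensors (self_attn_weights); loss-sum=0.000e+00 appears to mean the penalty is currently inactive, with occasional non-zero readings (2.598e-03 a few hundred batches earlier) when the weights briefly trigger it. One way to attach such a penalty without altering the forward value is a custom autograd function that injects an extra gradient; a speculative sketch, not the scaling.py implementation:

import torch

class PenalizeLarge(torch.autograd.Function):
    """Identity in forward; backward adds the gradient of
    penalty = sum(relu(|x| - limit)**2), which is exactly 0
    (like the logged loss-sum) while x stays inside its limit."""
    @staticmethod
    def forward(ctx, x, limit: float):
        ctx.save_for_backward(x)
        ctx.limit = limit
        return x

    @staticmethod
    def backward(ctx, grad_out):
        (x,) = ctx.saved_tensors
        excess = (x.abs() - ctx.limit).clamp(min=0.0)
        # d/dx of sum(excess**2) is zero wherever |x| <= limit
        return grad_out + 2.0 * excess * x.sign(), None

w = torch.randn(8, 8, requires_grad=True)
PenalizeLarge.apply(w, 5.0).sum().backward()
print(w.grad)   # all ones unless some |w| exceeded 5.0
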
], batch size: 39, lr: 2.77e-03, grad_scale: 16.0 2024-09-25 14:57:30,301 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=768987.3333333334, ans=0.0 2024-09-25 14:57:57,765 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=769034.0, ans=0.125 2024-09-25 14:58:01,026 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=769080.6666666666, ans=0.125 2024-09-25 14:58:18,754 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=769127.3333333334, ans=0.125 2024-09-25 14:58:18,783 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=769127.3333333334, ans=0.0 2024-09-25 14:58:36,568 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=769174.0, ans=0.0 2024-09-25 14:58:45,864 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=769174.0, ans=0.035 2024-09-25 14:58:48,785 INFO [train.py:1198] (3/4) Epoch 43, batch 1200, loss[loss=0.1869, ctc_loss=0.121, cr_loss=0.3295, over 16568.00 frames. ], tot_loss[loss=0.1878, ctc_loss=0.1204, cr_loss=0.3373, over 3360209.21 frames. ], batch size: 66, lr: 2.76e-03, grad_scale: 32.0 2024-09-25 14:58:57,167 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.88 vs. limit=15.0 2024-09-25 14:59:14,326 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.056e+02 1.304e+02 1.385e+02 1.497e+02 1.972e+02, threshold=2.770e+02, percent-clipped=0.0 2024-09-25 14:59:28,273 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-25 14:59:53,438 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.21 vs. limit=15.0 2024-09-25 15:00:13,440 INFO [train.py:1198] (3/4) Epoch 43, batch 1250, loss[loss=0.1955, ctc_loss=0.1235, cr_loss=0.3601, over 17142.00 frames. ], tot_loss[loss=0.1879, ctc_loss=0.1204, cr_loss=0.3374, over 3359096.66 frames. ], batch size: 48, lr: 2.76e-03, grad_scale: 32.0 2024-09-25 15:00:34,706 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=769500.6666666666, ans=0.2 2024-09-25 15:00:34,897 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=11.36 vs. limit=15.0 2024-09-25 15:01:04,208 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=8.10 vs. limit=15.0 2024-09-25 15:01:26,514 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=5.10 vs. limit=12.0 2024-09-25 15:01:37,454 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=769687.3333333334, ans=0.1 2024-09-25 15:01:38,704 INFO [train.py:1198] (3/4) Epoch 43, batch 1300, loss[loss=0.18, ctc_loss=0.1149, cr_loss=0.3254, over 17309.00 frames. 
], tot_loss[loss=0.1869, ctc_loss=0.1197, cr_loss=0.3358, over 3349501.20 frames. ], batch size: 51, lr: 2.76e-03, grad_scale: 32.0 2024-09-25 15:01:51,687 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=769687.3333333334, ans=0.1 2024-09-25 15:01:54,845 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=769734.0, ans=0.0 2024-09-25 15:02:04,027 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.165e+02 1.298e+02 1.391e+02 1.472e+02 1.717e+02, threshold=2.781e+02, percent-clipped=0.0 2024-09-25 15:02:04,793 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=13.67 vs. limit=22.5 2024-09-25 15:02:18,007 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.93 vs. limit=6.0 2024-09-25 15:02:34,791 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=769827.3333333334, ans=0.2 2024-09-25 15:02:58,580 INFO [train.py:1198] (3/4) Epoch 43, batch 1350, loss[loss=0.234, ctc_loss=0.1554, cr_loss=0.3929, over 15272.00 frames. ], tot_loss[loss=0.1886, ctc_loss=0.121, cr_loss=0.338, over 3344183.91 frames. ], batch size: 89, lr: 2.76e-03, grad_scale: 32.0 2024-09-25 15:03:41,025 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=770014.0, ans=0.2 2024-09-25 15:04:00,019 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=770060.6666666666, ans=0.125 2024-09-25 15:04:00,631 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=3.58 vs. limit=15.0 2024-09-25 15:04:05,429 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=9.88 vs. limit=22.5 2024-09-25 15:04:17,301 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=770107.3333333334, ans=0.09899494936611666 2024-09-25 15:04:21,714 INFO [train.py:1198] (3/4) Epoch 43, batch 1400, loss[loss=0.1714, ctc_loss=0.1095, cr_loss=0.3094, over 16972.00 frames. ], tot_loss[loss=0.1883, ctc_loss=0.1208, cr_loss=0.3376, over 3344917.71 frames. ], batch size: 58, lr: 2.76e-03, grad_scale: 32.0 2024-09-25 15:04:50,144 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.179e+02 1.309e+02 1.380e+02 1.484e+02 2.530e+02, threshold=2.759e+02, percent-clipped=0.0 2024-09-25 15:05:00,136 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=770247.3333333334, ans=0.0 2024-09-25 15:05:11,642 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=4.43 vs. limit=15.0 2024-09-25 15:05:24,810 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=5.71 vs. 
2024-09-25 15:05:24,810 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=5.71 vs. limit=15.0
2024-09-25 15:05:45,807 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=770387.3333333334, ans=0.125
2024-09-25 15:05:46,979 INFO [train.py:1198] (3/4) Epoch 43, batch 1450, loss[loss=0.2143, ctc_loss=0.142, cr_loss=0.3616, over 16457.00 frames. ], tot_loss[loss=0.1891, ctc_loss=0.1214, cr_loss=0.3387, over 3349988.11 frames. ], batch size: 66, lr: 2.76e-03, grad_scale: 32.0
2024-09-25 15:06:29,836 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=770480.6666666666, ans=0.125
2024-09-25 15:06:47,617 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.const_attention_rate, batch_count=770527.3333333334, ans=0.025
2024-09-25 15:07:09,744 INFO [train.py:1198] (3/4) Epoch 43, batch 1500, loss[loss=0.2085, ctc_loss=0.1322, cr_loss=0.3817, over 17209.00 frames. ], tot_loss[loss=0.1887, ctc_loss=0.121, cr_loss=0.3381, over 3350819.91 frames. ], batch size: 50, lr: 2.76e-03, grad_scale: 32.0
2024-09-25 15:07:15,266 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=3.74 vs. limit=10.0
2024-09-25 15:07:23,301 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=4.19 vs. limit=15.0
2024-09-25 15:07:24,463 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.const_attention_rate, batch_count=770667.3333333334, ans=0.025
2024-09-25 15:07:35,089 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.121e+02 1.284e+02 1.350e+02 1.419e+02 1.777e+02, threshold=2.700e+02, percent-clipped=0.0
2024-09-25 15:07:59,463 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=770760.6666666666, ans=0.125
2024-09-25 15:08:07,415 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=770760.6666666666, ans=0.125
2024-09-25 15:08:15,374 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=770807.3333333334, ans=0.2
2024-09-25 15:08:28,166 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=770854.0, ans=0.0
2024-09-25 15:08:29,387 INFO [train.py:1198] (3/4) Epoch 43, batch 1550, loss[loss=0.1822, ctc_loss=0.1164, cr_loss=0.3289, over 16983.00 frames. ], tot_loss[loss=0.1886, ctc_loss=0.121, cr_loss=0.3379, over 3353716.92 frames. ], batch size: 56, lr: 2.76e-03, grad_scale: 32.0
2024-09-25 15:09:04,914 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-09-25 15:09:09,012 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=770947.3333333334, ans=0.125
2024-09-25 15:09:30,674 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=770994.0, ans=0.125
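The ScheduledFloat entries record hyperparameters (skip rates, balancer probabilities, dropout_p values) whose current value ("ans") is a function of the global batch_count. A minimal sketch of such a schedule, assuming piecewise-linear interpolation between (batch_count, value) breakpoints; the breakpoints below are made up for illustration and the real module lives in scaling.py:

# Piecewise-linear schedule keyed on batch_count, in the spirit of the
# ScheduledFloat entries above (illustrative breakpoints).
class ScheduledFloatSketch:
    def __init__(self, *points):                 # points: (batch_count, value)
        self.points = sorted(points)

    def value(self, batch_count: float) -> float:
        pts = self.points
        if batch_count <= pts[0][0]:
            return pts[0][1]
        if batch_count >= pts[-1][0]:
            return pts[-1][1]
        for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
            if x0 <= batch_count <= x1:          # linear interpolation inside the span
                t = (batch_count - x0) / (x1 - x0)
                return y0 + t * (y1 - y0)

s = ScheduledFloatSketch((0.0, 0.2), (4000.0, 0.05))
print(s.value(2000.0))  # 0.125, halfway between the two breakpoints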
2024-09-25 15:09:40,756 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.75 vs. limit=15.0
2024-09-25 15:09:54,493 INFO [train.py:1198] (3/4) Epoch 43, batch 1600, loss[loss=0.201, ctc_loss=0.1292, cr_loss=0.3592, over 17003.00 frames. ], tot_loss[loss=0.1889, ctc_loss=0.1212, cr_loss=0.3385, over 3348416.45 frames. ], batch size: 51, lr: 2.76e-03, grad_scale: 32.0
2024-09-25 15:09:58,128 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=771087.3333333334, ans=0.0
2024-09-25 15:10:20,130 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.134e+02 1.278e+02 1.396e+02 1.510e+02 2.224e+02, threshold=2.791e+02, percent-clipped=0.0
2024-09-25 15:10:30,026 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=771180.6666666666, ans=0.125
2024-09-25 15:10:40,640 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=771180.6666666666, ans=0.0
2024-09-25 15:10:52,787 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=771227.3333333334, ans=0.2
2024-09-25 15:11:04,122 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=771274.0, ans=0.1
2024-09-25 15:11:19,902 INFO [train.py:1198] (3/4) Epoch 43, batch 1650, loss[loss=0.1923, ctc_loss=0.122, cr_loss=0.3515, over 17108.00 frames. ], tot_loss[loss=0.189, ctc_loss=0.1212, cr_loss=0.3388, over 3356642.65 frames. ], batch size: 43, lr: 2.76e-03, grad_scale: 32.0
2024-09-25 15:11:31,481 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=771320.6666666666, ans=0.2
2024-09-25 15:12:00,314 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2024-09-25 15:12:19,006 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=771460.6666666666, ans=0.2
2024-09-25 15:12:24,288 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=5.70 vs. limit=15.0
2024-09-25 15:12:39,452 INFO [train.py:1198] (3/4) Epoch 43, batch 1700, loss[loss=0.1798, ctc_loss=0.1129, cr_loss=0.3341, over 17041.00 frames. ], tot_loss[loss=0.1887, ctc_loss=0.121, cr_loss=0.3385, over 3362697.94 frames. ], batch size: 39, lr: 2.76e-03, grad_scale: 32.0
2024-09-25 15:12:54,093 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=771600.6666666666, ans=0.125
2024-09-25 15:13:05,045 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.157e+02 1.300e+02 1.382e+02 1.486e+02 1.912e+02, threshold=2.763e+02, percent-clipped=0.0
2024-09-25 15:13:10,223 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.const_attention_rate, batch_count=771647.3333333334, ans=0.025
2024-09-25 15:13:33,305 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=5.14 vs. limit=12.0
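The Whitening lines compare a per-module statistic ("metric") against a limit. A plausible reading, sketched under assumptions: the metric measures how far the covariance of a module's output is from a scaled identity, equal to 1.0 for perfectly whitened features and growing as channels become correlated or unevenly scaled; the module intervenes only when the metric exceeds the limit. The formula below has exactly that property but is illustrative, not necessarily the exact computation in scaling.py:

import torch

# Illustrative whitening metric: n * sum(lambda_i^2) / (sum(lambda_i))^2 over
# the covariance eigenvalues, computed via traces; always >= 1.0, with
# equality iff the covariance is proportional to the identity.
def whitening_metric(x: torch.Tensor) -> torch.Tensor:
    # x: (num_frames, num_channels), a single whitening group
    x = x - x.mean(dim=0, keepdim=True)
    cov = (x.t() @ x) / x.shape[0]
    num_channels = cov.shape[0]
    return (cov ** 2).sum() * num_channels / (torch.diag(cov).sum() ** 2)

white = torch.randn(1000, 64)
print(whitening_metric(white))                                 # close to 1.0
print(whitening_metric(white * torch.linspace(0.1, 3.0, 64)))  # well above 1.0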
2024-09-25 15:13:34,931 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=10.46 vs. limit=22.5
2024-09-25 15:13:52,237 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.min_abs, batch_count=771740.6666666666, ans=0.5
2024-09-25 15:14:00,170 INFO [train.py:1198] (3/4) Epoch 43, batch 1750, loss[loss=0.2034, ctc_loss=0.1311, cr_loss=0.3614, over 17095.00 frames. ], tot_loss[loss=0.1898, ctc_loss=0.1218, cr_loss=0.3402, over 3369358.73 frames. ], batch size: 49, lr: 2.76e-03, grad_scale: 32.0
2024-09-25 15:14:12,670 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=771787.3333333334, ans=0.2
2024-09-25 15:14:28,859 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=771834.0, ans=0.0
2024-09-25 15:15:06,146 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=771927.3333333334, ans=0.2
2024-09-25 15:15:07,612 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=771974.0, ans=0.125
2024-09-25 15:15:24,847 INFO [train.py:1198] (3/4) Epoch 43, batch 1800, loss[loss=0.1987, ctc_loss=0.1252, cr_loss=0.3673, over 17063.00 frames. ], tot_loss[loss=0.1887, ctc_loss=0.121, cr_loss=0.3385, over 3365744.84 frames. ], batch size: 46, lr: 2.76e-03, grad_scale: 32.0
2024-09-25 15:15:33,698 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.16 vs. limit=6.0
2024-09-25 15:15:34,837 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=772020.6666666666, ans=0.125
2024-09-25 15:15:46,903 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=772067.3333333334, ans=0.0
2024-09-25 15:15:55,551 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.211e+02 1.295e+02 1.383e+02 1.459e+02 2.589e+02, threshold=2.767e+02, percent-clipped=0.0
2024-09-25 15:16:27,922 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=772160.6666666666, ans=0.125
2024-09-25 15:16:30,771 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=772160.6666666666, ans=0.0
2024-09-25 15:16:41,251 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.19 vs. limit=15.0
2024-09-25 15:16:49,942 INFO [train.py:1198] (3/4) Epoch 43, batch 1850, loss[loss=0.1916, ctc_loss=0.1221, cr_loss=0.3474, over 17322.00 frames. ], tot_loss[loss=0.1883, ctc_loss=0.1207, cr_loss=0.3381, over 3374820.36 frames. ], batch size: 46, lr: 2.76e-03, grad_scale: 32.0
2024-09-25 15:17:03,033 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=772254.0, ans=0.2
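Many of the scheduled names above are balancer parameters (prob, min_positive, max_abs, min_abs). Conceptually, and hedged as an inference from the parameter names alone, a balancer watches per-channel activation statistics and nudges channels back when they drift outside a target range, doing so only with probability prob on any given batch. The sketch below computes only the watched statistics; the gradient-correction machinery is omitted and the example range values are illustrative:

import torch

# Per-channel statistics a balancer appears to constrain, per the names in
# the log: the proportion of positive activations (vs. min_positive /
# max_positive) and the mean absolute value (vs. min_abs / max_abs).
def balancer_stats(x: torch.Tensor):
    # x: (num_frames, num_channels)
    proportion_positive = (x > 0).float().mean(dim=0)
    mean_abs = x.abs().mean(dim=0)
    return proportion_positive, mean_abs

x = torch.randn(100, 8)
pos, mabs = balancer_stats(x)
print((pos < 0.05).any(), (mabs > 10.0).any())  # channels outside an example range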
2024-09-25 15:17:19,426 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=5.33 vs. limit=15.0
2024-09-25 15:17:36,873 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=772394.0, ans=0.125
2024-09-25 15:17:38,418 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=772394.0, ans=0.0
2024-09-25 15:17:45,620 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=4.71 vs. limit=12.0
2024-09-25 15:17:54,747 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=772440.6666666666, ans=0.125
2024-09-25 15:18:05,939 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.const_attention_rate, batch_count=772440.6666666666, ans=0.025
2024-09-25 15:18:10,615 INFO [train.py:1198] (3/4) Epoch 43, batch 1900, loss[loss=0.187, ctc_loss=0.1204, cr_loss=0.3334, over 17310.00 frames. ], tot_loss[loss=0.1885, ctc_loss=0.1208, cr_loss=0.3386, over 3380793.82 frames. ], batch size: 46, lr: 2.76e-03, grad_scale: 32.0
2024-09-25 15:18:18,729 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=772487.3333333334, ans=0.125
2024-09-25 15:18:33,140 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=772534.0, ans=0.125
2024-09-25 15:18:36,192 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.070e+02 1.296e+02 1.386e+02 1.467e+02 1.936e+02, threshold=2.772e+02, percent-clipped=0.0
2024-09-25 15:18:47,752 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=772580.6666666666, ans=0.2
2024-09-25 15:19:35,530 INFO [train.py:1198] (3/4) Epoch 43, batch 1950, loss[loss=0.1991, ctc_loss=0.1281, cr_loss=0.3552, over 17209.00 frames. ], tot_loss[loss=0.1891, ctc_loss=0.1214, cr_loss=0.3385, over 3374485.58 frames. ], batch size: 55, lr: 2.76e-03, grad_scale: 32.0
2024-09-25 15:20:08,074 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=772814.0, ans=0.0
2024-09-25 15:20:30,535 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=49.33 vs. limit=15.0
2024-09-25 15:20:36,689 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=772860.6666666666, ans=0.2
2024-09-25 15:21:00,969 INFO [train.py:1198] (3/4) Epoch 43, batch 2000, loss[loss=0.1393, ctc_loss=0.08558, cr_loss=0.2686, over 16932.00 frames. ], tot_loss[loss=0.189, ctc_loss=0.1213, cr_loss=0.3383, over 3373505.81 frames. ], batch size: 42, lr: 2.76e-03, grad_scale: 32.0
2024-09-25 15:21:01,979 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.47 vs. limit=15.0
2024-09-25 15:21:22,505 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.60 vs. limit=15.0
2024-09-25 15:21:26,607 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.130e+02 1.269e+02 1.346e+02 1.449e+02 2.025e+02, threshold=2.692e+02, percent-clipped=0.0
2024-09-25 15:21:33,360 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=773047.3333333334, ans=0.0
2024-09-25 15:21:41,932 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=5.32 vs. limit=15.0
2024-09-25 15:22:10,085 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=773140.6666666666, ans=0.1
2024-09-25 15:22:12,198 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=4.62 vs. limit=12.0
2024-09-25 15:22:20,776 INFO [train.py:1198] (3/4) Epoch 43, batch 2050, loss[loss=0.2326, ctc_loss=0.1528, cr_loss=0.3987, over 15089.00 frames. ], tot_loss[loss=0.1896, ctc_loss=0.1219, cr_loss=0.3389, over 3359749.17 frames. ], batch size: 89, lr: 2.76e-03, grad_scale: 32.0
2024-09-25 15:22:37,150 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=773234.0, ans=0.0
2024-09-25 15:22:46,769 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=773234.0, ans=0.0
2024-09-25 15:22:49,871 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=773234.0, ans=0.125
2024-09-25 15:22:51,428 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=773280.6666666666, ans=0.0
2024-09-25 15:23:37,921 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=773374.0, ans=0.2
2024-09-25 15:23:40,845 INFO [train.py:1198] (3/4) Epoch 43, batch 2100, loss[loss=0.2039, ctc_loss=0.1342, cr_loss=0.3487, over 16708.00 frames. ], tot_loss[loss=0.1898, ctc_loss=0.1219, cr_loss=0.3393, over 3365972.89 frames. ], batch size: 61, lr: 2.76e-03, grad_scale: 16.0
2024-09-25 15:23:49,787 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.69 vs. limit=12.0
2024-09-25 15:23:55,480 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=773467.3333333334, ans=0.1
2024-09-25 15:24:09,304 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=773467.3333333334, ans=0.0
2024-09-25 15:24:10,503 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.139e+02 1.257e+02 1.347e+02 1.435e+02 1.926e+02, threshold=2.695e+02, percent-clipped=0.0
2024-09-25 15:24:12,414 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=773467.3333333334, ans=0.125
2024-09-25 15:24:22,011 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=773514.0, ans=0.125
2024-09-25 15:25:04,390 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=773654.0, ans=0.2
2024-09-25 15:25:05,360 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=4.92 vs. limit=12.0
2024-09-25 15:25:05,723 INFO [train.py:1198] (3/4) Epoch 43, batch 2150, loss[loss=0.1717, ctc_loss=0.1095, cr_loss=0.3107, over 17221.00 frames. ], tot_loss[loss=0.1899, ctc_loss=0.1219, cr_loss=0.3399, over 3373222.76 frames. ], batch size: 50, lr: 2.76e-03, grad_scale: 16.0
2024-09-25 15:25:17,311 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=773654.0, ans=0.125
2024-09-25 15:25:41,210 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=773747.3333333334, ans=0.0
2024-09-25 15:26:15,805 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=773840.6666666666, ans=0.0
2024-09-25 15:26:31,354 INFO [train.py:1198] (3/4) Epoch 43, batch 2200, loss[loss=0.1821, ctc_loss=0.1173, cr_loss=0.3239, over 17219.00 frames. ], tot_loss[loss=0.1896, ctc_loss=0.1217, cr_loss=0.3394, over 3378949.05 frames. ], batch size: 50, lr: 2.76e-03, grad_scale: 16.0
2024-09-25 15:26:58,330 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.167e+02 1.324e+02 1.413e+02 1.558e+02 2.550e+02, threshold=2.826e+02, percent-clipped=0.0
2024-09-25 15:27:31,997 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=774027.3333333334, ans=0.125
2024-09-25 15:27:46,502 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00
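grad_scale in the batch lines is the AMP loss scale: in this stretch it halves from 32.0 to 16.0 at batch 2100 and is back at 32.0 by batch 2400, the usual signature of dynamic loss scaling (halve when a step overflows in fp16, grow again after a run of clean steps). A minimal sketch with torch.cuda.amp.GradScaler; the model, optimizer, and growth_interval are placeholders:

import torch

model = torch.nn.Linear(10, 10).cuda()
opt = torch.optim.SGD(model.parameters(), lr=0.1)
scaler = torch.cuda.amp.GradScaler(init_scale=32.0, growth_interval=1000)

for _ in range(3):
    opt.zero_grad()
    with torch.cuda.amp.autocast():
        loss = model(torch.randn(4, 10, device="cuda")).sum()
    scaler.scale(loss).backward()   # backward on the scaled loss
    scaler.step(opt)                # step is skipped if grads had inf/nan
    scaler.update()                 # halves the scale on overflow, grows it later
    print(scaler.get_scale())       # the value reported as grad_scale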
2024-09-25 15:27:50,923 INFO [train.py:1198] (3/4) Epoch 43, batch 2250, loss[loss=0.1821, ctc_loss=0.1177, cr_loss=0.322, over 17216.00 frames. ], tot_loss[loss=0.1897, ctc_loss=0.1219, cr_loss=0.3393, over 3373617.14 frames. ], batch size: 55, lr: 2.76e-03, grad_scale: 16.0
2024-09-25 15:27:52,745 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=774120.6666666666, ans=0.125
2024-09-25 15:28:05,709 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=774167.3333333334, ans=0.125
2024-09-25 15:28:08,919 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=774167.3333333334, ans=10.0
2024-09-25 15:28:15,613 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=9.66 vs. limit=15.0
2024-09-25 15:28:29,846 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2024-09-25 15:28:39,264 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=774260.6666666666, ans=0.125
2024-09-25 15:29:00,135 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=774307.3333333334, ans=0.1
2024-09-25 15:29:13,566 INFO [train.py:1198] (3/4) Epoch 43, batch 2300, loss[loss=0.1709, ctc_loss=0.108, cr_loss=0.3142, over 17023.00 frames. ], tot_loss[loss=0.1895, ctc_loss=0.1217, cr_loss=0.3387, over 3373755.47 frames. ], batch size: 44, lr: 2.76e-03, grad_scale: 16.0
2024-09-25 15:29:30,810 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=774400.6666666666, ans=0.125
2024-09-25 15:29:35,433 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=774400.6666666666, ans=0.0
2024-09-25 15:29:42,935 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.176e+02 1.301e+02 1.372e+02 1.481e+02 2.158e+02, threshold=2.745e+02, percent-clipped=0.0
2024-09-25 15:29:46,369 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.max_abs, batch_count=774447.3333333334, ans=10.0
2024-09-25 15:30:05,522 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=774494.0, ans=0.1
2024-09-25 15:30:18,719 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.80 vs. limit=10.0
2024-09-25 15:30:29,775 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=7.66 vs. limit=15.0
2024-09-25 15:30:37,780 INFO [train.py:1198] (3/4) Epoch 43, batch 2350, loss[loss=0.1955, ctc_loss=0.1243, cr_loss=0.3558, over 17016.00 frames. ], tot_loss[loss=0.1883, ctc_loss=0.1208, cr_loss=0.3373, over 3375865.02 frames. ], batch size: 51, lr: 2.76e-03, grad_scale: 16.0
2024-09-25 15:30:39,632 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=774587.3333333334, ans=0.0
2024-09-25 15:30:41,143 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.max_abs, batch_count=774587.3333333334, ans=10.0
2024-09-25 15:30:46,946 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=774587.3333333334, ans=0.07
2024-09-25 15:31:09,220 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=774634.0, ans=0.1
2024-09-25 15:32:00,370 INFO [train.py:1198] (3/4) Epoch 43, batch 2400, loss[loss=0.2317, ctc_loss=0.1523, cr_loss=0.3968, over 16813.00 frames. ], tot_loss[loss=0.1891, ctc_loss=0.1214, cr_loss=0.3384, over 3363291.31 frames. ], batch size: 58, lr: 2.75e-03, grad_scale: 32.0
2024-09-25 15:32:18,613 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.whiten.whitening_limit, batch_count=774867.3333333334, ans=12.0
2024-09-25 15:32:23,147 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=774867.3333333334, ans=0.0
2024-09-25 15:32:27,693 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.130e+02 1.292e+02 1.363e+02 1.432e+02 2.291e+02, threshold=2.726e+02, percent-clipped=0.0
2024-09-25 15:32:41,001 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=774914.0, ans=0.0
2024-09-25 15:32:45,762 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=774914.0, ans=0.125
2024-09-25 15:32:55,576 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=774960.6666666666, ans=0.1
2024-09-25 15:33:01,842 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=774960.6666666666, ans=0.2
2024-09-25 15:33:14,819 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=775007.3333333334, ans=0.2
2024-09-25 15:33:16,417 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=775007.3333333334, ans=0.1
2024-09-25 15:33:20,097 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.77 vs. limit=15.0
2024-09-25 15:33:20,862 INFO [train.py:1198] (3/4) Epoch 43, batch 2450, loss[loss=0.2169, ctc_loss=0.1462, cr_loss=0.3535, over 12237.00 frames. ], tot_loss[loss=0.1888, ctc_loss=0.1212, cr_loss=0.3378, over 3358656.41 frames. ], batch size: 123, lr: 2.75e-03, grad_scale: 32.0
2024-09-25 15:33:25,041 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=16.31 vs. limit=22.5
2024-09-25 15:33:35,565 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=775100.6666666666, ans=0.1
2024-09-25 15:34:19,409 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=775194.0, ans=0.0
2024-09-25 15:34:30,077 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=9.88 vs. limit=15.0
2024-09-25 15:34:45,603 INFO [train.py:1198] (3/4) Epoch 43, batch 2500, loss[loss=0.2234, ctc_loss=0.1463, cr_loss=0.3859, over 17009.00 frames. ], tot_loss[loss=0.1893, ctc_loss=0.1216, cr_loss=0.339, over 3358977.63 frames. ], batch size: 53, lr: 2.75e-03, grad_scale: 32.0
2024-09-25 15:34:52,336 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=775287.3333333334, ans=0.0
2024-09-25 15:34:53,192 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=3.92 vs. limit=15.0
2024-09-25 15:35:07,051 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=775334.0, ans=0.125
2024-09-25 15:35:10,047 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=775334.0, ans=10.0
2024-09-25 15:35:13,024 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.056e+02 1.284e+02 1.374e+02 1.483e+02 2.994e+02, threshold=2.747e+02, percent-clipped=1.0
2024-09-25 15:35:14,937 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=775334.0, ans=0.1
2024-09-25 15:35:23,248 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=775380.6666666666, ans=0.07
2024-09-25 15:35:23,348 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=775380.6666666666, ans=0.125
2024-09-25 15:35:40,337 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=775427.3333333334, ans=0.2
2024-09-25 15:35:41,977 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=775427.3333333334, ans=0.125
2024-09-25 15:35:51,429 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=6.18 vs. limit=10.0
2024-09-25 15:35:54,037 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=775474.0, ans=0.125
2024-09-25 15:36:02,287 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=775474.0, ans=0.125
2024-09-25 15:36:11,654 INFO [train.py:1198] (3/4) Epoch 43, batch 2550, loss[loss=0.1743, ctc_loss=0.1105, cr_loss=0.3189, over 16952.00 frames. ], tot_loss[loss=0.1902, ctc_loss=0.122, cr_loss=0.3407, over 3356532.25 frames. ], batch size: 42, lr: 2.75e-03, grad_scale: 32.0
2024-09-25 15:36:32,857 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=775567.3333333334, ans=0.125
2024-09-25 15:36:39,665 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=775567.3333333334, ans=0.125
2024-09-25 15:36:39,687 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=775567.3333333334, ans=0.125
2024-09-25 15:36:52,921 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=8.13 vs. limit=15.0
2024-09-25 15:37:10,197 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=775660.6666666666, ans=0.0
2024-09-25 15:37:23,499 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.05 vs. limit=6.0
2024-09-25 15:37:32,118 INFO [train.py:1198] (3/4) Epoch 43, batch 2600, loss[loss=0.1817, ctc_loss=0.115, cr_loss=0.3335, over 17305.00 frames. ], tot_loss[loss=0.1895, ctc_loss=0.1216, cr_loss=0.3398, over 3343946.08 frames. ], batch size: 49, lr: 2.75e-03, grad_scale: 32.0
2024-09-25 15:37:54,577 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2024-09-25 15:37:58,960 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.142e+02 1.271e+02 1.387e+02 1.465e+02 2.023e+02, threshold=2.774e+02, percent-clipped=0.0
2024-09-25 15:38:17,215 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=775847.3333333334, ans=0.0
2024-09-25 15:38:30,481 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=7.69 vs. limit=15.0
2024-09-25 15:38:51,971 INFO [train.py:1198] (3/4) Epoch 43, batch 2650, loss[loss=0.1671, ctc_loss=0.1076, cr_loss=0.2978, over 16341.00 frames. ], tot_loss[loss=0.1906, ctc_loss=0.1225, cr_loss=0.3406, over 3336425.82 frames. ], batch size: 36, lr: 2.75e-03, grad_scale: 32.0
2024-09-25 15:39:27,964 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=776080.6666666666, ans=0.125
2024-09-25 15:39:43,910 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=776127.3333333334, ans=0.2
2024-09-25 15:40:01,248 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=776174.0, ans=0.125
2024-09-25 15:40:12,318 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-25 15:40:16,628 INFO [train.py:1198] (3/4) Epoch 43, batch 2700, loss[loss=0.1976, ctc_loss=0.1274, cr_loss=0.3509, over 17047.00 frames. ], tot_loss[loss=0.1894, ctc_loss=0.1215, cr_loss=0.3393, over 3352117.36 frames. ], batch size: 56, lr: 2.75e-03, grad_scale: 32.0
2024-09-25 15:40:16,971 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=776220.6666666666, ans=0.125
2024-09-25 15:40:17,018 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.const_attention_rate, batch_count=776220.6666666666, ans=0.025
2024-09-25 15:40:21,604 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=776220.6666666666, ans=0.0
2024-09-25 15:40:32,408 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=776220.6666666666, ans=0.125
2024-09-25 15:40:47,804 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=776267.3333333334, ans=0.035
2024-09-25 15:40:49,149 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.114e+02 1.311e+02 1.386e+02 1.489e+02 1.705e+02, threshold=2.772e+02, percent-clipped=0.0
2024-09-25 15:40:58,830 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=776314.0, ans=0.125
2024-09-25 15:41:06,805 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=776314.0, ans=0.04949747468305833
2024-09-25 15:41:13,280 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=776360.6666666666, ans=0.125
2024-09-25 15:41:41,507 INFO [train.py:1198] (3/4) Epoch 43, batch 2750, loss[loss=0.2224, ctc_loss=0.1462, cr_loss=0.3806, over 17005.00 frames. ], tot_loss[loss=0.1892, ctc_loss=0.1214, cr_loss=0.3389, over 3356788.69 frames. ], batch size: 56, lr: 2.75e-03, grad_scale: 32.0
2024-09-25 15:41:46,452 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=776454.0, ans=0.125
2024-09-25 15:41:52,080 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.93 vs. limit=15.0
2024-09-25 15:42:02,496 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=776500.6666666666, ans=0.0
2024-09-25 15:42:05,654 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=776500.6666666666, ans=0.1
2024-09-25 15:42:29,049 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=12.12 vs. limit=22.5
2024-09-25 15:42:36,657 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=776594.0, ans=0.04949747468305833
2024-09-25 15:42:38,382 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-25 15:42:39,869 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=776594.0, ans=0.125
2024-09-25 15:42:46,504 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=776640.6666666666, ans=0.0
2024-09-25 15:42:52,891 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=776640.6666666666, ans=0.2
2024-09-25 15:43:02,158 INFO [train.py:1198] (3/4) Epoch 43, batch 2800, loss[loss=0.1917, ctc_loss=0.1203, cr_loss=0.357, over 17312.00 frames. ], tot_loss[loss=0.1896, ctc_loss=0.1217, cr_loss=0.3391, over 3349688.50 frames. ], batch size: 51, lr: 2.75e-03, grad_scale: 32.0
2024-09-25 15:43:18,397 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=776734.0, ans=0.1
2024-09-25 15:43:27,949 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=776734.0, ans=0.025
2024-09-25 15:43:29,181 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.146e+02 1.317e+02 1.406e+02 1.487e+02 1.823e+02, threshold=2.813e+02, percent-clipped=0.0
2024-09-25 15:44:27,273 INFO [train.py:1198] (3/4) Epoch 43, batch 2850, loss[loss=0.1986, ctc_loss=0.13, cr_loss=0.3428, over 14991.00 frames. ], tot_loss[loss=0.189, ctc_loss=0.1213, cr_loss=0.3385, over 3360115.87 frames. ], batch size: 89, lr: 2.75e-03, grad_scale: 32.0
2024-09-25 15:44:58,458 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=4.22 vs. limit=15.0
2024-09-25 15:45:30,571 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=777060.6666666666, ans=0.125
2024-09-25 15:45:32,307 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=777107.3333333334, ans=0.125
2024-09-25 15:45:46,163 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.const_attention_rate, batch_count=777107.3333333334, ans=0.025
2024-09-25 15:45:52,193 INFO [train.py:1198] (3/4) Epoch 43, batch 2900, loss[loss=0.1474, ctc_loss=0.09213, cr_loss=0.2765, over 16765.00 frames. ], tot_loss[loss=0.1884, ctc_loss=0.1209, cr_loss=0.3379, over 3366320.80 frames. ], batch size: 37, lr: 2.75e-03, grad_scale: 32.0
2024-09-25 15:46:19,411 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.188e+02 1.318e+02 1.398e+02 1.451e+02 2.249e+02, threshold=2.795e+02, percent-clipped=0.0
2024-09-25 15:47:12,744 INFO [train.py:1198] (3/4) Epoch 43, batch 2950, loss[loss=0.2151, ctc_loss=0.1429, cr_loss=0.3609, over 16038.00 frames. ], tot_loss[loss=0.1887, ctc_loss=0.1212, cr_loss=0.338, over 3356448.15 frames. ], batch size: 74, lr: 2.75e-03, grad_scale: 32.0
2024-09-25 15:47:35,528 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=777434.0, ans=0.125
2024-09-25 15:47:45,281 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=777480.6666666666, ans=0.2
2024-09-25 15:47:50,019 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=777480.6666666666, ans=0.125
2024-09-25 15:47:59,672 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=777527.3333333334, ans=0.125
2024-09-25 15:48:14,030 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=777527.3333333334, ans=0.0
2024-09-25 15:48:33,059 INFO [train.py:1198] (3/4) Epoch 43, batch 3000, loss[loss=0.1894, ctc_loss=0.121, cr_loss=0.3422, over 17238.00 frames. ], tot_loss[loss=0.1885, ctc_loss=0.121, cr_loss=0.3375, over 3357265.75 frames. ], batch size: 50, lr: 2.75e-03, grad_scale: 32.0
2024-09-25 15:48:33,060 INFO [train.py:1221] (3/4) Computing validation loss
2024-09-25 15:48:50,092 INFO [train.py:1230] (3/4) Epoch 43, validation: loss=0.03539, ctc_loss=0.03539, cr_loss=1.015e-14, over 944034.00 frames.
2024-09-25 15:48:50,093 INFO [train.py:1231] (3/4) Maximum memory allocated so far is 21238MB
2024-09-25 15:48:50,280 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=777620.6666666666, ans=0.125
2024-09-25 15:49:16,946 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.046e+02 1.300e+02 1.373e+02 1.452e+02 1.992e+02, threshold=2.746e+02, percent-clipped=0.0
2024-09-25 15:50:10,666 INFO [train.py:1198] (3/4) Epoch 43, batch 3050, loss[loss=0.2004, ctc_loss=0.1312, cr_loss=0.3459, over 17026.00 frames. ], tot_loss[loss=0.1886, ctc_loss=0.1211, cr_loss=0.3377, over 3357812.09 frames. ], batch size: 51, lr: 2.75e-03, grad_scale: 16.0
2024-09-25 15:50:17,185 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=777854.0, ans=0.125
2024-09-25 15:50:54,214 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=777947.3333333334, ans=0.125
2024-09-25 15:51:00,827 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2024-09-25 15:51:07,494 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.38 vs. limit=10.0
2024-09-25 15:51:08,918 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=6.41 vs. limit=10.0
2024-09-25 15:51:16,121 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.const_attention_rate, batch_count=778040.6666666666, ans=0.025
2024-09-25 15:51:28,336 INFO [train.py:1198] (3/4) Epoch 43, batch 3100, loss[loss=0.1443, ctc_loss=0.08985, cr_loss=0.2723, over 16941.00 frames. ], tot_loss[loss=0.1875, ctc_loss=0.1203, cr_loss=0.3363, over 3361941.14 frames. ], batch size: 42, lr: 2.75e-03, grad_scale: 16.0
2024-09-25 15:51:33,209 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=778087.3333333334, ans=0.0
2024-09-25 15:51:42,643 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.const_attention_rate, batch_count=778134.0, ans=0.025
2024-09-25 15:51:44,197 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=778134.0, ans=0.05
2024-09-25 15:51:49,700 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=778134.0, ans=0.125
2024-09-25 15:51:58,668 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.089e+02 1.269e+02 1.353e+02 1.457e+02 2.895e+02, threshold=2.707e+02, percent-clipped=1.0
2024-09-25 15:51:59,090 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=778134.0, ans=0.2
2024-09-25 15:52:16,361 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=778227.3333333334, ans=0.125
2024-09-25 15:52:24,143 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.const_attention_rate, batch_count=778227.3333333334, ans=0.025
2024-09-25 15:52:51,553 INFO [train.py:1198] (3/4) Epoch 43, batch 3150, loss[loss=0.2044, ctc_loss=0.1333, cr_loss=0.3555, over 17099.00 frames. ], tot_loss[loss=0.1881, ctc_loss=0.1205, cr_loss=0.3379, over 3360154.25 frames. ], batch size: 49, lr: 2.75e-03, grad_scale: 16.0
2024-09-25 15:52:56,519 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=778320.6666666666, ans=0.125
2024-09-25 15:53:23,531 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=778414.0, ans=0.0
2024-09-25 15:53:50,398 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=778460.6666666666, ans=0.125
2024-09-25 15:54:04,137 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=778507.3333333334, ans=0.125
2024-09-25 15:54:05,630 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=778507.3333333334, ans=0.125
2024-09-25 15:54:10,166 INFO [train.py:1198] (3/4) Epoch 43, batch 3200, loss[loss=0.1943, ctc_loss=0.1273, cr_loss=0.3349, over 16141.00 frames. ], tot_loss[loss=0.1872, ctc_loss=0.1199, cr_loss=0.3362, over 3353968.93 frames. ], batch size: 74, lr: 2.75e-03, grad_scale: 32.0
2024-09-25 15:54:16,592 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=778554.0, ans=0.0
2024-09-25 15:54:18,499 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=11.39 vs. limit=22.5
2024-09-25 15:54:18,896 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=12.48 vs. limit=22.5
2024-09-25 15:54:26,185 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=778600.6666666666, ans=0.0
2024-09-25 15:54:27,476 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.const_attention_rate, batch_count=778600.6666666666, ans=0.025
2024-09-25 15:54:32,730 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.74 vs. limit=6.0
2024-09-25 15:54:35,406 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=778600.6666666666, ans=0.0
2024-09-25 15:54:35,487 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=778600.6666666666, ans=0.0
2024-09-25 15:54:38,168 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.191e+02 1.301e+02 1.369e+02 1.499e+02 2.028e+02, threshold=2.738e+02, percent-clipped=0.0
2024-09-25 15:54:38,529 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=778600.6666666666, ans=0.0
2024-09-25 15:55:27,717 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=778787.3333333334, ans=0.05
2024-09-25 15:55:28,973 INFO [train.py:1198] (3/4) Epoch 43, batch 3250, loss[loss=0.213, ctc_loss=0.1379, cr_loss=0.3757, over 16586.00 frames. ], tot_loss[loss=0.188, ctc_loss=0.1206, cr_loss=0.3374, over 3344258.49 frames. ], batch size: 66, lr: 2.75e-03, grad_scale: 32.0
2024-09-25 15:56:00,241 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=778880.6666666666, ans=0.125
2024-09-25 15:56:17,595 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=778927.3333333334, ans=0.1
2024-09-25 15:56:34,495 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=778974.0, ans=0.125
2024-09-25 15:56:46,769 INFO [train.py:1198] (3/4) Epoch 43, batch 3300, loss[loss=0.1709, ctc_loss=0.1072, cr_loss=0.3183, over 17208.00 frames. ], tot_loss[loss=0.1878, ctc_loss=0.1204, cr_loss=0.3369, over 3345157.08 frames. ], batch size: 41, lr: 2.75e-03, grad_scale: 32.0
2024-09-25 15:56:51,697 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=779020.6666666666, ans=0.125
2024-09-25 15:57:02,679 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=779067.3333333334, ans=0.125
2024-09-25 15:57:14,795 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.168e+02 1.314e+02 1.383e+02 1.471e+02 2.059e+02, threshold=2.766e+02, percent-clipped=0.0
2024-09-25 15:57:33,943 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=779160.6666666666, ans=0.0
2024-09-25 15:57:36,948 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=779160.6666666666, ans=0.2
2024-09-25 15:57:43,015 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=779160.6666666666, ans=0.0
2024-09-25 15:57:49,133 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=779207.3333333334, ans=0.0
2024-09-25 15:58:04,607 INFO [train.py:1198] (3/4) Epoch 43, batch 3350, loss[loss=0.1762, ctc_loss=0.1123, cr_loss=0.3193, over 17093.00 frames. ], tot_loss[loss=0.1895, ctc_loss=0.1217, cr_loss=0.3393, over 3332875.98 frames. ], batch size: 49, lr: 2.75e-03, grad_scale: 32.0
2024-09-25 15:58:29,844 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=779300.6666666666, ans=0.0
2024-09-25 15:58:38,154 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=779347.3333333334, ans=0.125
2024-09-25 15:58:58,487 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=779394.0, ans=0.0
2024-09-25 15:58:58,515 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=779394.0, ans=0.1
2024-09-25 15:59:18,739 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=779440.6666666666, ans=0.125
2024-09-25 15:59:21,991 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=779487.3333333334, ans=0.125
2024-09-25 15:59:22,017 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=779487.3333333334, ans=0.1
2024-09-25 15:59:23,221 INFO [train.py:1198] (3/4) Epoch 43, batch 3400, loss[loss=0.1867, ctc_loss=0.1184, cr_loss=0.3417, over 17100.00 frames. ], tot_loss[loss=0.1888, ctc_loss=0.1211, cr_loss=0.3381, over 3333365.44 frames. ], batch size: 49, lr: 2.75e-03, grad_scale: 32.0
2024-09-25 15:59:26,580 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=779487.3333333334, ans=0.125
2024-09-25 15:59:38,117 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=779487.3333333334, ans=0.125
2024-09-25 15:59:44,174 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=779534.0, ans=0.2
2024-09-25 15:59:55,337 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.136e+02 1.291e+02 1.360e+02 1.462e+02 4.308e+02, threshold=2.720e+02, percent-clipped=1.0
2024-09-25 16:00:12,979 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=779627.3333333334, ans=0.125
2024-09-25 16:00:20,631 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=779627.3333333334, ans=0.1
2024-09-25 16:00:36,043 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=779674.0, ans=0.0
2024-09-25 16:00:43,970 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.85 vs. limit=15.0
2024-09-25 16:00:45,027 INFO [train.py:1198] (3/4) Epoch 43, batch 3450, loss[loss=0.1884, ctc_loss=0.1238, cr_loss=0.3228, over 17029.00 frames. ], tot_loss[loss=0.1887, ctc_loss=0.1211, cr_loss=0.3379, over 3345966.44 frames. ], batch size: 56, lr: 2.75e-03, grad_scale: 32.0
2024-09-25 16:00:54,788 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=779720.6666666666, ans=0.125
2024-09-25 16:01:36,744 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=779860.6666666666, ans=0.125
2024-09-25 16:01:38,450 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=779860.6666666666, ans=0.0
2024-09-25 16:02:00,456 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=779907.3333333334, ans=0.2
2024-09-25 16:02:03,524 INFO [train.py:1198] (3/4) Epoch 43, batch 3500, loss[loss=0.1871, ctc_loss=0.1216, cr_loss=0.3271, over 17033.00 frames. ], tot_loss[loss=0.1881, ctc_loss=0.1207, cr_loss=0.3369, over 3351421.59 frames. ], batch size: 39, lr: 2.75e-03, grad_scale: 16.0
2024-09-25 16:02:29,348 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=780000.6666666666, ans=0.0
2024-09-25 16:02:32,702 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.const_attention_rate, batch_count=780000.6666666666, ans=0.025
2024-09-25 16:02:35,360 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.093e+02 1.311e+02 1.389e+02 1.472e+02 1.940e+02, threshold=2.779e+02, percent-clipped=0.0
2024-09-25 16:02:38,725 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=780047.3333333334, ans=0.125
2024-09-25 16:02:41,577 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=780047.3333333334, ans=0.125
2024-09-25 16:02:58,124 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=780094.0, ans=0.2
2024-09-25 16:03:13,910 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=780140.6666666666, ans=0.2
2024-09-25 16:03:26,045 INFO [train.py:1198] (3/4) Epoch 43, batch 3550, loss[loss=0.2057, ctc_loss=0.1349, cr_loss=0.3541, over 17037.00 frames. ], tot_loss[loss=0.1888, ctc_loss=0.1212, cr_loss=0.3376, over 3339325.63 frames. ], batch size: 51, lr: 2.75e-03, grad_scale: 16.0
2024-09-25 16:03:34,681 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=2.34 vs. limit=6.0
2024-09-25 16:03:45,198 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.const_attention_rate, batch_count=780234.0, ans=0.025
2024-09-25 16:04:01,659 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=780280.6666666666, ans=0.125
2024-09-25 16:04:42,007 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=780374.0, ans=0.0
2024-09-25 16:04:43,526 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=780420.6666666666, ans=0.0
2024-09-25 16:04:44,774 INFO [train.py:1198] (3/4) Epoch 43, batch 3600, loss[loss=0.1825, ctc_loss=0.1178, cr_loss=0.3235, over 16168.00 frames. ], tot_loss[loss=0.1893, ctc_loss=0.1217, cr_loss=0.3383, over 3336708.47 frames. ], batch size: 74, lr: 2.74e-03, grad_scale: 32.0
2024-09-25 16:04:45,274 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.68 vs. limit=15.0
2024-09-25 16:04:51,216 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=780420.6666666666, ans=0.125
2024-09-25 16:05:14,402 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.100e+02 1.305e+02 1.407e+02 1.507e+02 1.948e+02, threshold=2.813e+02, percent-clipped=0.0
2024-09-25 16:05:28,935 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=780514.0, ans=0.1
2024-09-25 16:05:41,398 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=780560.6666666666, ans=0.0
2024-09-25 16:05:41,662 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=7.45 vs. limit=15.0
2024-09-25 16:05:42,890 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=780560.6666666666, ans=0.125
2024-09-25 16:05:47,560 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=780607.3333333334, ans=0.1
2024-09-25 16:05:53,875 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=780607.3333333334, ans=0.125
2024-09-25 16:05:55,384 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=780607.3333333334, ans=0.1
2024-09-25 16:06:00,456 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2024-09-25 16:06:03,265 INFO [train.py:1198] (3/4) Epoch 43, batch 3650, loss[loss=0.2128, ctc_loss=0.1408, cr_loss=0.3599, over 14856.00 frames. ], tot_loss[loss=0.1893, ctc_loss=0.1216, cr_loss=0.3386, over 3343946.97 frames. ], batch size: 89, lr: 2.74e-03, grad_scale: 32.0
2024-09-25 16:06:03,525 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=780654.0, ans=0.125
2024-09-25 16:06:52,927 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=4.28 vs. limit=15.0
2024-09-25 16:06:58,714 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=780794.0, ans=0.0
2024-09-25 16:06:59,446 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.59 vs. limit=6.0
2024-09-25 16:07:09,615 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=780840.6666666666, ans=0.09899494936611666
2024-09-25 16:07:21,990 INFO [train.py:1198] (3/4) Epoch 43, batch 3700, loss[loss=0.211, ctc_loss=0.1427, cr_loss=0.3413, over 14973.00 frames. ], tot_loss[loss=0.1884, ctc_loss=0.121, cr_loss=0.3371, over 3352420.17 frames. ], batch size: 89, lr: 2.74e-03, grad_scale: 32.0
2024-09-25 16:07:23,889 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=780887.3333333334, ans=0.0
2024-09-25 16:07:29,976 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=5.54 vs. limit=15.0
2024-09-25 16:07:30,842 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=780887.3333333334, ans=0.1
2024-09-25 16:07:46,470 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=780934.0, ans=0.125
2024-09-25 16:07:52,637 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.155e+02 1.295e+02 1.384e+02 1.500e+02 2.354e+02, threshold=2.769e+02, percent-clipped=0.0
2024-09-25 16:08:41,140 INFO [train.py:1198] (3/4) Epoch 43, batch 3750, loss[loss=0.2262, ctc_loss=0.1509, cr_loss=0.3766, over 11900.00 frames. ], tot_loss[loss=0.189, ctc_loss=0.1215, cr_loss=0.3375, over 3331537.96 frames. ], batch size: 123, lr: 2.74e-03, grad_scale: 16.0
2024-09-25 16:08:42,941 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=781120.6666666666, ans=0.0
2024-09-25 16:08:46,826 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.30 vs. limit=15.0
2024-09-25 16:08:49,356 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2024-09-25 16:09:01,587 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=781167.3333333334, ans=0.125
2024-09-25 16:09:08,165 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-25 16:09:20,216 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=781214.0, ans=0.125
2024-09-25 16:09:35,428 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=12.75 vs. limit=22.5
2024-09-25 16:09:36,746 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=9.29 vs. limit=15.0
2024-09-25 16:09:50,517 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=781307.3333333334, ans=0.2
2024-09-25 16:09:52,071 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=781307.3333333334, ans=0.0
2024-09-25 16:10:01,218 INFO [train.py:1198] (3/4) Epoch 43, batch 3800, loss[loss=0.1686, ctc_loss=0.1098, cr_loss=0.2944, over 16956.00 frames. ], tot_loss[loss=0.1892, ctc_loss=0.1216, cr_loss=0.3378, over 3327114.44 frames. ], batch size: 42, lr: 2.74e-03, grad_scale: 16.0
2024-09-25 16:10:07,897 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=781354.0, ans=0.2
2024-09-25 16:10:09,282 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=781354.0, ans=0.125
2024-09-25 16:10:14,785 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.48 vs. limit=6.0
2024-09-25 16:10:31,389 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=781447.3333333334, ans=0.2
2024-09-25 16:10:32,811 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.147e+02 1.355e+02 1.451e+02 1.569e+02 1.796e+02, threshold=2.903e+02, percent-clipped=0.0
2024-09-25 16:10:57,689 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.80 vs. limit=10.0
2024-09-25 16:11:03,467 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=781540.6666666666, ans=0.1
2024-09-25 16:11:15,359 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=9.23 vs. limit=22.5
2024-09-25 16:11:17,617 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=781540.6666666666, ans=0.125
2024-09-25 16:11:20,436 INFO [train.py:1198] (3/4) Epoch 43, batch 3850, loss[loss=0.2401, ctc_loss=0.1605, cr_loss=0.3981, over 12629.00 frames. ], tot_loss[loss=0.1927, ctc_loss=0.1244, cr_loss=0.3419, over 3278314.86 frames. ], batch size: 123, lr: 2.74e-03, grad_scale: 16.0
2024-09-25 16:11:44,193 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=781634.0, ans=0.0
2024-09-25 16:11:59,606 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=781680.6666666666, ans=0.2
2024-09-25 16:12:13,276 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=781727.3333333334, ans=0.2
2024-09-25 16:12:27,153 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.82 vs. limit=15.0
2024-09-25 16:13:18,299 INFO [train.py:1198] (3/4) Epoch 44, batch 0, loss[loss=0.1804, ctc_loss=0.1138, cr_loss=0.3326, over 16940.00 frames. ], tot_loss[loss=0.1804, ctc_loss=0.1138, cr_loss=0.3326, over 16940.00 frames. ], batch size: 42, lr: 2.71e-03, grad_scale: 32.0
2024-09-25 16:13:18,299 INFO [train.py:1221] (3/4) Computing validation loss
2024-09-25 16:13:33,582 INFO [train.py:1230] (3/4) Epoch 44, validation: loss=0.03507, ctc_loss=0.03507, cr_loss=1.053e-14, over 944034.00 frames.
2024-09-25 16:13:33,583 INFO [train.py:1231] (3/4) Maximum memory allocated so far is 21238MB
2024-09-25 16:13:54,960 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=781848.6666666666, ans=0.125
2024-09-25 16:13:54,967 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=781848.6666666666, ans=0.07
2024-09-25 16:13:55,636 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.59 vs. limit=6.0
2024-09-25 16:14:03,401 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=8.07 vs.
limit=15.0 2024-09-25 16:14:13,734 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=781895.3333333334, ans=0.125 2024-09-25 16:14:14,730 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.154e+02 1.434e+02 1.591e+02 1.721e+02 2.734e+02, threshold=3.183e+02, percent-clipped=0.0 2024-09-25 16:14:43,278 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=781988.6666666666, ans=0.125 2024-09-25 16:14:46,286 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=781988.6666666666, ans=0.1 2024-09-25 16:14:51,151 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=781988.6666666666, ans=0.0 2024-09-25 16:14:59,048 INFO [train.py:1198] (3/4) Epoch 44, batch 50, loss[loss=0.2121, ctc_loss=0.1383, cr_loss=0.369, over 17204.00 frames. ], tot_loss[loss=0.1915, ctc_loss=0.1227, cr_loss=0.344, over 761118.62 frames. ], batch size: 55, lr: 2.71e-03, grad_scale: 32.0 2024-09-25 16:15:02,719 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=782035.3333333334, ans=0.1 2024-09-25 16:15:09,944 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.79 vs. limit=6.0 2024-09-25 16:15:16,979 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=782082.0, ans=0.1 2024-09-25 16:15:21,818 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=782082.0, ans=0.2 2024-09-25 16:15:28,123 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=782082.0, ans=0.0 2024-09-25 16:15:36,171 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=782128.6666666666, ans=0.125 2024-09-25 16:15:39,561 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.const_attention_rate, batch_count=782128.6666666666, ans=0.025 2024-09-25 16:15:54,190 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=3.76 vs. limit=10.0 2024-09-25 16:15:55,370 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=782175.3333333334, ans=0.1 2024-09-25 16:16:05,319 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=8.65 vs. limit=15.0 2024-09-25 16:16:18,759 INFO [train.py:1198] (3/4) Epoch 44, batch 100, loss[loss=0.1782, ctc_loss=0.1135, cr_loss=0.3236, over 17243.00 frames. ], tot_loss[loss=0.1949, ctc_loss=0.1253, cr_loss=0.3478, over 1348542.84 frames. 
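
[Annotation] In the Epoch 44 validation line above, ctc_loss (0.03507) accounts for the entire validation loss while cr_loss is ~1e-14, i.e. numerically zero. That is what one would expect if the CR term compares the model's per-frame posteriors on two differently time-masked views of each utterance: with no masking applied at validation, the two views coincide and only floating-point residue survives. A minimal sketch of such a symmetric-KL consistency term follows; it is an assumption about the loss's general shape, not this recipe's exact code.

```python
import torch
import torch.nn.functional as F

def consistency_loss(logits_a: torch.Tensor, logits_b: torch.Tensor) -> torch.Tensor:
    # Symmetric KL between the per-frame output distributions of two
    # views of the same utterance; logits_*: (T, vocab). A sketch of the
    # general consistency-regularization idea, not the recipe's code.
    log_pa = F.log_softmax(logits_a, dim=-1)
    log_pb = F.log_softmax(logits_b, dim=-1)
    kl_ab = F.kl_div(log_pb, log_pa, reduction="batchmean", log_target=True)
    kl_ba = F.kl_div(log_pa, log_pb, reduction="batchmean", log_target=True)
    return 0.5 * (kl_ab + kl_ba)

# With identical views -- as at validation time, where no time masking is
# applied -- the term vanishes up to floating-point noise, matching the
# logged validation cr_loss of ~1e-14.
x = torch.randn(100, 500)
print(consistency_loss(x, x.clone()).item())  # ~0.0
```
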
], batch size: 44, lr: 2.71e-03, grad_scale: 32.0 2024-09-25 16:16:24,019 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.const_attention_rate, batch_count=782268.6666666666, ans=0.025 2024-09-25 16:16:35,446 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=782315.3333333334, ans=0.2 2024-09-25 16:16:36,907 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=782315.3333333334, ans=0.125 2024-09-25 16:16:49,780 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=782362.0, ans=0.2 2024-09-25 16:17:00,120 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.102e+02 1.309e+02 1.396e+02 1.536e+02 2.062e+02, threshold=2.792e+02, percent-clipped=0.0 2024-09-25 16:17:00,386 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=782362.0, ans=0.125 2024-09-25 16:17:41,586 INFO [train.py:1198] (3/4) Epoch 44, batch 150, loss[loss=0.1584, ctc_loss=0.09702, cr_loss=0.3068, over 17099.00 frames. ], tot_loss[loss=0.1894, ctc_loss=0.1215, cr_loss=0.3397, over 1788255.05 frames. ], batch size: 40, lr: 2.71e-03, grad_scale: 32.0 2024-09-25 16:17:56,656 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.03 vs. limit=10.0 2024-09-25 16:18:12,306 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.30 vs. limit=15.0 2024-09-25 16:18:20,006 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=782595.3333333334, ans=0.125 2024-09-25 16:18:45,958 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=4.13 vs. limit=10.0 2024-09-25 16:19:07,047 INFO [train.py:1198] (3/4) Epoch 44, batch 200, loss[loss=0.1982, ctc_loss=0.1259, cr_loss=0.3617, over 17296.00 frames. ], tot_loss[loss=0.1885, ctc_loss=0.1208, cr_loss=0.3387, over 2141237.05 frames. ], batch size: 49, lr: 2.71e-03, grad_scale: 32.0 2024-09-25 16:19:28,416 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=782782.0, ans=0.125 2024-09-25 16:19:28,434 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.const_attention_rate, batch_count=782782.0, ans=0.025 2024-09-25 16:19:48,491 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.105e+02 1.333e+02 1.408e+02 1.534e+02 2.430e+02, threshold=2.816e+02, percent-clipped=0.0 2024-09-25 16:20:18,045 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=782922.0, ans=0.125 2024-09-25 16:20:24,626 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=782922.0, ans=0.0 2024-09-25 16:20:30,832 INFO [train.py:1198] (3/4) Epoch 44, batch 250, loss[loss=0.179, ctc_loss=0.1117, cr_loss=0.3362, over 17327.00 frames. ], tot_loss[loss=0.1869, ctc_loss=0.1196, cr_loss=0.3361, over 2409934.09 frames. 
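
[Annotation] The per-batch numbers above are internally consistent with the total being the CTC loss plus a 0.2-weighted CR loss. Batch 200: 0.1208 + 0.2 * 0.3387 = 0.18854, which rounds to the logged tot_loss of 0.1885; batch 150 works the same way. A hypothetical reconstruction of that combination; the 0.2 weight is inferred from the logged values themselves, not read from train.py, so treat it as an assumption.

```python
def total_loss(ctc_loss: float, cr_loss: float, cr_scale: float = 0.2) -> float:
    # cr_scale is an assumption inferred from the log's arithmetic.
    return ctc_loss + cr_scale * cr_loss

# Batch 200 above: 0.1208 + 0.2 * 0.3387 = 0.18854 ~ logged 0.1885.
assert abs(total_loss(0.1208, 0.3387) - 0.1885) < 5e-4
# Batch 150 above: 0.1215 + 0.2 * 0.3397 = 0.18944 ~ logged 0.1894.
assert abs(total_loss(0.1215, 0.3397) - 0.1894) < 5e-4
```
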
], batch size: 52, lr: 2.71e-03, grad_scale: 32.0 2024-09-25 16:20:51,716 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=783015.3333333334, ans=0.125 2024-09-25 16:21:02,966 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=783062.0, ans=0.125 2024-09-25 16:21:30,659 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=783108.6666666666, ans=0.2 2024-09-25 16:21:54,066 INFO [train.py:1198] (3/4) Epoch 44, batch 300, loss[loss=0.1801, ctc_loss=0.1159, cr_loss=0.321, over 17054.00 frames. ], tot_loss[loss=0.1874, ctc_loss=0.12, cr_loss=0.3367, over 2624698.62 frames. ], batch size: 46, lr: 2.71e-03, grad_scale: 32.0 2024-09-25 16:22:08,785 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=783248.6666666666, ans=0.2 2024-09-25 16:22:19,742 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=783248.6666666666, ans=0.0 2024-09-25 16:22:21,364 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=783248.6666666666, ans=0.125 2024-09-25 16:22:24,332 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=783295.3333333334, ans=0.2 2024-09-25 16:22:32,113 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.188e+02 1.326e+02 1.428e+02 1.549e+02 2.699e+02, threshold=2.856e+02, percent-clipped=0.0 2024-09-25 16:22:47,166 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.97 vs. limit=15.0 2024-09-25 16:23:00,588 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=783388.6666666666, ans=0.1 2024-09-25 16:23:16,540 INFO [train.py:1198] (3/4) Epoch 44, batch 350, loss[loss=0.173, ctc_loss=0.1103, cr_loss=0.3134, over 17152.00 frames. ], tot_loss[loss=0.188, ctc_loss=0.1206, cr_loss=0.3372, over 2790764.24 frames. ], batch size: 41, lr: 2.71e-03, grad_scale: 32.0 2024-09-25 16:23:30,011 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=7.51 vs. limit=15.0 2024-09-25 16:23:32,569 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=783482.0, ans=0.125 2024-09-25 16:23:39,040 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-25 16:24:12,663 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=783575.3333333334, ans=0.125 2024-09-25 16:24:14,210 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=783575.3333333334, ans=0.0 2024-09-25 16:24:17,454 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=783575.3333333334, ans=0.0 2024-09-25 16:24:42,413 INFO [train.py:1198] (3/4) Epoch 44, batch 400, loss[loss=0.1905, ctc_loss=0.1248, cr_loss=0.3283, over 17327.00 frames. ], tot_loss[loss=0.1885, ctc_loss=0.121, cr_loss=0.3376, over 2915367.60 frames. 
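
[Annotation] In every optim.py WARNING above, the clipping threshold is exactly twice the middle quantile (here 2.0 * 1.428e+02 = 2.856e+02), so the threshold appears to be clipping_scale times the median of recent gradient norms, and percent-clipped is the fraction of norms exceeding it. A sketch of that bookkeeping, assuming a simple window of recent norms; the real statistics in optim.py may be maintained differently.

```python
import torch

def gradnorm_report(recent_norms: torch.Tensor, clipping_scale: float = 2.0):
    # recent_norms: 1-D tensor of gradient norms from recent steps.
    qs = torch.quantile(recent_norms, torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0]))
    threshold = clipping_scale * qs[2]  # scale times the median (assumption)
    percent_clipped = 100.0 * (recent_norms > threshold).float().mean()
    return qs, threshold, percent_clipped

# The five quantiles from the WARNING above:
norms = torch.tensor([118.8, 132.6, 142.8, 154.9, 269.9])
qs, thr, pc = gradnorm_report(norms)
print(thr.item(), pc.item())  # threshold 285.6 = 2 * median; 0% clipped
```
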
], batch size: 51, lr: 2.71e-03, grad_scale: 32.0 2024-09-25 16:24:57,205 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=783715.3333333334, ans=0.2 2024-09-25 16:24:59,667 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=5.13 vs. limit=15.0 2024-09-25 16:25:20,975 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.141e+02 1.320e+02 1.420e+02 1.550e+02 2.069e+02, threshold=2.840e+02, percent-clipped=0.0 2024-09-25 16:25:53,376 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=783855.3333333334, ans=0.0 2024-09-25 16:26:02,568 INFO [train.py:1198] (3/4) Epoch 44, batch 450, loss[loss=0.172, ctc_loss=0.1076, cr_loss=0.3219, over 17103.00 frames. ], tot_loss[loss=0.1897, ctc_loss=0.1219, cr_loss=0.339, over 3003864.28 frames. ], batch size: 40, lr: 2.71e-03, grad_scale: 32.0 2024-09-25 16:26:08,213 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.17 vs. limit=15.0 2024-09-25 16:26:10,916 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=783902.0, ans=0.1 2024-09-25 16:26:15,688 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=783902.0, ans=0.125 2024-09-25 16:26:19,092 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=783948.6666666666, ans=0.125 2024-09-25 16:26:25,910 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.02 vs. limit=15.0 2024-09-25 16:27:11,140 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-25 16:27:28,175 INFO [train.py:1198] (3/4) Epoch 44, batch 500, loss[loss=0.19, ctc_loss=0.1229, cr_loss=0.335, over 17216.00 frames. ], tot_loss[loss=0.1894, ctc_loss=0.1217, cr_loss=0.3385, over 3076497.37 frames. ], batch size: 50, lr: 2.71e-03, grad_scale: 16.0 2024-09-25 16:27:37,863 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=784135.3333333334, ans=0.0 2024-09-25 16:28:10,663 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.152e+02 1.312e+02 1.366e+02 1.455e+02 1.775e+02, threshold=2.732e+02, percent-clipped=0.0 2024-09-25 16:28:50,234 INFO [train.py:1198] (3/4) Epoch 44, batch 550, loss[loss=0.179, ctc_loss=0.1144, cr_loss=0.3228, over 17305.00 frames. ], tot_loss[loss=0.1897, ctc_loss=0.122, cr_loss=0.3387, over 3139335.50 frames. ], batch size: 46, lr: 2.71e-03, grad_scale: 16.0 2024-09-25 16:28:58,372 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=784368.6666666666, ans=0.0 2024-09-25 16:29:08,949 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=784415.3333333334, ans=0.0 2024-09-25 16:29:18,933 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=14.38 vs. 
limit=15.0 2024-09-25 16:29:20,178 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=784415.3333333334, ans=0.1 2024-09-25 16:29:25,138 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=784462.0, ans=0.125 2024-09-25 16:29:26,568 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=784462.0, ans=0.125 2024-09-25 16:29:29,646 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=784462.0, ans=0.015 2024-09-25 16:29:45,007 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=784508.6666666666, ans=0.125 2024-09-25 16:29:51,307 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=784508.6666666666, ans=0.0 2024-09-25 16:30:02,688 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.const_attention_rate, batch_count=784555.3333333334, ans=0.025 2024-09-25 16:30:10,786 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=784555.3333333334, ans=0.125 2024-09-25 16:30:15,132 INFO [train.py:1198] (3/4) Epoch 44, batch 600, loss[loss=0.1648, ctc_loss=0.1036, cr_loss=0.3061, over 17072.00 frames. ], tot_loss[loss=0.1903, ctc_loss=0.1222, cr_loss=0.3404, over 3198501.75 frames. ], batch size: 40, lr: 2.71e-03, grad_scale: 16.0 2024-09-25 16:30:26,721 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=784602.0, ans=0.0 2024-09-25 16:30:55,328 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.161e+02 1.308e+02 1.386e+02 1.485e+02 2.110e+02, threshold=2.772e+02, percent-clipped=0.0 2024-09-25 16:31:10,190 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=784742.0, ans=0.125 2024-09-25 16:31:14,009 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=15.08 vs. limit=22.5 2024-09-25 16:31:17,560 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.49 vs. limit=6.0 2024-09-25 16:31:35,713 INFO [train.py:1198] (3/4) Epoch 44, batch 650, loss[loss=0.1885, ctc_loss=0.1187, cr_loss=0.3491, over 17104.00 frames. ], tot_loss[loss=0.1885, ctc_loss=0.1208, cr_loss=0.3382, over 3240285.63 frames. 
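
[Annotation] The scaling.py:214 lines track named hyperparameters (dropout probabilities, skip rates, balancer limits, attention rates) whose current value ("ans") is a function of the global batch_count. A toy stand-in for such a batch-count-keyed schedule is sketched below; the breakpoints are invented for illustration, and the real ScheduledFloat in scaling.py may interpolate differently.

```python
import bisect

class PiecewiseLinearFloat:
    """Toy stand-in for a ScheduledFloat: a float whose value is a
    piecewise-linear function of the global batch count."""

    def __init__(self, *points):  # points: (batch_count, value) pairs, ascending
        self.xs = [p[0] for p in points]
        self.ys = [p[1] for p in points]

    def value(self, batch_count: float) -> float:
        if batch_count <= self.xs[0]:
            return self.ys[0]
        if batch_count >= self.xs[-1]:
            return self.ys[-1]
        i = bisect.bisect_right(self.xs, batch_count)
        x0, x1 = self.xs[i - 1], self.xs[i]
        y0, y1 = self.ys[i - 1], self.ys[i]
        return y0 + (y1 - y0) * (batch_count - x0) / (x1 - x0)

# Illustrative: a dropout that decays from 0.3 to 0.1 over the first 20k
# batches, then stays flat -- so late in training (batch_count ~784k, as
# in the lines above) it logs as ans=0.1.
dropout_p = PiecewiseLinearFloat((0.0, 0.3), (20000.0, 0.1))
print(dropout_p.value(784415.3333333334))  # -> 0.1
```
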
], batch size: 49, lr: 2.71e-03, grad_scale: 16.0 2024-09-25 16:31:43,941 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=784835.3333333334, ans=0.125 2024-09-25 16:31:57,782 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=784882.0, ans=0.125 2024-09-25 16:32:36,965 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=784975.3333333334, ans=0.2 2024-09-25 16:32:48,300 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=785022.0, ans=0.125 2024-09-25 16:32:51,345 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=785022.0, ans=0.2 2024-09-25 16:32:53,515 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.43 vs. limit=15.0 2024-09-25 16:32:59,116 INFO [train.py:1198] (3/4) Epoch 44, batch 700, loss[loss=0.2201, ctc_loss=0.1401, cr_loss=0.4, over 17028.00 frames. ], tot_loss[loss=0.1887, ctc_loss=0.121, cr_loss=0.3387, over 3273926.66 frames. ], batch size: 51, lr: 2.70e-03, grad_scale: 16.0 2024-09-25 16:33:00,986 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.min_positive, batch_count=785068.6666666666, ans=0.05 2024-09-25 16:33:02,531 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.const_attention_rate, batch_count=785068.6666666666, ans=0.025 2024-09-25 16:33:29,257 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=785115.3333333334, ans=0.125 2024-09-25 16:33:41,733 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.090e+02 1.292e+02 1.388e+02 1.458e+02 2.267e+02, threshold=2.776e+02, percent-clipped=0.0 2024-09-25 16:33:59,919 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.04 vs. limit=15.0 2024-09-25 16:34:06,975 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.max_abs, batch_count=785255.3333333334, ans=10.0 2024-09-25 16:34:19,513 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-25 16:34:24,044 INFO [train.py:1198] (3/4) Epoch 44, batch 750, loss[loss=0.1696, ctc_loss=0.1074, cr_loss=0.3113, over 16315.00 frames. ], tot_loss[loss=0.1895, ctc_loss=0.1215, cr_loss=0.3398, over 3294690.71 frames. ], batch size: 36, lr: 2.70e-03, grad_scale: 16.0 2024-09-25 16:34:26,439 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=10.13 vs. 
limit=22.5 2024-09-25 16:35:19,469 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.min_positive, batch_count=785442.0, ans=0.025 2024-09-25 16:35:31,005 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=785488.6666666666, ans=0.2 2024-09-25 16:35:37,388 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=785488.6666666666, ans=0.125 2024-09-25 16:35:46,668 INFO [train.py:1198] (3/4) Epoch 44, batch 800, loss[loss=0.1665, ctc_loss=0.1062, cr_loss=0.3016, over 17103.00 frames. ], tot_loss[loss=0.1895, ctc_loss=0.1215, cr_loss=0.3398, over 3315439.56 frames. ], batch size: 40, lr: 2.70e-03, grad_scale: 32.0 2024-09-25 16:35:53,439 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=785535.3333333334, ans=0.1 2024-09-25 16:36:02,301 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=8.74 vs. limit=22.5 2024-09-25 16:36:10,907 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=785582.0, ans=0.0 2024-09-25 16:36:15,607 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=785582.0, ans=0.0 2024-09-25 16:36:26,438 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.158e+02 1.304e+02 1.360e+02 1.506e+02 2.198e+02, threshold=2.721e+02, percent-clipped=0.0 2024-09-25 16:36:26,719 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=785628.6666666666, ans=0.125 2024-09-25 16:36:59,748 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.const_attention_rate, batch_count=785722.0, ans=0.025 2024-09-25 16:37:09,080 INFO [train.py:1198] (3/4) Epoch 44, batch 850, loss[loss=0.1961, ctc_loss=0.125, cr_loss=0.3558, over 17239.00 frames. ], tot_loss[loss=0.1898, ctc_loss=0.1217, cr_loss=0.3403, over 3329954.86 frames. ], batch size: 50, lr: 2.70e-03, grad_scale: 32.0 2024-09-25 16:37:09,484 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=785768.6666666666, ans=0.1 2024-09-25 16:37:28,770 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer_ff3.min_abs, batch_count=785815.3333333334, ans=0.2 2024-09-25 16:37:48,269 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=785862.0, ans=0.035 2024-09-25 16:37:48,277 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=785862.0, ans=0.2 2024-09-25 16:37:56,631 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=785908.6666666666, ans=10.0 2024-09-25 16:38:18,546 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-25 16:38:32,619 INFO [train.py:1198] (3/4) Epoch 44, batch 900, loss[loss=0.1832, ctc_loss=0.1171, cr_loss=0.3306, over 17295.00 frames. ], tot_loss[loss=0.189, ctc_loss=0.1212, cr_loss=0.3388, over 3342114.84 frames. 
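
[Annotation] The grad_scale column hops between values like 16.0 and 32.0 (16.0 at batch 750 above, 32.0 at batch 800). This is the standard dynamic-loss-scaling pattern under fp16 AMP: the scale doubles after a run of overflow-free steps and halves when an overflow is detected. The API below is PyTorch's real GradScaler; the init_scale and growth_interval values are illustrative, not this recipe's settings.

```python
import torch

scaler = torch.cuda.amp.GradScaler(
    init_scale=16.0,      # starting grad_scale
    growth_factor=2.0,    # double after growth_interval clean steps
    backoff_factor=0.5,   # halve on overflow
    growth_interval=2000,
)

# Typical step (model, optimizer, batch assumed defined elsewhere):
#   with torch.cuda.amp.autocast(dtype=torch.float16):
#       loss = model(batch)
#   scaler.scale(loss).backward()
#   scaler.step(optimizer)
#   scaler.update()
#   print(scaler.get_scale())  # the value logged as grad_scale
```
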
], batch size: 49, lr: 2.70e-03, grad_scale: 32.0 2024-09-25 16:38:37,794 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=786002.0, ans=0.125 2024-09-25 16:38:53,803 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=786048.6666666666, ans=0.125 2024-09-25 16:39:15,052 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.169e+02 1.314e+02 1.378e+02 1.470e+02 1.852e+02, threshold=2.756e+02, percent-clipped=0.0 2024-09-25 16:39:28,403 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=786142.0, ans=0.125 2024-09-25 16:39:36,448 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=4.13 vs. limit=15.0 2024-09-25 16:39:58,025 INFO [train.py:1198] (3/4) Epoch 44, batch 950, loss[loss=0.2283, ctc_loss=0.1548, cr_loss=0.3672, over 11816.00 frames. ], tot_loss[loss=0.1875, ctc_loss=0.1202, cr_loss=0.3365, over 3342189.57 frames. ], batch size: 123, lr: 2.70e-03, grad_scale: 32.0 2024-09-25 16:39:59,958 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=786235.3333333334, ans=0.125 2024-09-25 16:40:00,069 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=786235.3333333334, ans=0.2 2024-09-25 16:40:01,514 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=786235.3333333334, ans=0.0 2024-09-25 16:40:04,792 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=786235.3333333334, ans=0.125 2024-09-25 16:40:10,968 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=786235.3333333334, ans=0.1 2024-09-25 16:40:23,750 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=786282.0, ans=0.0 2024-09-25 16:40:23,780 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=786282.0, ans=0.1 2024-09-25 16:40:46,286 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=786375.3333333334, ans=0.04949747468305833 2024-09-25 16:41:07,106 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=786422.0, ans=0.125 2024-09-25 16:41:12,277 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.00 vs. limit=15.0 2024-09-25 16:41:18,078 INFO [train.py:1198] (3/4) Epoch 44, batch 1000, loss[loss=0.2265, ctc_loss=0.1499, cr_loss=0.3831, over 16963.00 frames. ], tot_loss[loss=0.1878, ctc_loss=0.1204, cr_loss=0.3369, over 3353172.99 frames. 
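
[Annotation] Each train.py:1198 line reports the single batch's loss "over M frames" next to a running tot_loss "over N frames", where N equals the batch's own frame count at each epoch's batch 0 (e.g. 16940.00 at Epoch 44, batch 0 above) and then grows into the millions. A sketch of a frame-weighted running mean that would produce this; the plateau of N around ~3.3M frames suggests the real tracker also down-weights old batches, which this plain version does not.

```python
class FrameWeightedAverage:
    """Frame-weighted running mean of a loss, reset at each epoch start.
    Assumption: a plain weighted mean; the recipe's tracker may decay
    older batches as well."""

    def __init__(self):
        self.weighted_sum = 0.0
        self.frames = 0.0

    def update(self, batch_loss: float, batch_frames: float) -> float:
        self.weighted_sum += batch_loss * batch_frames
        self.frames += batch_frames
        return self.weighted_sum / self.frames

avg = FrameWeightedAverage()
print(avg.update(0.1804, 16940.0))  # batch 0: tot_loss equals the batch loss
```
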
], batch size: 58, lr: 2.70e-03, grad_scale: 16.0 2024-09-25 16:41:35,798 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=786515.3333333334, ans=0.0 2024-09-25 16:41:37,590 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=786515.3333333334, ans=0.125 2024-09-25 16:41:50,146 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=786562.0, ans=0.2 2024-09-25 16:41:59,095 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.const_attention_rate, batch_count=786562.0, ans=0.025 2024-09-25 16:42:02,020 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.147e+02 1.298e+02 1.360e+02 1.466e+02 2.434e+02, threshold=2.721e+02, percent-clipped=0.0 2024-09-25 16:42:08,714 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=786608.6666666666, ans=0.0 2024-09-25 16:42:29,745 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.73 vs. limit=6.0 2024-09-25 16:42:32,624 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=786655.3333333334, ans=0.0 2024-09-25 16:42:40,316 INFO [train.py:1198] (3/4) Epoch 44, batch 1050, loss[loss=0.2042, ctc_loss=0.1313, cr_loss=0.3645, over 17028.00 frames. ], tot_loss[loss=0.1888, ctc_loss=0.1212, cr_loss=0.3383, over 3352330.29 frames. ], batch size: 56, lr: 2.70e-03, grad_scale: 16.0 2024-09-25 16:42:46,910 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=786702.0, ans=0.125 2024-09-25 16:43:07,916 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=8.87 vs. limit=15.0 2024-09-25 16:43:14,739 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=786795.3333333334, ans=0.125 2024-09-25 16:44:02,233 INFO [train.py:1198] (3/4) Epoch 44, batch 1100, loss[loss=0.1803, ctc_loss=0.1161, cr_loss=0.3209, over 17018.00 frames. ], tot_loss[loss=0.1886, ctc_loss=0.121, cr_loss=0.338, over 3351963.92 frames. 
], batch size: 44, lr: 2.70e-03, grad_scale: 16.0 2024-09-25 16:44:06,760 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer_ff3.min_abs, batch_count=786935.3333333334, ans=0.2 2024-09-25 16:44:08,257 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=786935.3333333334, ans=0.125 2024-09-25 16:44:37,871 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=787028.6666666666, ans=0.2 2024-09-25 16:44:40,980 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=787028.6666666666, ans=0.2 2024-09-25 16:44:42,655 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=787028.6666666666, ans=0.125 2024-09-25 16:44:48,723 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.176e+02 1.281e+02 1.343e+02 1.422e+02 3.447e+02, threshold=2.685e+02, percent-clipped=1.0 2024-09-25 16:45:09,917 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.max_positive, batch_count=787122.0, ans=0.95 2024-09-25 16:45:19,408 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=787122.0, ans=0.125 2024-09-25 16:45:27,409 INFO [train.py:1198] (3/4) Epoch 44, batch 1150, loss[loss=0.1598, ctc_loss=0.0991, cr_loss=0.3033, over 17297.00 frames. ], tot_loss[loss=0.1885, ctc_loss=0.1209, cr_loss=0.3379, over 3354036.90 frames. ], batch size: 46, lr: 2.70e-03, grad_scale: 16.0 2024-09-25 16:45:59,491 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=787262.0, ans=0.125 2024-09-25 16:46:14,083 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=787308.6666666666, ans=0.1 2024-09-25 16:46:33,258 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=787355.3333333334, ans=0.125 2024-09-25 16:46:47,231 INFO [train.py:1198] (3/4) Epoch 44, batch 1200, loss[loss=0.232, ctc_loss=0.1514, cr_loss=0.4032, over 17001.00 frames. ], tot_loss[loss=0.1893, ctc_loss=0.1215, cr_loss=0.3394, over 3363494.42 frames. 
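
[Annotation] The lr column drifts very slowly at this stage (2.75e-03 across epoch 43, 2.70e-03 here in epoch 44), consistent with a schedule that decays as a power law in both the batch index and the epoch. The sketch below shows that family of rule; the exact form and both time constants are illustrative assumptions, not read from this recipe's scheduler.

```python
def power_law_lr(base_lr: float, batch: float, epoch: float,
                 lr_batches: float = 7500.0, lr_epochs: float = 3.5) -> float:
    # Smooth power-law decay in both batch index and epoch. Far beyond
    # lr_batches, doubling the batch index only scales the lr by
    # 2 ** -0.5, which is why successive log lines barely move.
    batch_factor = ((batch ** 2 + lr_batches ** 2) / lr_batches ** 2) ** -0.25
    epoch_factor = ((epoch ** 2 + lr_epochs ** 2) / lr_epochs ** 2) ** -0.25
    return base_lr * batch_factor * epoch_factor

# Relative change over 500 late-training batches is tiny:
print(power_law_lr(0.045, 787000, 44) / power_law_lr(0.045, 786500, 44))  # ~1.0
```
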
], batch size: 53, lr: 2.70e-03, grad_scale: 32.0 2024-09-25 16:46:47,605 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=787402.0, ans=0.0 2024-09-25 16:46:53,146 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=787402.0, ans=0.2 2024-09-25 16:47:13,722 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-25 16:47:25,026 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=787495.3333333334, ans=0.1 2024-09-25 16:47:31,122 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.137e+02 1.306e+02 1.377e+02 1.476e+02 2.006e+02, threshold=2.753e+02, percent-clipped=0.0 2024-09-25 16:47:39,647 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=787542.0, ans=0.125 2024-09-25 16:47:50,524 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.const_attention_rate, batch_count=787542.0, ans=0.025 2024-09-25 16:47:52,168 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=787588.6666666666, ans=0.04949747468305833 2024-09-25 16:47:52,957 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=6.29 vs. limit=15.0 2024-09-25 16:48:04,781 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=787588.6666666666, ans=0.125 2024-09-25 16:48:06,173 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=787588.6666666666, ans=0.125 2024-09-25 16:48:11,735 INFO [train.py:1198] (3/4) Epoch 44, batch 1250, loss[loss=0.2006, ctc_loss=0.133, cr_loss=0.338, over 17089.00 frames. ], tot_loss[loss=0.1895, ctc_loss=0.1216, cr_loss=0.3395, over 3355333.70 frames. ], batch size: 43, lr: 2.70e-03, grad_scale: 32.0 2024-09-25 16:48:14,003 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.05 vs. 
limit=10.0 2024-09-25 16:48:15,234 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=787635.3333333334, ans=0.0 2024-09-25 16:48:15,297 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=787635.3333333334, ans=0.125 2024-09-25 16:48:23,238 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=787635.3333333334, ans=0.0 2024-09-25 16:48:29,850 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=787682.0, ans=10.0 2024-09-25 16:48:29,868 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=787682.0, ans=0.0 2024-09-25 16:48:42,760 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=787728.6666666666, ans=0.125 2024-09-25 16:48:57,485 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=787728.6666666666, ans=0.0 2024-09-25 16:49:06,500 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-09-25 16:49:14,407 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=787775.3333333334, ans=0.1 2024-09-25 16:49:14,533 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=787775.3333333334, ans=0.1 2024-09-25 16:49:19,189 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=787822.0, ans=0.125 2024-09-25 16:49:24,798 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=12.46 vs. limit=22.5 2024-09-25 16:49:37,343 INFO [train.py:1198] (3/4) Epoch 44, batch 1300, loss[loss=0.1887, ctc_loss=0.1215, cr_loss=0.3362, over 17016.00 frames. ], tot_loss[loss=0.1896, ctc_loss=0.1216, cr_loss=0.3397, over 3354945.97 frames. ], batch size: 51, lr: 2.70e-03, grad_scale: 32.0 2024-09-25 16:49:50,302 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=787868.6666666666, ans=0.2 2024-09-25 16:50:06,597 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=787915.3333333334, ans=0.125 2024-09-25 16:50:17,805 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=787962.0, ans=0.1 2024-09-25 16:50:18,228 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=3.78 vs. 
limit=10.0 2024-09-25 16:50:18,914 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.156e+02 1.318e+02 1.377e+02 1.473e+02 1.934e+02, threshold=2.754e+02, percent-clipped=0.0 2024-09-25 16:50:23,905 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=788008.6666666666, ans=0.125 2024-09-25 16:50:44,958 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.84 vs. limit=15.0 2024-09-25 16:50:51,069 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=788055.3333333334, ans=0.07 2024-09-25 16:50:57,004 INFO [train.py:1198] (3/4) Epoch 44, batch 1350, loss[loss=0.1575, ctc_loss=0.09815, cr_loss=0.2969, over 17249.00 frames. ], tot_loss[loss=0.1894, ctc_loss=0.1215, cr_loss=0.3395, over 3355382.57 frames. ], batch size: 42, lr: 2.70e-03, grad_scale: 32.0 2024-09-25 16:51:18,017 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=788148.6666666666, ans=0.0 2024-09-25 16:51:21,067 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=788148.6666666666, ans=0.125 2024-09-25 16:51:21,188 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=788148.6666666666, ans=0.125 2024-09-25 16:51:24,258 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=788148.6666666666, ans=0.125 2024-09-25 16:51:27,481 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=788195.3333333334, ans=0.0 2024-09-25 16:51:38,738 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=788195.3333333334, ans=0.0 2024-09-25 16:51:56,358 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=6.13 vs. limit=10.0 2024-09-25 16:52:16,984 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=5.90 vs. limit=15.0 2024-09-25 16:52:18,690 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=6.97 vs. limit=15.0 2024-09-25 16:52:19,421 INFO [train.py:1198] (3/4) Epoch 44, batch 1400, loss[loss=0.2115, ctc_loss=0.1399, cr_loss=0.3581, over 17346.00 frames. ], tot_loss[loss=0.1891, ctc_loss=0.1213, cr_loss=0.3387, over 3366119.84 frames. ], batch size: 48, lr: 2.70e-03, grad_scale: 32.0 2024-09-25 16:52:19,675 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=788335.3333333334, ans=0.0 2024-09-25 16:52:31,170 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=788335.3333333334, ans=0.125 2024-09-25 16:52:37,453 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=788382.0, ans=0.125 2024-09-25 16:52:40,955 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=10.45 vs. 
limit=22.5 2024-09-25 16:53:01,171 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.101e+02 1.288e+02 1.388e+02 1.492e+02 2.105e+02, threshold=2.776e+02, percent-clipped=0.0 2024-09-25 16:53:28,243 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=788522.0, ans=0.125 2024-09-25 16:53:34,664 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=788522.0, ans=0.125 2024-09-25 16:53:42,247 INFO [train.py:1198] (3/4) Epoch 44, batch 1450, loss[loss=0.2112, ctc_loss=0.1379, cr_loss=0.3664, over 17100.00 frames. ], tot_loss[loss=0.1892, ctc_loss=0.1214, cr_loss=0.3391, over 3363107.03 frames. ], batch size: 49, lr: 2.70e-03, grad_scale: 32.0 2024-09-25 16:54:42,285 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=788708.6666666666, ans=0.1 2024-09-25 16:54:55,104 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=788755.3333333334, ans=0.0 2024-09-25 16:55:07,626 INFO [train.py:1198] (3/4) Epoch 44, batch 1500, loss[loss=0.1417, ctc_loss=0.08804, cr_loss=0.2685, over 17053.00 frames. ], tot_loss[loss=0.1877, ctc_loss=0.1204, cr_loss=0.3366, over 3367000.67 frames. ], batch size: 39, lr: 2.70e-03, grad_scale: 16.0 2024-09-25 16:55:08,011 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=788802.0, ans=0.0 2024-09-25 16:55:11,013 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=788802.0, ans=10.0 2024-09-25 16:55:15,742 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=788802.0, ans=0.125 2024-09-25 16:55:36,908 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.const_attention_rate, batch_count=788848.6666666666, ans=0.025 2024-09-25 16:55:42,269 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.85 vs. limit=10.0 2024-09-25 16:55:50,910 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.157e+02 1.297e+02 1.379e+02 1.448e+02 1.999e+02, threshold=2.757e+02, percent-clipped=0.0 2024-09-25 16:55:51,260 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=788895.3333333334, ans=0.125 2024-09-25 16:56:08,640 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=788942.0, ans=0.2 2024-09-25 16:56:13,608 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=788988.6666666666, ans=0.0 2024-09-25 16:56:27,696 INFO [train.py:1198] (3/4) Epoch 44, batch 1550, loss[loss=0.1504, ctc_loss=0.0935, cr_loss=0.2843, over 17065.00 frames. ], tot_loss[loss=0.1859, ctc_loss=0.1191, cr_loss=0.3341, over 3377526.11 frames. 
], batch size: 39, lr: 2.70e-03, grad_scale: 16.0 2024-09-25 16:56:31,264 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-25 16:56:34,561 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=789035.3333333334, ans=0.0 2024-09-25 16:56:41,151 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=4.66 vs. limit=10.0 2024-09-25 16:57:20,999 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=789175.3333333334, ans=10.0 2024-09-25 16:57:42,778 INFO [scaling.py:1024] (3/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=7.13 vs. limit=8.0 2024-09-25 16:57:49,475 INFO [train.py:1198] (3/4) Epoch 44, batch 1600, loss[loss=0.1885, ctc_loss=0.1208, cr_loss=0.3387, over 17020.00 frames. ], tot_loss[loss=0.1855, ctc_loss=0.1188, cr_loss=0.3337, over 3374116.23 frames. ], batch size: 44, lr: 2.70e-03, grad_scale: 32.0 2024-09-25 16:58:03,298 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=14.31 vs. limit=22.5 2024-09-25 16:58:34,988 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.151e+02 1.295e+02 1.391e+02 1.482e+02 2.401e+02, threshold=2.782e+02, percent-clipped=0.0 2024-09-25 16:58:54,447 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=789455.3333333334, ans=0.0 2024-09-25 16:59:05,393 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=10.60 vs. limit=12.0 2024-09-25 16:59:12,953 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=789502.0, ans=0.125 2024-09-25 16:59:14,307 INFO [train.py:1198] (3/4) Epoch 44, batch 1650, loss[loss=0.1626, ctc_loss=0.1017, cr_loss=0.3044, over 16948.00 frames. ], tot_loss[loss=0.1859, ctc_loss=0.119, cr_loss=0.3342, over 3380779.95 frames. ], batch size: 42, lr: 2.70e-03, grad_scale: 32.0 2024-09-25 16:59:29,957 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=789502.0, ans=0.125 2024-09-25 16:59:39,286 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.const_attention_rate, batch_count=789548.6666666666, ans=0.025 2024-09-25 16:59:42,415 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=789548.6666666666, ans=0.125 2024-09-25 17:00:06,770 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer_ff3.min_abs, batch_count=789642.0, ans=0.2 2024-09-25 17:00:07,218 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.28 vs. limit=6.0 2024-09-25 17:00:36,857 INFO [train.py:1198] (3/4) Epoch 44, batch 1700, loss[loss=0.1892, ctc_loss=0.1218, cr_loss=0.3372, over 17214.00 frames. ], tot_loss[loss=0.1858, ctc_loss=0.1189, cr_loss=0.3343, over 3384058.37 frames. 
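
[Annotation] The ctc_loss column is the dominant term throughout. For reference, a self-contained CTC loss example on random data using PyTorch's standard F.ctc_loss; the recipe's own CTC computation (and its normalization, which keeps the logged values around 0.12 rather than in the hundreds) is not shown in this log, so this is grounding for the quantity, not the recipe's code.

```python
import torch
import torch.nn.functional as F

T, N, C = 50, 4, 500          # frames, batch, vocab size (blank id 0)
log_probs = F.log_softmax(torch.randn(T, N, C), dim=-1)  # (T, N, C)
targets = torch.randint(1, C, (N, 20), dtype=torch.long)  # label ids
input_lengths = torch.full((N,), T, dtype=torch.long)
target_lengths = torch.full((N,), 20, dtype=torch.long)

loss = F.ctc_loss(log_probs, targets, input_lengths, target_lengths,
                  blank=0, reduction="mean")
print(loss.item())
```
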
], batch size: 55, lr: 2.70e-03, grad_scale: 32.0 2024-09-25 17:01:12,564 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=789828.6666666666, ans=0.0 2024-09-25 17:01:19,267 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=12.59 vs. limit=22.5 2024-09-25 17:01:20,179 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.198e+02 1.326e+02 1.402e+02 1.495e+02 1.823e+02, threshold=2.804e+02, percent-clipped=0.0 2024-09-25 17:01:34,699 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=789875.3333333334, ans=0.0 2024-09-25 17:01:50,055 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=789922.0, ans=0.1 2024-09-25 17:01:50,207 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=789922.0, ans=0.125 2024-09-25 17:01:55,079 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.const_attention_rate, batch_count=789922.0, ans=0.025 2024-09-25 17:01:59,535 INFO [train.py:1198] (3/4) Epoch 44, batch 1750, loss[loss=0.1346, ctc_loss=0.08241, cr_loss=0.261, over 17065.00 frames. ], tot_loss[loss=0.1872, ctc_loss=0.1199, cr_loss=0.3361, over 3373178.21 frames. ], batch size: 39, lr: 2.70e-03, grad_scale: 32.0 2024-09-25 17:02:08,012 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=789968.6666666666, ans=0.125 2024-09-25 17:02:31,937 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=790062.0, ans=0.5 2024-09-25 17:02:39,672 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=790062.0, ans=0.0 2024-09-25 17:02:50,089 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.40 vs. limit=22.5 2024-09-25 17:02:52,554 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-25 17:02:59,927 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=9.68 vs. limit=22.5 2024-09-25 17:03:22,077 INFO [train.py:1198] (3/4) Epoch 44, batch 1800, loss[loss=0.2061, ctc_loss=0.1329, cr_loss=0.3662, over 16993.00 frames. ], tot_loss[loss=0.1876, ctc_loss=0.1203, cr_loss=0.3368, over 3376775.73 frames. ], batch size: 53, lr: 2.70e-03, grad_scale: 32.0 2024-09-25 17:03:25,674 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=790202.0, ans=0.0 2024-09-25 17:03:27,728 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=4.44 vs. limit=12.0 2024-09-25 17:03:32,437 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=11.82 vs. limit=22.5 2024-09-25 17:03:48,832 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=5.15 vs. 
limit=15.0 2024-09-25 17:04:08,009 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.167e+02 1.325e+02 1.389e+02 1.497e+02 2.037e+02, threshold=2.779e+02, percent-clipped=0.0 2024-09-25 17:04:11,359 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=790342.0, ans=0.0 2024-09-25 17:04:12,986 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=790342.0, ans=0.125 2024-09-25 17:04:31,881 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=790388.6666666666, ans=0.1 2024-09-25 17:04:33,489 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=790388.6666666666, ans=0.0 2024-09-25 17:04:36,631 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=790388.6666666666, ans=0.125 2024-09-25 17:04:44,868 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=790388.6666666666, ans=0.0 2024-09-25 17:04:47,704 INFO [train.py:1198] (3/4) Epoch 44, batch 1850, loss[loss=0.1665, ctc_loss=0.104, cr_loss=0.3124, over 17034.00 frames. ], tot_loss[loss=0.1868, ctc_loss=0.1197, cr_loss=0.3358, over 3376334.11 frames. ], batch size: 39, lr: 2.70e-03, grad_scale: 32.0 2024-09-25 17:04:54,433 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=790435.3333333334, ans=0.125 2024-09-25 17:05:13,556 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=790482.0, ans=0.125 2024-09-25 17:05:28,179 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=790528.6666666666, ans=0.09899494936611666 2024-09-25 17:06:08,121 INFO [train.py:1198] (3/4) Epoch 44, batch 1900, loss[loss=0.2011, ctc_loss=0.13, cr_loss=0.3556, over 17308.00 frames. ], tot_loss[loss=0.1876, ctc_loss=0.1202, cr_loss=0.3371, over 3367066.75 frames. ], batch size: 51, lr: 2.70e-03, grad_scale: 32.0 2024-09-25 17:06:12,006 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=12.44 vs. limit=22.5 2024-09-25 17:06:13,149 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=790668.6666666666, ans=0.125 2024-09-25 17:06:20,286 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.52 vs. 
limit=15.0 2024-09-25 17:06:26,146 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=790715.3333333334, ans=0.0 2024-09-25 17:06:52,825 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=790762.0, ans=0.2 2024-09-25 17:06:54,117 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.146e+02 1.292e+02 1.379e+02 1.443e+02 2.422e+02, threshold=2.758e+02, percent-clipped=0.0 2024-09-25 17:07:05,626 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=790808.6666666666, ans=0.1 2024-09-25 17:07:07,385 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=790808.6666666666, ans=0.0 2024-09-25 17:07:31,340 INFO [train.py:1198] (3/4) Epoch 44, batch 1950, loss[loss=0.1898, ctc_loss=0.1229, cr_loss=0.3342, over 16747.00 frames. ], tot_loss[loss=0.1883, ctc_loss=0.1207, cr_loss=0.338, over 3352555.18 frames. ], batch size: 61, lr: 2.69e-03, grad_scale: 32.0 2024-09-25 17:07:59,147 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=9.12 vs. limit=12.0 2024-09-25 17:08:22,460 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=791042.0, ans=0.1 2024-09-25 17:08:24,141 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=791042.0, ans=0.125 2024-09-25 17:08:42,931 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=791088.6666666666, ans=0.0 2024-09-25 17:08:44,710 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=791088.6666666666, ans=0.0 2024-09-25 17:08:46,152 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=791088.6666666666, ans=0.125 2024-09-25 17:08:46,162 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=791088.6666666666, ans=0.125 2024-09-25 17:08:56,405 INFO [train.py:1198] (3/4) Epoch 44, batch 2000, loss[loss=0.2175, ctc_loss=0.1486, cr_loss=0.3442, over 11857.00 frames. ], tot_loss[loss=0.1879, ctc_loss=0.1204, cr_loss=0.3376, over 3358459.60 frames. ], batch size: 123, lr: 2.69e-03, grad_scale: 32.0 2024-09-25 17:09:30,914 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=791228.6666666666, ans=0.1 2024-09-25 17:09:34,637 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=4.11 vs. limit=12.0 2024-09-25 17:09:42,684 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.69 vs. limit=15.0 2024-09-25 17:09:43,344 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.169e+02 1.332e+02 1.440e+02 1.511e+02 2.187e+02, threshold=2.879e+02, percent-clipped=0.0 2024-09-25 17:10:18,721 INFO [train.py:1198] (3/4) Epoch 44, batch 2050, loss[loss=0.1888, ctc_loss=0.1191, cr_loss=0.3485, over 17309.00 frames. ], tot_loss[loss=0.1886, ctc_loss=0.1209, cr_loss=0.3385, over 3359099.89 frames. 
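(Note: the per-batch records above report a combined loss together with ctc_loss and cr_loss, the consistency-regularization term. Across these records the combined value matches a weighted sum with a CR weight of 0.2; a minimal sketch of that relation, with the 0.2 inferred from the logged numbers rather than read from the training code:)

```python
# Minimal sketch: how the combined "loss" in these records relates to its
# parts. The 0.2 CR weight is inferred from the logged values (it reproduces
# the records above to rounding), not read from train.py.
CR_LOSS_SCALE = 0.2  # assumed weight on the consistency-regularization loss

def combined_loss(ctc_loss: float, cr_loss: float) -> float:
    return ctc_loss + CR_LOSS_SCALE * cr_loss

# Check against the "Epoch 44, batch 2050" record above:
assert abs(combined_loss(0.1209, 0.3385) - 0.1886) < 5e-4
```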
], batch size: 51, lr: 2.69e-03, grad_scale: 32.0 2024-09-25 17:10:36,427 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=791415.3333333334, ans=0.125 2024-09-25 17:10:47,580 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=791415.3333333334, ans=10.0 2024-09-25 17:11:01,937 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.const_attention_rate, batch_count=791462.0, ans=0.025 2024-09-25 17:11:03,536 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=791462.0, ans=0.125 2024-09-25 17:11:19,521 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=791508.6666666666, ans=0.0 2024-09-25 17:11:38,301 INFO [train.py:1198] (3/4) Epoch 44, batch 2100, loss[loss=0.2051, ctc_loss=0.1329, cr_loss=0.3609, over 17152.00 frames. ], tot_loss[loss=0.1889, ctc_loss=0.1213, cr_loss=0.3382, over 3350652.25 frames. ], batch size: 48, lr: 2.69e-03, grad_scale: 32.0 2024-09-25 17:12:19,321 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=791695.3333333334, ans=0.125 2024-09-25 17:12:25,394 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.180e+02 1.332e+02 1.398e+02 1.467e+02 2.660e+02, threshold=2.796e+02, percent-clipped=0.0 2024-09-25 17:13:00,854 INFO [train.py:1198] (3/4) Epoch 44, batch 2150, loss[loss=0.1681, ctc_loss=0.1066, cr_loss=0.3076, over 17272.00 frames. ], tot_loss[loss=0.1879, ctc_loss=0.1205, cr_loss=0.3372, over 3365382.72 frames. ], batch size: 42, lr: 2.69e-03, grad_scale: 32.0 2024-09-25 17:13:57,910 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=791975.3333333334, ans=0.0 2024-09-25 17:14:28,858 INFO [train.py:1198] (3/4) Epoch 44, batch 2200, loss[loss=0.1889, ctc_loss=0.1237, cr_loss=0.3263, over 17156.00 frames. ], tot_loss[loss=0.1873, ctc_loss=0.1201, cr_loss=0.3359, over 3366672.10 frames. ], batch size: 48, lr: 2.69e-03, grad_scale: 32.0 2024-09-25 17:14:32,212 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=792068.6666666666, ans=0.0 2024-09-25 17:14:46,004 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=9.38 vs. 
limit=22.5 2024-09-25 17:14:56,373 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=792115.3333333334, ans=0.125 2024-09-25 17:14:59,765 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=792162.0, ans=0.0 2024-09-25 17:15:04,632 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=792162.0, ans=0.125 2024-09-25 17:15:09,348 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=792162.0, ans=10.0 2024-09-25 17:15:13,652 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.155e+02 1.270e+02 1.367e+02 1.447e+02 1.926e+02, threshold=2.734e+02, percent-clipped=0.0 2024-09-25 17:15:47,739 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=792302.0, ans=0.125 2024-09-25 17:15:48,885 INFO [train.py:1198] (3/4) Epoch 44, batch 2250, loss[loss=0.1833, ctc_loss=0.118, cr_loss=0.3264, over 17016.00 frames. ], tot_loss[loss=0.1869, ctc_loss=0.1199, cr_loss=0.3352, over 3363264.80 frames. ], batch size: 51, lr: 2.69e-03, grad_scale: 32.0 2024-09-25 17:16:01,975 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.const_attention_rate, batch_count=792302.0, ans=0.025 2024-09-25 17:16:23,507 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.10 vs. limit=15.0 2024-09-25 17:16:34,194 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=792395.3333333334, ans=0.09899494936611666 2024-09-25 17:17:11,215 INFO [train.py:1198] (3/4) Epoch 44, batch 2300, loss[loss=0.1917, ctc_loss=0.1241, cr_loss=0.3383, over 17349.00 frames. ], tot_loss[loss=0.1879, ctc_loss=0.1205, cr_loss=0.3368, over 3361475.64 frames. ], batch size: 48, lr: 2.69e-03, grad_scale: 32.0 2024-09-25 17:17:13,262 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=792535.3333333334, ans=0.125 2024-09-25 17:17:25,791 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.min_abs, batch_count=792582.0, ans=0.5 2024-09-25 17:17:46,399 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=792628.6666666666, ans=0.125 2024-09-25 17:17:55,696 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.062e+02 1.302e+02 1.387e+02 1.511e+02 2.811e+02, threshold=2.774e+02, percent-clipped=1.0 2024-09-25 17:18:33,654 INFO [train.py:1198] (3/4) Epoch 44, batch 2350, loss[loss=0.2099, ctc_loss=0.1363, cr_loss=0.3681, over 16867.00 frames. ], tot_loss[loss=0.1886, ctc_loss=0.121, cr_loss=0.3382, over 3351507.22 frames. 
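(Note: the optim.py WARNING lines report five quartiles of recent gradient norms plus a clipping threshold, and in every record above the threshold is exactly Clipping_scale, here 2.0, times the middle quartile: for example 2.0 x 1.402e+02 = 2.804e+02 in the first record. A hedged sketch of that bookkeeping, assuming a simple window of recent norms; the real optimizer may differ in details such as window size and update cadence:)

```python
import torch

def clipping_report(recent_norms: torch.Tensor, clipping_scale: float = 2.0):
    """recent_norms: 1-D float tensor of recent per-step gradient norms."""
    # Five quartiles (min, 25%, median, 75%, max), as printed in the WARNINGs.
    q = torch.quantile(recent_norms,
                       torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0]))
    threshold = clipping_scale * q[2]  # threshold = 2.0 * median, per the logs
    percent_clipped = 100.0 * (recent_norms > threshold).float().mean()
    return q, threshold, percent_clipped
```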
], batch size: 58, lr: 2.69e-03, grad_scale: 32.0 2024-09-25 17:18:46,717 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.const_attention_rate, batch_count=792768.6666666666, ans=0.025 2024-09-25 17:18:46,786 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=792768.6666666666, ans=0.0 2024-09-25 17:19:03,805 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.min_abs, batch_count=792815.3333333334, ans=0.5 2024-09-25 17:19:06,028 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=4.05 vs. limit=15.0 2024-09-25 17:19:32,171 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=792908.6666666666, ans=0.05 2024-09-25 17:19:46,929 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=792955.3333333334, ans=10.0 2024-09-25 17:19:59,163 INFO [train.py:1198] (3/4) Epoch 44, batch 2400, loss[loss=0.1816, ctc_loss=0.1123, cr_loss=0.3464, over 17300.00 frames. ], tot_loss[loss=0.1886, ctc_loss=0.1209, cr_loss=0.3385, over 3363901.35 frames. ], batch size: 46, lr: 2.69e-03, grad_scale: 32.0 2024-09-25 17:20:26,618 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=793048.6666666666, ans=0.125 2024-09-25 17:20:28,199 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=793048.6666666666, ans=0.125 2024-09-25 17:20:43,519 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.77 vs. limit=6.0 2024-09-25 17:20:45,390 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.129e+02 1.312e+02 1.417e+02 1.545e+02 2.964e+02, threshold=2.835e+02, percent-clipped=1.0 2024-09-25 17:20:46,138 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=13.05 vs. limit=22.5 2024-09-25 17:21:10,371 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.41 vs. limit=22.5 2024-09-25 17:21:17,818 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=793235.3333333334, ans=0.125 2024-09-25 17:21:19,094 INFO [train.py:1198] (3/4) Epoch 44, batch 2450, loss[loss=0.1709, ctc_loss=0.1065, cr_loss=0.322, over 17294.00 frames. ], tot_loss[loss=0.1872, ctc_loss=0.1199, cr_loss=0.3365, over 3367581.51 frames. ], batch size: 46, lr: 2.69e-03, grad_scale: 32.0 2024-09-25 17:21:42,182 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=793282.0, ans=0.125 2024-09-25 17:22:42,511 INFO [train.py:1198] (3/4) Epoch 44, batch 2500, loss[loss=0.1913, ctc_loss=0.1225, cr_loss=0.3437, over 17051.00 frames. ], tot_loss[loss=0.1878, ctc_loss=0.1203, cr_loss=0.3374, over 3371454.32 frames. 
], batch size: 56, lr: 2.69e-03, grad_scale: 32.0 2024-09-25 17:22:45,988 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=793468.6666666666, ans=0.1 2024-09-25 17:23:31,903 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.154e+02 1.312e+02 1.368e+02 1.453e+02 1.982e+02, threshold=2.736e+02, percent-clipped=0.0 2024-09-25 17:23:46,585 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=793608.6666666666, ans=0.0 2024-09-25 17:23:49,945 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=793655.3333333334, ans=0.125 2024-09-25 17:24:08,166 INFO [train.py:1198] (3/4) Epoch 44, batch 2550, loss[loss=0.1992, ctc_loss=0.1281, cr_loss=0.3557, over 16865.00 frames. ], tot_loss[loss=0.1885, ctc_loss=0.1208, cr_loss=0.3382, over 3362287.93 frames. ], batch size: 58, lr: 2.69e-03, grad_scale: 16.0 2024-09-25 17:24:19,981 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=3.66 vs. limit=10.0 2024-09-25 17:25:05,526 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=793842.0, ans=0.125 2024-09-25 17:25:07,130 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=793842.0, ans=0.125 2024-09-25 17:25:29,809 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=793935.3333333334, ans=0.1 2024-09-25 17:25:31,105 INFO [train.py:1198] (3/4) Epoch 44, batch 2600, loss[loss=0.1626, ctc_loss=0.1019, cr_loss=0.3036, over 17274.00 frames. ], tot_loss[loss=0.1883, ctc_loss=0.1207, cr_loss=0.3381, over 3363729.00 frames. ], batch size: 42, lr: 2.69e-03, grad_scale: 16.0 2024-09-25 17:26:00,461 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=793982.0, ans=0.125 2024-09-25 17:26:19,164 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.146e+02 1.289e+02 1.404e+02 1.511e+02 2.276e+02, threshold=2.809e+02, percent-clipped=0.0 2024-09-25 17:26:29,200 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=794075.3333333334, ans=0.0 2024-09-25 17:26:45,077 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=12.00 vs. limit=15.0 2024-09-25 17:26:54,092 INFO [train.py:1198] (3/4) Epoch 44, batch 2650, loss[loss=0.2235, ctc_loss=0.1454, cr_loss=0.3908, over 17000.00 frames. ], tot_loss[loss=0.1887, ctc_loss=0.1209, cr_loss=0.3389, over 3359634.87 frames. ], batch size: 53, lr: 2.69e-03, grad_scale: 16.0 2024-09-25 17:26:57,688 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=794168.6666666666, ans=0.09899494936611666 2024-09-25 17:26:59,656 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=8.80 vs. 
limit=15.0 2024-09-25 17:27:20,304 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.const_attention_rate, batch_count=794215.3333333334, ans=0.025 2024-09-25 17:27:28,286 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=794262.0, ans=0.125 2024-09-25 17:27:31,561 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=794262.0, ans=0.125 2024-09-25 17:27:36,467 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=794262.0, ans=0.125 2024-09-25 17:28:17,365 INFO [train.py:1198] (3/4) Epoch 44, batch 2700, loss[loss=0.147, ctc_loss=0.09154, cr_loss=0.2775, over 16280.00 frames. ], tot_loss[loss=0.1884, ctc_loss=0.1207, cr_loss=0.3383, over 3354989.58 frames. ], batch size: 36, lr: 2.69e-03, grad_scale: 16.0 2024-09-25 17:28:32,093 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=794448.6666666666, ans=0.125 2024-09-25 17:28:41,681 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=794448.6666666666, ans=0.125 2024-09-25 17:29:07,857 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.140e+02 1.291e+02 1.350e+02 1.441e+02 1.690e+02, threshold=2.700e+02, percent-clipped=0.0 2024-09-25 17:29:43,112 INFO [train.py:1198] (3/4) Epoch 44, batch 2750, loss[loss=0.1692, ctc_loss=0.1063, cr_loss=0.3147, over 17219.00 frames. ], tot_loss[loss=0.1892, ctc_loss=0.1214, cr_loss=0.3391, over 3358949.40 frames. ], batch size: 50, lr: 2.69e-03, grad_scale: 16.0 2024-09-25 17:29:57,596 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=794682.0, ans=0.09899494936611666 2024-09-25 17:30:02,938 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=4.65 vs. limit=15.0 2024-09-25 17:30:22,108 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.69 vs. limit=6.0 2024-09-25 17:30:27,966 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=794728.6666666666, ans=0.125 2024-09-25 17:30:29,639 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=794775.3333333334, ans=0.0 2024-09-25 17:30:49,012 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.02 vs. limit=6.0 2024-09-25 17:30:52,296 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten.whitening_limit, batch_count=794822.0, ans=15.0 2024-09-25 17:31:02,587 INFO [train.py:1198] (3/4) Epoch 44, batch 2800, loss[loss=0.1747, ctc_loss=0.1107, cr_loss=0.3201, over 17170.00 frames. ], tot_loss[loss=0.1887, ctc_loss=0.121, cr_loss=0.3383, over 3358764.39 frames. 
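(Note: around these batches the logged grad_scale toggles between 32.0 and 16.0, dropping to 16.0 by batch 2550 and returning to 32.0 at batch 2800. That is the usual dynamic loss-scaling behaviour under AMP: the scale is halved when a step produces inf/nan gradients and grows back after a stretch of finite steps. A minimal, generic sketch using PyTorch's GradScaler; the model/optimizer/batch names and the growth settings are placeholders, not taken from train.py:)

```python
import torch

def train_amp(model, optimizer, batches, init_scale=32.0):
    # Halve the scale on an inf/nan step, double it back after a run of
    # finite steps (growth/backoff factors here are PyTorch defaults).
    scaler = torch.cuda.amp.GradScaler(init_scale=init_scale,
                                       backoff_factor=0.5, growth_factor=2.0)
    for features, targets in batches:
        optimizer.zero_grad()
        with torch.cuda.amp.autocast(dtype=torch.float16):
            loss = model(features, targets)  # placeholder forward pass
        scaler.scale(loss).backward()
        scaler.step(optimizer)     # step is skipped if grads are inf/nan
        scaler.update()            # shrinks or grows the scale
        print(scaler.get_scale())  # the "grad_scale" printed in these logs
```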
], batch size: 45, lr: 2.69e-03, grad_scale: 32.0 2024-09-25 17:31:05,969 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=794868.6666666666, ans=0.125 2024-09-25 17:31:15,427 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=794868.6666666666, ans=0.2 2024-09-25 17:31:26,880 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=794915.3333333334, ans=0.1 2024-09-25 17:31:52,769 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.131e+02 1.331e+02 1.405e+02 1.538e+02 1.952e+02, threshold=2.809e+02, percent-clipped=0.0 2024-09-25 17:31:56,307 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=795008.6666666666, ans=0.0 2024-09-25 17:32:09,233 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=795055.3333333334, ans=0.125 2024-09-25 17:32:17,023 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=795055.3333333334, ans=0.125 2024-09-25 17:32:24,906 INFO [train.py:1198] (3/4) Epoch 44, batch 2850, loss[loss=0.1809, ctc_loss=0.1137, cr_loss=0.3361, over 17346.00 frames. ], tot_loss[loss=0.1889, ctc_loss=0.1212, cr_loss=0.3386, over 3356150.64 frames. ], batch size: 48, lr: 2.69e-03, grad_scale: 32.0 2024-09-25 17:32:28,533 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=795102.0, ans=0.125 2024-09-25 17:32:43,590 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=4.61 vs. limit=15.0 2024-09-25 17:33:00,613 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=795195.3333333334, ans=0.125 2024-09-25 17:33:03,777 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=795195.3333333334, ans=0.125 2024-09-25 17:33:19,761 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=9.07 vs. limit=15.0 2024-09-25 17:33:24,078 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=795242.0, ans=0.1 2024-09-25 17:33:32,178 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer_ff2.min_abs, batch_count=795288.6666666666, ans=0.1 2024-09-25 17:33:47,969 INFO [train.py:1198] (3/4) Epoch 44, batch 2900, loss[loss=0.1643, ctc_loss=0.1047, cr_loss=0.2981, over 17029.00 frames. ], tot_loss[loss=0.1876, ctc_loss=0.1203, cr_loss=0.3369, over 3362976.08 frames. ], batch size: 44, lr: 2.69e-03, grad_scale: 32.0 2024-09-25 17:34:06,898 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=795382.0, ans=0.1 2024-09-25 17:34:41,049 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=13.50 vs. 
limit=15.0 2024-09-25 17:34:41,410 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.183e+02 1.299e+02 1.362e+02 1.425e+02 2.572e+02, threshold=2.724e+02, percent-clipped=0.0 2024-09-25 17:35:02,567 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=795522.0, ans=0.125 2024-09-25 17:35:13,455 INFO [train.py:1198] (3/4) Epoch 44, batch 2950, loss[loss=0.2234, ctc_loss=0.1444, cr_loss=0.3949, over 17204.00 frames. ], tot_loss[loss=0.1889, ctc_loss=0.1211, cr_loss=0.3387, over 3346977.25 frames. ], batch size: 55, lr: 2.69e-03, grad_scale: 32.0 2024-09-25 17:35:39,733 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=795615.3333333334, ans=0.0 2024-09-25 17:35:45,975 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=795662.0, ans=0.125 2024-09-25 17:35:47,383 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=795662.0, ans=0.1 2024-09-25 17:36:32,578 INFO [train.py:1198] (3/4) Epoch 44, batch 3000, loss[loss=0.1696, ctc_loss=0.1097, cr_loss=0.2992, over 17016.00 frames. ], tot_loss[loss=0.1887, ctc_loss=0.1211, cr_loss=0.3383, over 3351316.21 frames. ], batch size: 44, lr: 2.69e-03, grad_scale: 32.0 2024-09-25 17:36:32,578 INFO [train.py:1221] (3/4) Computing validation loss 2024-09-25 17:36:47,859 INFO [train.py:1230] (3/4) Epoch 44, validation: loss=0.03521, ctc_loss=0.03521, cr_loss=1.022e-14, over 944034.00 frames. 2024-09-25 17:36:47,859 INFO [train.py:1231] (3/4) Maximum memory allocated so far is 21238MB 2024-09-25 17:37:05,601 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=795848.6666666666, ans=0.0 2024-09-25 17:37:19,506 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=795895.3333333334, ans=0.0 2024-09-25 17:37:33,616 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=795942.0, ans=0.125 2024-09-25 17:37:33,657 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=795942.0, ans=0.125 2024-09-25 17:37:34,781 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.148e+02 1.305e+02 1.379e+02 1.486e+02 1.924e+02, threshold=2.758e+02, percent-clipped=0.0 2024-09-25 17:37:43,531 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=4.48 vs. limit=10.0 2024-09-25 17:38:06,194 INFO [train.py:1198] (3/4) Epoch 44, batch 3050, loss[loss=0.1601, ctc_loss=0.0998, cr_loss=0.3016, over 17106.00 frames. ], tot_loss[loss=0.1893, ctc_loss=0.1215, cr_loss=0.3389, over 3351897.15 frames. 
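(Note: each scaling.py:214 record reports a named ScheduledFloat, a hyperparameter such as a dropout p, a skip rate, or a balancer prob, whose current value "ans" is looked up by batch_count. A schedule like this is commonly a piecewise-linear interpolation over (batch_count, value) breakpoints, constant outside the given range; a minimal sketch with made-up breakpoints, not this run's actual schedules:)

```python
from bisect import bisect_right

def scheduled_float(schedule, batch_count):
    """schedule: sorted list of (batch_count, value) breakpoints."""
    xs = [b for b, _ in schedule]
    i = bisect_right(xs, batch_count)
    if i == 0:
        return schedule[0][1]           # before the first breakpoint
    if i == len(schedule):
        return schedule[-1][1]          # past the last breakpoint
    (x0, y0), (x1, y1) = schedule[i - 1], schedule[i]
    t = (batch_count - x0) / (x1 - x0)
    return y0 + t * (y1 - y0)           # linear interpolation

# Illustrative only: a dropout that decays from 0.3 to 0.1 over 20k batches
# has long since reached its 0.1 floor at the batch counts in these logs.
assert scheduled_float([(0, 0.3), (20000, 0.1)], 796035) == 0.1
```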
], batch size: 40, lr: 2.69e-03, grad_scale: 32.0 2024-09-25 17:38:06,529 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=796035.3333333334, ans=0.125 2024-09-25 17:38:09,622 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=796035.3333333334, ans=0.125 2024-09-25 17:38:09,632 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=796035.3333333334, ans=0.125 2024-09-25 17:38:34,717 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=9.37 vs. limit=15.0 2024-09-25 17:38:36,187 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=796128.6666666666, ans=0.0 2024-09-25 17:39:09,264 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-25 17:39:13,958 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=796222.0, ans=0.125 2024-09-25 17:39:26,748 INFO [train.py:1198] (3/4) Epoch 44, batch 3100, loss[loss=0.169, ctc_loss=0.1051, cr_loss=0.3195, over 17185.00 frames. ], tot_loss[loss=0.1894, ctc_loss=0.1216, cr_loss=0.339, over 3354813.60 frames. ], batch size: 41, lr: 2.69e-03, grad_scale: 32.0 2024-09-25 17:39:29,405 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=4.98 vs. limit=15.0 2024-09-25 17:39:45,933 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=796315.3333333334, ans=0.0 2024-09-25 17:39:52,603 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=5.59 vs. limit=15.0 2024-09-25 17:40:03,451 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=12.34 vs. limit=22.5 2024-09-25 17:40:06,743 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=5.50 vs. limit=15.0 2024-09-25 17:40:10,054 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn2.whiten.whitening_limit, batch_count=796362.0, ans=22.5 2024-09-25 17:40:13,597 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.166e+02 1.310e+02 1.386e+02 1.469e+02 1.981e+02, threshold=2.772e+02, percent-clipped=0.0 2024-09-25 17:40:22,193 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=3.28 vs. limit=10.0 2024-09-25 17:40:28,177 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=796455.3333333334, ans=0.0 2024-09-25 17:40:35,919 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=796455.3333333334, ans=0.07 2024-09-25 17:40:36,595 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=5.51 vs. 
limit=15.0 2024-09-25 17:40:39,065 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-25 17:40:44,925 INFO [train.py:1198] (3/4) Epoch 44, batch 3150, loss[loss=0.2177, ctc_loss=0.1401, cr_loss=0.3883, over 16988.00 frames. ], tot_loss[loss=0.1896, ctc_loss=0.1217, cr_loss=0.3393, over 3357191.41 frames. ], batch size: 53, lr: 2.69e-03, grad_scale: 32.0 2024-09-25 17:41:05,024 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=796548.6666666666, ans=0.125 2024-09-25 17:41:15,007 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=8.34 vs. limit=15.0 2024-09-25 17:41:21,008 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=6.20 vs. limit=10.0 2024-09-25 17:41:37,802 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=796642.0, ans=0.125 2024-09-25 17:41:45,722 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=796642.0, ans=0.125 2024-09-25 17:42:00,715 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-25 17:42:04,161 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=11.35 vs. limit=15.0 2024-09-25 17:42:08,193 INFO [train.py:1198] (3/4) Epoch 44, batch 3200, loss[loss=0.1602, ctc_loss=0.09876, cr_loss=0.3072, over 17094.00 frames. ], tot_loss[loss=0.189, ctc_loss=0.1213, cr_loss=0.3384, over 3364273.79 frames. ], batch size: 43, lr: 2.69e-03, grad_scale: 32.0 2024-09-25 17:42:11,690 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=796735.3333333334, ans=0.125 2024-09-25 17:42:41,248 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=796828.6666666666, ans=0.0 2024-09-25 17:42:50,966 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=15.58 vs. limit=22.5 2024-09-25 17:42:57,898 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.113e+02 1.350e+02 1.424e+02 1.561e+02 1.915e+02, threshold=2.848e+02, percent-clipped=0.0 2024-09-25 17:43:12,475 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=796922.0, ans=0.125 2024-09-25 17:43:14,005 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=796922.0, ans=0.2 2024-09-25 17:43:26,128 INFO [train.py:1198] (3/4) Epoch 44, batch 3250, loss[loss=0.1539, ctc_loss=0.09679, cr_loss=0.2855, over 16966.00 frames. ], tot_loss[loss=0.1886, ctc_loss=0.1211, cr_loss=0.3373, over 3352187.17 frames. 
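(Note: the scaling.py:1024 Whitening records compare a per-module "metric" against a "limit". A natural metric for how white, i.e. isotropic, a module's activations are is d * tr(C^2) / tr(C)^2 over the channel covariance C: it equals 1.0 for a perfectly white covariance and d when all variance concentrates in one direction. This is one plausible formulation offered for intuition, not necessarily scaling.py's exact definition, which also involves channel groups, the num_groups field above:)

```python
import torch

def whitening_metric(x: torch.Tensor) -> torch.Tensor:
    """x: (num_frames, num_channels) activations from one module."""
    x = x - x.mean(dim=0)
    cov = (x.T @ x) / x.shape[0]        # (d, d) channel covariance
    d = cov.shape[0]
    # 1.0 when cov is proportional to the identity, d for a rank-1 cov.
    return d * (cov @ cov).diagonal().sum() / cov.diagonal().sum() ** 2

x = torch.randn(10000, 256)            # near-white input: metric close to 1
print(float(whitening_metric(x)))      # compare against a limit such as 15.0
```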
], batch size: 42, lr: 2.68e-03, grad_scale: 16.0 2024-09-25 17:43:46,824 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=797015.3333333334, ans=10.0 2024-09-25 17:44:11,412 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=5.04 vs. limit=15.0 2024-09-25 17:44:34,677 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=14.12 vs. limit=22.5 2024-09-25 17:44:38,914 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=797155.3333333334, ans=0.1 2024-09-25 17:44:45,005 INFO [train.py:1198] (3/4) Epoch 44, batch 3300, loss[loss=0.1985, ctc_loss=0.1267, cr_loss=0.3589, over 17014.00 frames. ], tot_loss[loss=0.1886, ctc_loss=0.1212, cr_loss=0.337, over 3353441.01 frames. ], batch size: 56, lr: 2.68e-03, grad_scale: 16.0 2024-09-25 17:44:46,847 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=797202.0, ans=0.125 2024-09-25 17:44:51,967 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=9.63 vs. limit=22.5 2024-09-25 17:45:01,251 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.11 vs. limit=10.0 2024-09-25 17:45:04,277 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=797248.6666666666, ans=0.1 2024-09-25 17:45:25,642 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=797295.3333333334, ans=0.0 2024-09-25 17:45:29,160 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=3.69 vs. limit=10.0 2024-09-25 17:45:31,862 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=797342.0, ans=0.0 2024-09-25 17:45:33,376 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=797342.0, ans=0.1 2024-09-25 17:45:34,581 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.138e+02 1.296e+02 1.381e+02 1.496e+02 2.395e+02, threshold=2.762e+02, percent-clipped=0.0 2024-09-25 17:45:58,156 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=797388.6666666666, ans=0.125 2024-09-25 17:46:02,722 INFO [train.py:1198] (3/4) Epoch 44, batch 3350, loss[loss=0.1973, ctc_loss=0.1269, cr_loss=0.3524, over 17215.00 frames. ], tot_loss[loss=0.1887, ctc_loss=0.1213, cr_loss=0.3373, over 3343874.49 frames. ], batch size: 50, lr: 2.68e-03, grad_scale: 16.0 2024-09-25 17:46:07,560 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=797435.3333333334, ans=0.125 2024-09-25 17:46:18,933 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=13.28 vs. 
limit=22.5 2024-09-25 17:46:26,277 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=797482.0, ans=0.125 2024-09-25 17:46:45,382 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=797528.6666666666, ans=0.1 2024-09-25 17:47:02,682 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=797575.3333333334, ans=0.0 2024-09-25 17:47:02,686 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=797575.3333333334, ans=0.125 2024-09-25 17:47:21,461 INFO [train.py:1198] (3/4) Epoch 44, batch 3400, loss[loss=0.1922, ctc_loss=0.1232, cr_loss=0.3447, over 17353.00 frames. ], tot_loss[loss=0.1885, ctc_loss=0.1211, cr_loss=0.3368, over 3341952.28 frames. ], batch size: 48, lr: 2.68e-03, grad_scale: 16.0 2024-09-25 17:47:22,204 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=4.43 vs. limit=12.0 2024-09-25 17:47:36,402 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=797668.6666666666, ans=0.0 2024-09-25 17:48:11,354 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=6.07 vs. limit=15.0 2024-09-25 17:48:13,774 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.159e+02 1.291e+02 1.376e+02 1.452e+02 2.263e+02, threshold=2.752e+02, percent-clipped=0.0 2024-09-25 17:48:27,424 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.39 vs. limit=22.5 2024-09-25 17:48:42,148 INFO [train.py:1198] (3/4) Epoch 44, batch 3450, loss[loss=0.2053, ctc_loss=0.1309, cr_loss=0.3719, over 17208.00 frames. ], tot_loss[loss=0.1867, ctc_loss=0.1198, cr_loss=0.3343, over 3343417.99 frames. ], batch size: 55, lr: 2.68e-03, grad_scale: 16.0 2024-09-25 17:48:50,692 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=4.29 vs. limit=15.0 2024-09-25 17:49:10,472 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=797948.6666666666, ans=0.0 2024-09-25 17:49:15,416 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=797995.3333333334, ans=0.125 2024-09-25 17:49:17,007 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=797995.3333333334, ans=0.125 2024-09-25 17:49:29,712 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=3.77 vs. 
limit=10.0 2024-09-25 17:49:30,927 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=798042.0, ans=0.125 2024-09-25 17:49:35,466 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=798042.0, ans=0.2 2024-09-25 17:49:42,493 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=3.90 vs. limit=15.0 2024-09-25 17:49:46,449 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=798088.6666666666, ans=0.125 2024-09-25 17:50:02,279 INFO [train.py:1198] (3/4) Epoch 44, batch 3500, loss[loss=0.1874, ctc_loss=0.1217, cr_loss=0.3287, over 15842.00 frames. ], tot_loss[loss=0.1869, ctc_loss=0.12, cr_loss=0.3345, over 3341439.25 frames. ], batch size: 74, lr: 2.68e-03, grad_scale: 16.0 2024-09-25 17:50:29,484 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=798182.0, ans=0.07 2024-09-25 17:50:39,092 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=798228.6666666666, ans=0.2 2024-09-25 17:50:42,799 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.65 vs. limit=22.5 2024-09-25 17:50:52,795 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.122e+02 1.299e+02 1.358e+02 1.430e+02 3.438e+02, threshold=2.715e+02, percent-clipped=1.0 2024-09-25 17:51:01,352 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=798275.3333333334, ans=0.035 2024-09-25 17:51:01,389 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=798275.3333333334, ans=0.07 2024-09-25 17:51:04,596 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=798275.3333333334, ans=0.125 2024-09-25 17:51:09,084 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=798322.0, ans=0.1 2024-09-25 17:51:22,862 INFO [train.py:1198] (3/4) Epoch 44, batch 3550, loss[loss=0.2251, ctc_loss=0.1473, cr_loss=0.3891, over 14752.00 frames. ], tot_loss[loss=0.187, ctc_loss=0.1199, cr_loss=0.3351, over 3346080.37 frames. ], batch size: 89, lr: 2.68e-03, grad_scale: 16.0 2024-09-25 17:51:27,654 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=798368.6666666666, ans=0.0 2024-09-25 17:51:43,535 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=798415.3333333334, ans=0.0 2024-09-25 17:51:55,355 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=798462.0, ans=0.125 2024-09-25 17:51:58,504 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=798462.0, ans=0.125 2024-09-25 17:52:00,187 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=8.92 vs. 
limit=15.0 2024-09-25 17:52:01,609 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=798462.0, ans=0.1 2024-09-25 17:52:04,726 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=798462.0, ans=0.125 2024-09-25 17:52:22,085 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-25 17:52:43,566 INFO [train.py:1198] (3/4) Epoch 44, batch 3600, loss[loss=0.2195, ctc_loss=0.1425, cr_loss=0.3849, over 17046.00 frames. ], tot_loss[loss=0.1888, ctc_loss=0.1212, cr_loss=0.3379, over 3354651.56 frames. ], batch size: 56, lr: 2.68e-03, grad_scale: 16.0 2024-09-25 17:52:44,169 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.09 vs. limit=6.0 2024-09-25 17:52:56,217 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=798602.0, ans=0.1 2024-09-25 17:53:16,172 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=798695.3333333334, ans=0.125 2024-09-25 17:53:34,700 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.140e+02 1.315e+02 1.375e+02 1.458e+02 2.973e+02, threshold=2.750e+02, percent-clipped=1.0 2024-09-25 17:53:50,729 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=798788.6666666666, ans=0.125 2024-09-25 17:53:51,424 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.21 vs. limit=15.0 2024-09-25 17:53:53,726 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=798788.6666666666, ans=0.125 2024-09-25 17:54:01,421 INFO [train.py:1198] (3/4) Epoch 44, batch 3650, loss[loss=0.1862, ctc_loss=0.121, cr_loss=0.326, over 17126.00 frames. ], tot_loss[loss=0.1895, ctc_loss=0.1218, cr_loss=0.3387, over 3359796.21 frames. ], batch size: 40, lr: 2.68e-03, grad_scale: 16.0 2024-09-25 17:54:11,487 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.35 vs. limit=15.0 2024-09-25 17:54:22,074 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=798882.0, ans=0.125 2024-09-25 17:54:59,709 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=798975.3333333334, ans=0.0 2024-09-25 17:55:07,541 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=799022.0, ans=0.0 2024-09-25 17:55:17,769 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=799022.0, ans=0.1 2024-09-25 17:55:20,638 INFO [train.py:1198] (3/4) Epoch 44, batch 3700, loss[loss=0.214, ctc_loss=0.1408, cr_loss=0.3661, over 16984.00 frames. ], tot_loss[loss=0.1887, ctc_loss=0.1211, cr_loss=0.3379, over 3367278.34 frames. 
], batch size: 53, lr: 2.68e-03, grad_scale: 16.0 2024-09-25 17:55:36,619 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=799115.3333333334, ans=0.0 2024-09-25 17:55:42,815 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=799115.3333333334, ans=0.1 2024-09-25 17:55:44,758 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=3.36 vs. limit=12.0 2024-09-25 17:56:11,869 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.103e+02 1.294e+02 1.350e+02 1.442e+02 2.627e+02, threshold=2.701e+02, percent-clipped=0.0 2024-09-25 17:56:38,700 INFO [train.py:1198] (3/4) Epoch 44, batch 3750, loss[loss=0.2176, ctc_loss=0.1403, cr_loss=0.3865, over 16861.00 frames. ], tot_loss[loss=0.1889, ctc_loss=0.1213, cr_loss=0.3378, over 3347178.71 frames. ], batch size: 58, lr: 2.68e-03, grad_scale: 16.0 2024-09-25 17:56:50,433 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=8.02 vs. limit=15.0 2024-09-25 17:56:51,541 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.const_attention_rate, batch_count=799302.0, ans=0.025 2024-09-25 17:57:40,163 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=799488.6666666666, ans=0.05 2024-09-25 17:57:57,079 INFO [train.py:1198] (3/4) Epoch 44, batch 3800, loss[loss=0.1773, ctc_loss=0.1113, cr_loss=0.33, over 17163.00 frames. ], tot_loss[loss=0.1897, ctc_loss=0.1219, cr_loss=0.3389, over 3332039.18 frames. ], batch size: 45, lr: 2.68e-03, grad_scale: 16.0 2024-09-25 17:58:03,570 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=799535.3333333334, ans=0.125 2024-09-25 17:58:17,236 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=799582.0, ans=0.1 2024-09-25 17:58:47,999 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.185e+02 1.317e+02 1.409e+02 1.513e+02 1.849e+02, threshold=2.818e+02, percent-clipped=0.0 2024-09-25 17:58:48,419 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=799675.3333333334, ans=0.125 2024-09-25 17:58:51,578 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=13.76 vs. limit=15.0 2024-09-25 17:58:52,982 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=799675.3333333334, ans=0.1 2024-09-25 17:59:14,339 INFO [train.py:1198] (3/4) Epoch 44, batch 3850, loss[loss=0.2493, ctc_loss=0.1674, cr_loss=0.4096, over 11667.00 frames. ], tot_loss[loss=0.1917, ctc_loss=0.1236, cr_loss=0.3406, over 3272560.59 frames. ], batch size: 123, lr: 2.68e-03, grad_scale: 16.0 2024-09-25 17:59:16,714 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=4.56 vs. 
limit=15.0 2024-09-25 17:59:28,294 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=799815.3333333334, ans=0.125 2024-09-25 17:59:32,800 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=799815.3333333334, ans=0.125 2024-09-25 18:01:14,624 INFO [train.py:1198] (3/4) Epoch 45, batch 0, loss[loss=0.1872, ctc_loss=0.1172, cr_loss=0.3501, over 17258.00 frames. ], tot_loss[loss=0.1872, ctc_loss=0.1172, cr_loss=0.3501, over 17258.00 frames. ], batch size: 44, lr: 2.65e-03, grad_scale: 32.0 2024-09-25 18:01:14,625 INFO [train.py:1221] (3/4) Computing validation loss 2024-09-25 18:01:29,791 INFO [train.py:1230] (3/4) Epoch 45, validation: loss=0.03539, ctc_loss=0.03539, cr_loss=1.113e-14, over 944034.00 frames. 2024-09-25 18:01:29,792 INFO [train.py:1231] (3/4) Maximum memory allocated so far is 21238MB 2024-09-25 18:01:41,279 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=799983.3333333334, ans=0.04949747468305833 2024-09-25 18:01:42,692 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-25 18:01:52,962 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=16.04 vs. limit=22.5 2024-09-25 18:02:09,189 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.80 vs. limit=10.0 2024-09-25 18:02:13,366 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=800076.6666666666, ans=0.2 2024-09-25 18:02:16,527 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=800076.6666666666, ans=0.0 2024-09-25 18:02:18,118 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=800076.6666666666, ans=0.1 2024-09-25 18:02:24,440 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=800123.3333333334, ans=0.125 2024-09-25 18:02:26,017 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=800123.3333333334, ans=0.125 2024-09-25 18:02:31,991 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.151e+02 1.406e+02 1.547e+02 1.686e+02 2.322e+02, threshold=3.093e+02, percent-clipped=0.0 2024-09-25 18:02:44,941 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=800170.0, ans=0.0 2024-09-25 18:02:48,390 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=800170.0, ans=0.125 2024-09-25 18:02:52,938 INFO [train.py:1198] (3/4) Epoch 45, batch 50, loss[loss=0.185, ctc_loss=0.1148, cr_loss=0.3514, over 16786.00 frames. ], tot_loss[loss=0.1923, ctc_loss=0.1232, cr_loss=0.3456, over 755235.17 frames. ], batch size: 61, lr: 2.65e-03, grad_scale: 32.0 2024-09-25 18:03:01,862 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.75 vs. 
limit=15.0 2024-09-25 18:03:05,009 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=5.14 vs. limit=15.0 2024-09-25 18:03:09,355 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=800263.3333333334, ans=0.1 2024-09-25 18:03:15,928 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=800263.3333333334, ans=0.0 2024-09-25 18:03:25,166 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=800310.0, ans=0.125 2024-09-25 18:03:41,385 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=800356.6666666666, ans=0.07 2024-09-25 18:03:44,461 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=800356.6666666666, ans=0.125 2024-09-25 18:04:13,073 INFO [train.py:1198] (3/4) Epoch 45, batch 100, loss[loss=0.1913, ctc_loss=0.1203, cr_loss=0.3551, over 17163.00 frames. ], tot_loss[loss=0.1887, ctc_loss=0.1208, cr_loss=0.3394, over 1340023.48 frames. ], batch size: 45, lr: 2.65e-03, grad_scale: 32.0 2024-09-25 18:04:26,127 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.const_attention_rate, batch_count=800450.0, ans=0.025 2024-09-25 18:04:41,985 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=800496.6666666666, ans=0.2 2024-09-25 18:05:12,264 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.147e+02 1.307e+02 1.359e+02 1.469e+02 2.520e+02, threshold=2.719e+02, percent-clipped=0.0 2024-09-25 18:05:30,478 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=3.52 vs. limit=12.0 2024-09-25 18:05:35,982 INFO [train.py:1198] (3/4) Epoch 45, batch 150, loss[loss=0.1667, ctc_loss=0.1058, cr_loss=0.3042, over 17204.00 frames. ], tot_loss[loss=0.1905, ctc_loss=0.1224, cr_loss=0.3406, over 1771108.01 frames. ], batch size: 47, lr: 2.65e-03, grad_scale: 32.0 2024-09-25 18:05:48,790 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=800683.3333333334, ans=0.125 2024-09-25 18:05:50,314 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=800683.3333333334, ans=0.125 2024-09-25 18:06:21,168 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=800776.6666666666, ans=0.1 2024-09-25 18:06:38,944 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.const_attention_rate, batch_count=800823.3333333334, ans=0.025 2024-09-25 18:06:45,416 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=800870.0, ans=0.125 2024-09-25 18:07:02,960 INFO [train.py:1198] (3/4) Epoch 45, batch 200, loss[loss=0.1678, ctc_loss=0.1054, cr_loss=0.3119, over 17087.00 frames. ], tot_loss[loss=0.1906, ctc_loss=0.1224, cr_loss=0.341, over 2104871.76 frames. 
], batch size: 40, lr: 2.65e-03, grad_scale: 32.0 2024-09-25 18:07:27,199 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.const_attention_rate, batch_count=800963.3333333334, ans=0.025 2024-09-25 18:07:42,146 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.37 vs. limit=6.0 2024-09-25 18:07:46,432 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=801010.0, ans=0.0 2024-09-25 18:07:54,402 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=801056.6666666666, ans=0.2 2024-09-25 18:08:02,078 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.132e+02 1.278e+02 1.376e+02 1.483e+02 1.957e+02, threshold=2.752e+02, percent-clipped=0.0 2024-09-25 18:08:12,256 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=801103.3333333334, ans=0.125 2024-09-25 18:08:23,396 INFO [train.py:1198] (3/4) Epoch 45, batch 250, loss[loss=0.1828, ctc_loss=0.1167, cr_loss=0.3303, over 17222.00 frames. ], tot_loss[loss=0.1892, ctc_loss=0.1214, cr_loss=0.3389, over 2381838.49 frames. ], batch size: 50, lr: 2.65e-03, grad_scale: 32.0 2024-09-25 18:08:31,470 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.min_positive, batch_count=801150.0, ans=0.025 2024-09-25 18:08:54,212 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=801243.3333333334, ans=0.125 2024-09-25 18:09:00,507 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=801243.3333333334, ans=0.125 2024-09-25 18:09:43,108 INFO [train.py:1198] (3/4) Epoch 45, batch 300, loss[loss=0.1963, ctc_loss=0.1305, cr_loss=0.3291, over 12027.00 frames. ], tot_loss[loss=0.1886, ctc_loss=0.1211, cr_loss=0.3378, over 2597929.13 frames. ], batch size: 123, lr: 2.65e-03, grad_scale: 32.0 2024-09-25 18:09:48,677 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=4.33 vs. limit=10.0 2024-09-25 18:10:04,026 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=801430.0, ans=0.1 2024-09-25 18:10:16,786 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=801476.6666666666, ans=0.1 2024-09-25 18:10:23,196 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=801476.6666666666, ans=0.0 2024-09-25 18:10:28,126 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=801476.6666666666, ans=0.125 2024-09-25 18:10:48,158 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.125e+02 1.282e+02 1.376e+02 1.463e+02 2.041e+02, threshold=2.752e+02, percent-clipped=0.0 2024-09-25 18:11:09,115 INFO [train.py:1198] (3/4) Epoch 45, batch 350, loss[loss=0.2147, ctc_loss=0.1386, cr_loss=0.3806, over 16526.00 frames. ], tot_loss[loss=0.1902, ctc_loss=0.1222, cr_loss=0.3399, over 2755257.52 frames. 
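(Note: each record reports both a per-batch loss, "over N frames" for that single batch, and a running tot_loss. Early in epoch 45 the tot_loss frame count grows, 755k, 1.34M, 1.77M, 2.10M and so on, while deep into epoch 44 it plateaus near 3.36M frames, which is the signature of a frame-weighted average over a decaying window rather than a cumulative epoch average. A hedged sketch; the decay factor is an assumption chosen to reproduce the observed plateau:)

```python
def update_tot_loss(loss_sum, frame_count, batch_loss, batch_frames,
                    decay=0.995):
    """Frame-weighted moving average with a decaying window (assumed form)."""
    loss_sum = decay * loss_sum + batch_loss * batch_frames
    frame_count = decay * frame_count + batch_frames
    return loss_sum, frame_count        # report loss_sum / frame_count

# Plateau check: at roughly 17k frames per batch the steady-state window is
# 17000 / (1 - 0.995) = 3.4e6 frames, in line with the logged "over ~3.36M".
```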
], batch size: 66, lr: 2.65e-03, grad_scale: 32.0 2024-09-25 18:11:34,102 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=801663.3333333334, ans=0.125 2024-09-25 18:12:13,940 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=801756.6666666666, ans=0.2 2024-09-25 18:12:27,220 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=4.59 vs. limit=12.0 2024-09-25 18:12:34,309 INFO [train.py:1198] (3/4) Epoch 45, batch 400, loss[loss=0.2057, ctc_loss=0.1327, cr_loss=0.365, over 17289.00 frames. ], tot_loss[loss=0.189, ctc_loss=0.1214, cr_loss=0.3381, over 2883834.87 frames. ], batch size: 51, lr: 2.65e-03, grad_scale: 32.0 2024-09-25 18:13:06,618 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=3.28 vs. limit=10.0 2024-09-25 18:13:28,486 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=801990.0, ans=0.0 2024-09-25 18:13:34,527 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.105e+02 1.278e+02 1.369e+02 1.473e+02 1.980e+02, threshold=2.739e+02, percent-clipped=0.0 2024-09-25 18:13:46,305 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=802036.6666666666, ans=0.0 2024-09-25 18:13:53,912 INFO [train.py:1198] (3/4) Epoch 45, batch 450, loss[loss=0.2167, ctc_loss=0.1387, cr_loss=0.3904, over 16599.00 frames. ], tot_loss[loss=0.1892, ctc_loss=0.1215, cr_loss=0.3382, over 2988838.74 frames. ], batch size: 66, lr: 2.65e-03, grad_scale: 16.0 2024-09-25 18:14:13,532 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=802130.0, ans=0.125 2024-09-25 18:14:22,921 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=802130.0, ans=0.0 2024-09-25 18:14:27,920 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.max_abs, batch_count=802176.6666666666, ans=10.0 2024-09-25 18:14:56,376 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=802270.0, ans=0.125 2024-09-25 18:15:16,372 INFO [train.py:1198] (3/4) Epoch 45, batch 500, loss[loss=0.1803, ctc_loss=0.1146, cr_loss=0.3287, over 17161.00 frames. ], tot_loss[loss=0.1886, ctc_loss=0.121, cr_loss=0.3381, over 3076271.09 frames. 
], batch size: 45, lr: 2.65e-03, grad_scale: 16.0 2024-09-25 18:15:19,895 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=802316.6666666666, ans=0.0 2024-09-25 18:15:23,087 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=802316.6666666666, ans=0.0 2024-09-25 18:16:04,654 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=802410.0, ans=0.1 2024-09-25 18:16:15,570 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=802456.6666666666, ans=0.0 2024-09-25 18:16:22,897 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.155e+02 1.274e+02 1.342e+02 1.426e+02 1.767e+02, threshold=2.684e+02, percent-clipped=0.0 2024-09-25 18:16:30,275 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=4.24 vs. limit=15.0 2024-09-25 18:16:42,133 INFO [train.py:1198] (3/4) Epoch 45, batch 550, loss[loss=0.1832, ctc_loss=0.119, cr_loss=0.3212, over 17232.00 frames. ], tot_loss[loss=0.187, ctc_loss=0.1199, cr_loss=0.3356, over 3145700.41 frames. ], batch size: 55, lr: 2.65e-03, grad_scale: 16.0 2024-09-25 18:16:50,498 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=9.13 vs. limit=15.0 2024-09-25 18:17:51,144 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.62 vs. limit=15.0 2024-09-25 18:18:03,964 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=802736.6666666666, ans=0.125 2024-09-25 18:18:04,127 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=802736.6666666666, ans=0.125 2024-09-25 18:18:05,544 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=802783.3333333334, ans=0.0 2024-09-25 18:18:06,934 INFO [train.py:1198] (3/4) Epoch 45, batch 600, loss[loss=0.1902, ctc_loss=0.1221, cr_loss=0.3403, over 17017.00 frames. ], tot_loss[loss=0.1868, ctc_loss=0.1197, cr_loss=0.3357, over 3193557.08 frames. ], batch size: 44, lr: 2.64e-03, grad_scale: 16.0 2024-09-25 18:18:10,438 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.min_positive, batch_count=802783.3333333334, ans=0.025 2024-09-25 18:18:13,587 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=802783.3333333334, ans=0.1 2024-09-25 18:18:31,056 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=802830.0, ans=0.0 2024-09-25 18:19:07,371 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.202e+02 1.313e+02 1.405e+02 1.513e+02 2.621e+02, threshold=2.810e+02, percent-clipped=0.0 2024-09-25 18:19:26,372 INFO [train.py:1198] (3/4) Epoch 45, batch 650, loss[loss=0.1972, ctc_loss=0.1249, cr_loss=0.3613, over 16774.00 frames. ], tot_loss[loss=0.1878, ctc_loss=0.1204, cr_loss=0.3369, over 3226165.57 frames. 
], batch size: 61, lr: 2.64e-03, grad_scale: 16.0 2024-09-25 18:19:37,667 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=803016.6666666666, ans=0.1 2024-09-25 18:19:52,088 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=803063.3333333334, ans=0.125 2024-09-25 18:20:07,547 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.75 vs. limit=6.0 2024-09-25 18:20:39,696 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=803203.3333333334, ans=0.0 2024-09-25 18:20:52,180 INFO [train.py:1198] (3/4) Epoch 45, batch 700, loss[loss=0.1593, ctc_loss=0.09881, cr_loss=0.3025, over 17157.00 frames. ], tot_loss[loss=0.1871, ctc_loss=0.12, cr_loss=0.3358, over 3261570.23 frames. ], batch size: 41, lr: 2.64e-03, grad_scale: 16.0 2024-09-25 18:20:52,535 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.const_attention_rate, batch_count=803250.0, ans=0.025 2024-09-25 18:21:05,768 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=5.80 vs. limit=12.0 2024-09-25 18:21:08,761 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=803296.6666666666, ans=0.04949747468305833 2024-09-25 18:21:29,360 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=803343.3333333334, ans=0.0 2024-09-25 18:21:30,768 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=803343.3333333334, ans=0.0 2024-09-25 18:21:30,820 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=803343.3333333334, ans=0.0 2024-09-25 18:21:35,550 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=803343.3333333334, ans=0.1 2024-09-25 18:21:44,995 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=803390.0, ans=0.125 2024-09-25 18:21:51,541 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=803390.0, ans=0.125 2024-09-25 18:21:51,543 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=803390.0, ans=0.0 2024-09-25 18:21:55,975 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.139e+02 1.305e+02 1.379e+02 1.482e+02 2.055e+02, threshold=2.759e+02, percent-clipped=0.0 2024-09-25 18:22:00,807 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=803436.6666666666, ans=0.2 2024-09-25 18:22:04,032 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=803436.6666666666, ans=0.2 2024-09-25 18:22:18,093 INFO [train.py:1198] (3/4) Epoch 45, batch 750, loss[loss=0.1876, ctc_loss=0.1179, cr_loss=0.3483, over 17073.00 frames. ], tot_loss[loss=0.1877, ctc_loss=0.1204, cr_loss=0.3364, over 3272673.33 frames. 
], batch size: 46, lr: 2.64e-03, grad_scale: 16.0 2024-09-25 18:22:41,075 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=11.73 vs. limit=12.0 2024-09-25 18:23:22,952 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=803670.0, ans=0.2 2024-09-25 18:23:38,654 INFO [train.py:1198] (3/4) Epoch 45, batch 800, loss[loss=0.185, ctc_loss=0.1172, cr_loss=0.3389, over 17344.00 frames. ], tot_loss[loss=0.1872, ctc_loss=0.12, cr_loss=0.3359, over 3290785.51 frames. ], batch size: 48, lr: 2.64e-03, grad_scale: 32.0 2024-09-25 18:23:48,432 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=803716.6666666666, ans=0.125 2024-09-25 18:24:15,474 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.min_abs, batch_count=803810.0, ans=0.5 2024-09-25 18:24:17,315 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.const_attention_rate, batch_count=803810.0, ans=0.025 2024-09-25 18:24:31,721 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=803856.6666666666, ans=0.0 2024-09-25 18:24:39,375 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.162e+02 1.279e+02 1.371e+02 1.474e+02 1.917e+02, threshold=2.742e+02, percent-clipped=0.0 2024-09-25 18:24:39,726 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=803856.6666666666, ans=0.0 2024-09-25 18:24:58,300 INFO [train.py:1198] (3/4) Epoch 45, batch 850, loss[loss=0.1646, ctc_loss=0.1038, cr_loss=0.3045, over 17295.00 frames. ], tot_loss[loss=0.1865, ctc_loss=0.1195, cr_loss=0.3351, over 3307545.22 frames. ], batch size: 42, lr: 2.64e-03, grad_scale: 32.0 2024-09-25 18:25:11,863 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=10.04 vs. limit=15.0 2024-09-25 18:25:26,897 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=803996.6666666666, ans=0.1 2024-09-25 18:25:31,709 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=804043.3333333334, ans=0.2 2024-09-25 18:26:18,946 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=804136.6666666666, ans=0.1 2024-09-25 18:26:26,523 INFO [train.py:1198] (3/4) Epoch 45, batch 900, loss[loss=0.1649, ctc_loss=0.1059, cr_loss=0.2948, over 16319.00 frames. ], tot_loss[loss=0.1871, ctc_loss=0.1199, cr_loss=0.336, over 3309657.50 frames. 
], batch size: 36, lr: 2.64e-03, grad_scale: 32.0 2024-09-25 18:26:44,443 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=804230.0, ans=0.07 2024-09-25 18:27:12,281 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=804276.6666666666, ans=0.015 2024-09-25 18:27:14,072 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=804276.6666666666, ans=0.0 2024-09-25 18:27:23,585 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=804323.3333333334, ans=0.125 2024-09-25 18:27:23,728 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=804323.3333333334, ans=0.125 2024-09-25 18:27:29,592 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.180e+02 1.299e+02 1.367e+02 1.436e+02 2.740e+02, threshold=2.735e+02, percent-clipped=0.0 2024-09-25 18:27:29,925 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=804323.3333333334, ans=0.125 2024-09-25 18:27:34,672 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=804370.0, ans=0.0 2024-09-25 18:27:44,263 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=804370.0, ans=0.0 2024-09-25 18:27:48,684 INFO [train.py:1198] (3/4) Epoch 45, batch 950, loss[loss=0.213, ctc_loss=0.1346, cr_loss=0.3919, over 16935.00 frames. ], tot_loss[loss=0.1872, ctc_loss=0.12, cr_loss=0.336, over 3317352.70 frames. ], batch size: 58, lr: 2.64e-03, grad_scale: 16.0 2024-09-25 18:28:09,344 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=804463.3333333334, ans=0.125 2024-09-25 18:28:14,221 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=804463.3333333334, ans=0.09899494936611666 2024-09-25 18:28:28,491 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=804510.0, ans=0.125 2024-09-25 18:28:52,187 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=804603.3333333334, ans=0.0 2024-09-25 18:28:55,331 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=804603.3333333334, ans=0.1 2024-09-25 18:29:07,946 INFO [train.py:1198] (3/4) Epoch 45, batch 1000, loss[loss=0.1725, ctc_loss=0.107, cr_loss=0.3277, over 17064.00 frames. ], tot_loss[loss=0.1874, ctc_loss=0.12, cr_loss=0.3368, over 3332875.04 frames. ], batch size: 39, lr: 2.64e-03, grad_scale: 16.0
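
The loss[...] and tot_loss[...] fields in the train.py:1198 lines decompose into a CTC term plus a consistency-regularization (CR) term, and the printed totals are consistent with a fixed CR scale of 0.2: for the batch 1000 entry just above, 0.107 + 0.2 * 0.3277 = 0.1725 = loss, and 0.12 + 0.2 * 0.3368 ≈ 0.1874 = tot_loss, where loss is the current batch and tot_loss behaves like a running average over recent batches weighted by frame count. A minimal sketch of that arithmetic follows; the 0.2 scale is inferred from the printed numbers, and the function and variable names are illustrative, not the recipe's own:

    # Sketch: reproduce the arithmetic behind the logged loss fields.
    # Assumes total = ctc_loss + cr_loss_scale * cr_loss with cr_loss_scale = 0.2,
    # which matches e.g. batch 1000 above: 0.107 + 0.2 * 0.3277 = 0.1725.
    def combined_loss(ctc_loss: float, cr_loss: float, cr_loss_scale: float = 0.2) -> float:
        return ctc_loss + cr_loss_scale * cr_loss

    # tot_loss behaves like a frame-weighted average over recent batches:
    def frame_weighted_average(history):  # history: [(loss_value, num_frames), ...]
        frames = sum(n for _, n in history)
        return sum(v * n for v, n in history) / frames

    assert abs(combined_loss(0.107, 0.3277) - 0.1725) < 5e-4
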
2024-09-25 18:29:13,107 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=804650.0, ans=0.125 2024-09-25 18:29:14,794 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=804650.0, ans=0.2 2024-09-25 18:29:19,622 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=804650.0, ans=0.125 2024-09-25 18:29:46,598 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=804743.3333333334, ans=0.1 2024-09-25 18:30:13,227 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.108e+02 1.285e+02 1.391e+02 1.486e+02 1.776e+02, threshold=2.782e+02, percent-clipped=0.0 2024-09-25 18:30:33,441 INFO [train.py:1198] (3/4) Epoch 45, batch 1050, loss[loss=0.1961, ctc_loss=0.1279, cr_loss=0.3408, over 17139.00 frames. ], tot_loss[loss=0.1877, ctc_loss=0.1202, cr_loss=0.3374, over 3340428.34 frames. ], batch size: 48, lr: 2.64e-03, grad_scale: 16.0 2024-09-25 18:31:10,152 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=804976.6666666666, ans=0.09899494936611666 2024-09-25 18:31:17,586 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=804976.6666666666, ans=0.2 2024-09-25 18:31:48,194 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=805070.0, ans=0.1 2024-09-25 18:31:52,489 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=805070.0, ans=0.0 2024-09-25 18:31:58,557 INFO [train.py:1198] (3/4) Epoch 45, batch 1100, loss[loss=0.219, ctc_loss=0.1476, cr_loss=0.357, over 11815.00 frames. ], tot_loss[loss=0.1868, ctc_loss=0.1197, cr_loss=0.3358, over 3333767.86 frames. ], batch size: 125, lr: 2.64e-03, grad_scale: 16.0 2024-09-25 18:32:02,681 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=8.21 vs. limit=15.0 2024-09-25 18:32:05,674 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=12.00 vs. limit=22.5 2024-09-25 18:32:38,467 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=805210.0, ans=0.2 2024-09-25 18:32:57,930 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.const_attention_rate, batch_count=805256.6666666666, ans=0.025 2024-09-25 18:33:00,835 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.150e+02 1.299e+02 1.379e+02 1.493e+02 1.866e+02, threshold=2.759e+02, percent-clipped=0.0 2024-09-25 18:33:07,624 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=805303.3333333334, ans=0.05 2024-09-25 18:33:18,439 INFO [train.py:1198] (3/4) Epoch 45, batch 1150, loss[loss=0.2236, ctc_loss=0.1496, cr_loss=0.3704, over 17203.00 frames. ], tot_loss[loss=0.1879, ctc_loss=0.1205, cr_loss=0.337, over 3325645.61 frames. ], batch size: 55, lr: 2.64e-03, grad_scale: 16.0
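
Each scaling.py:214 ScheduledFloat line records the current value (ans) of a scalar hyperparameter, dropout probabilities, skip rates, balancer probabilities and bounds, bypass scale minimums, and so on, looked up from a schedule keyed on batch_count; the fractional counts (e.g. batch_count=804976.6666666666 above) suggest the counter advances by a non-integer increment per step. Below is a minimal sketch of such a schedule, assuming piecewise-linear interpolation between (batch_count, value) breakpoints with clamping at both ends; the breakpoints themselves are invented for illustration and are not the actual schedules of these modules:

    # Sketch of a batch-count-keyed scheduled scalar. Piecewise-linear
    # interpolation with clamping at the ends is an assumption; the
    # breakpoints below are illustrative only.
    from bisect import bisect_right

    class ScheduledValue:
        def __init__(self, *points):  # points: (batch_count, value), ascending
            self.xs = [x for x, _ in points]
            self.ys = [y for _, y in points]

        def __call__(self, batch_count: float) -> float:
            i = bisect_right(self.xs, batch_count)
            if i == 0:
                return self.ys[0]        # before the first breakpoint
            if i == len(self.xs):
                return self.ys[-1]       # past the last breakpoint
            x0, x1 = self.xs[i - 1], self.xs[i]
            y0, y1 = self.ys[i - 1], self.ys[i]
            return y0 + (y1 - y0) * (batch_count - x0) / (x1 - x0)

    dropout_p = ScheduledValue((0.0, 0.3), (20000.0, 0.1))
    print(dropout_p(804976.6666666666))  # far past the last breakpoint -> 0.1
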
2024-09-25 18:33:41,141 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.max_abs, batch_count=805396.6666666666, ans=10.0 2024-09-25 18:33:49,342 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten.whitening_limit, batch_count=805443.3333333334, ans=22.5 2024-09-25 18:33:59,053 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=805443.3333333334, ans=0.5 2024-09-25 18:34:09,194 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=9.33 vs. limit=15.0 2024-09-25 18:34:24,742 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=805536.6666666666, ans=0.125 2024-09-25 18:34:33,190 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=7.29 vs. limit=15.0 2024-09-25 18:34:38,827 INFO [train.py:1198] (3/4) Epoch 45, batch 1200, loss[loss=0.197, ctc_loss=0.1258, cr_loss=0.3559, over 16870.00 frames. ], tot_loss[loss=0.1877, ctc_loss=0.1204, cr_loss=0.3365, over 3335511.49 frames. ], batch size: 58, lr: 2.64e-03, grad_scale: 16.0 2024-09-25 18:34:50,429 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=805583.3333333334, ans=0.1 2024-09-25 18:35:15,167 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=805676.6666666666, ans=0.0 2024-09-25 18:35:27,272 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=805676.6666666666, ans=0.0 2024-09-25 18:35:39,919 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=805723.3333333334, ans=0.0 2024-09-25 18:35:43,248 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=805723.3333333334, ans=0.0 2024-09-25 18:35:44,767 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=805723.3333333334, ans=0.125 2024-09-25 18:35:47,622 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.146e+02 1.307e+02 1.404e+02 1.482e+02 2.313e+02, threshold=2.807e+02, percent-clipped=0.0 2024-09-25 18:35:55,051 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=5.67 vs. limit=12.0 2024-09-25 18:35:57,688 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.min_positive, batch_count=805770.0, ans=0.05 2024-09-25 18:35:59,151 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=805770.0, ans=0.125 2024-09-25 18:36:06,489 INFO [train.py:1198] (3/4) Epoch 45, batch 1250, loss[loss=0.1902, ctc_loss=0.1233, cr_loss=0.3344, over 17366.00 frames. ], tot_loss[loss=0.1874, ctc_loss=0.1202, cr_loss=0.3362, over 3336763.15 frames. ], batch size: 48, lr: 2.64e-03, grad_scale: 16.0
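
The scaling.py:1024 Whitening lines report a whiteness diagnostic for one module's activations against a per-module limit, e.g. "metric=9.33 vs. limit=15.0" above; the limit is itself a scheduled value (see the whitening_limit entry above with ans=22.5). A metric of this kind can be computed per channel group from the covariance of the centred activations: it equals 1.0 when that covariance is a multiple of the identity and grows with the eigenvalue spread. The sketch below is one such formulation and is an assumption about, not a copy of, what scaling.py computes:

    # Sketch of a group-wise whiteness metric: >= 1.0, equal to 1.0 when the
    # within-group covariance of the activations is proportional to the
    # identity, larger when a few directions dominate.
    import torch

    def whitening_metric(x: torch.Tensor, num_groups: int) -> float:
        x = x.reshape(-1, x.shape[-1])                 # (num_frames, num_channels)
        num_frames, num_channels = x.shape
        cpg = num_channels // num_groups               # channels per group
        x = x.reshape(num_frames, num_groups, cpg).transpose(0, 1)
        x = x - x.mean(dim=1, keepdim=True)            # centre each group
        covar = torch.matmul(x.transpose(1, 2), x)     # (num_groups, cpg, cpg)
        mean_diag = covar.diagonal(dim1=1, dim2=2).mean()
        mean_sq_diag = (covar ** 2).sum() / (num_groups * cpg)  # avg diag of C @ C
        return (mean_sq_diag / (mean_diag ** 2 + 1e-20)).item()

    x = torch.randn(8000, 256)               # well-whitened activations
    print(whitening_metric(x, num_groups=1))  # close to 1.0 for white noise
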
2024-09-25 18:36:06,708 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=805816.6666666666, ans=0.5 2024-09-25 18:36:09,041 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=4.42 vs. limit=15.0 2024-09-25 18:36:09,885 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.const_attention_rate, batch_count=805816.6666666666, ans=0.025 2024-09-25 18:36:53,068 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=4.39 vs. limit=15.0 2024-09-25 18:37:15,046 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=806003.3333333334, ans=0.125 2024-09-25 18:37:23,015 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=806003.3333333334, ans=0.125 2024-09-25 18:37:29,082 INFO [train.py:1198] (3/4) Epoch 45, batch 1300, loss[loss=0.1734, ctc_loss=0.1092, cr_loss=0.3214, over 17227.00 frames. ], tot_loss[loss=0.1886, ctc_loss=0.121, cr_loss=0.3381, over 3338626.05 frames. ], batch size: 41, lr: 2.64e-03, grad_scale: 16.0 2024-09-25 18:37:29,489 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=806050.0, ans=0.125 2024-09-25 18:37:34,169 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=806050.0, ans=0.1 2024-09-25 18:37:40,480 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=806050.0, ans=0.125 2024-09-25 18:37:58,847 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=5.21 vs. limit=15.0 2024-09-25 18:38:31,469 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=806236.6666666666, ans=0.125 2024-09-25 18:38:32,668 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.116e+02 1.317e+02 1.369e+02 1.447e+02 1.900e+02, threshold=2.738e+02, percent-clipped=0.0 2024-09-25 18:38:44,234 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=806236.6666666666, ans=0.0 2024-09-25 18:38:47,382 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=806283.3333333334, ans=0.1 2024-09-25 18:38:48,685 INFO [train.py:1198] (3/4) Epoch 45, batch 1350, loss[loss=0.1855, ctc_loss=0.1181, cr_loss=0.3369, over 16497.00 frames. ], tot_loss[loss=0.1885, ctc_loss=0.1209, cr_loss=0.3379, over 3339163.28 frames. ], batch size: 66, lr: 2.64e-03, grad_scale: 16.0
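
The optim.py:487 WARNING lines summarize gradient-norm statistics over recent batches: the five numbers read as min/25%/median/75%/max, and the printed threshold tracks Clipping_scale=2.0 times a median norm (on the 18:38:32,668 line above, 2.0 * 1.369e+02 = 2.738e+02), with percent-clipped reporting how often the clip actually engaged. The grad_scale in the batch lines (16.0, 32.0 in this stretch) reads as the AMP loss scale, which moves in powers of two as gradient overflow is or is not detected. Below is a rough sketch of the clipping bookkeeping, assuming a running-median threshold; the window size, class name, and reporting format are illustrative guesses:

    # Sketch: norm-based gradient clipping whose threshold follows a running
    # median of recent gradient norms, with quartile logging shaped like the
    # optim.py WARNINGs. Window and cadence are illustrative.
    import statistics
    from collections import deque
    import torch

    class MedianClipper:
        def __init__(self, clipping_scale: float = 2.0, window: int = 400):
            self.scale = clipping_scale
            self.norms = deque(maxlen=window)
            self.clipped = 0
            self.steps = 0

        def clip_(self, params) -> float:
            grads = [p.grad for p in params if p.grad is not None]
            norm = torch.stack([g.norm() for g in grads]).norm().item()
            self.norms.append(norm)
            self.steps += 1
            threshold = self.scale * statistics.median(self.norms)
            if norm > threshold:
                self.clipped += 1
                for g in grads:
                    g.mul_(threshold / norm)  # scale gradients down in place
            return norm

        def report(self) -> str:
            data = sorted(self.norms)
            q1, med, q3 = statistics.quantiles(data, n=4)
            pct = 100.0 * self.clipped / max(1, self.steps)
            return (f"Clipping_scale={self.scale}, grad-norm quartiles "
                    f"{data[0]:.3e} {q1:.3e} {med:.3e} {q3:.3e} {data[-1]:.3e}, "
                    f"threshold={self.scale * med:.3e}, percent-clipped={pct}")
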
2024-09-25 18:38:53,781 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=806283.3333333334, ans=0.125 2024-09-25 18:39:03,201 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=806330.0, ans=0.125 2024-09-25 18:39:48,697 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=806423.3333333334, ans=0.125 2024-09-25 18:40:12,419 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=8.13 vs. limit=15.0 2024-09-25 18:40:14,365 INFO [train.py:1198] (3/4) Epoch 45, batch 1400, loss[loss=0.1871, ctc_loss=0.1185, cr_loss=0.3431, over 17158.00 frames. ], tot_loss[loss=0.1891, ctc_loss=0.1214, cr_loss=0.3386, over 3336614.92 frames. ], batch size: 45, lr: 2.64e-03, grad_scale: 16.0 2024-09-25 18:40:26,593 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.57 vs. limit=22.5 2024-09-25 18:41:02,321 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-25 18:41:11,648 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=806656.6666666666, ans=0.0 2024-09-25 18:41:11,668 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=806656.6666666666, ans=0.125 2024-09-25 18:41:21,112 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.028e+02 1.293e+02 1.370e+02 1.466e+02 2.131e+02, threshold=2.739e+02, percent-clipped=0.0 2024-09-25 18:41:37,443 INFO [train.py:1198] (3/4) Epoch 45, batch 1450, loss[loss=0.1815, ctc_loss=0.1163, cr_loss=0.3259, over 17314.00 frames. ], tot_loss[loss=0.1888, ctc_loss=0.1212, cr_loss=0.3378, over 3333304.23 frames. ], batch size: 51, lr: 2.64e-03, grad_scale: 16.0 2024-09-25 18:41:57,780 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=806796.6666666666, ans=0.0 2024-09-25 18:42:07,691 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=806796.6666666666, ans=0.2 2024-09-25 18:42:12,417 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=806843.3333333334, ans=0.1 2024-09-25 18:42:39,892 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=806890.0, ans=0.0 2024-09-25 18:42:43,044 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=806936.6666666666, ans=0.125 2024-09-25 18:42:45,060 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.89 vs. limit=15.0 2024-09-25 18:42:51,118 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=806936.6666666666, ans=0.0 2024-09-25 18:43:00,442 INFO [train.py:1198] (3/4) Epoch 45, batch 1500, loss[loss=0.1918, ctc_loss=0.1211, cr_loss=0.3533, over 16631.00 frames.
], tot_loss[loss=0.1883, ctc_loss=0.1208, cr_loss=0.3372, over 3336501.42 frames. ], batch size: 61, lr: 2.64e-03, grad_scale: 16.0 2024-09-25 18:43:00,772 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=806983.3333333334, ans=0.125 2024-09-25 18:43:05,385 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=806983.3333333334, ans=0.125 2024-09-25 18:43:16,638 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=807030.0, ans=0.125 2024-09-25 18:43:26,129 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.const_attention_rate, batch_count=807030.0, ans=0.025 2024-09-25 18:44:03,689 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.58 vs. limit=12.0 2024-09-25 18:44:04,286 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.164e+02 1.307e+02 1.374e+02 1.496e+02 1.969e+02, threshold=2.747e+02, percent-clipped=0.0 2024-09-25 18:44:14,490 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=807170.0, ans=0.0 2024-09-25 18:44:20,509 INFO [train.py:1198] (3/4) Epoch 45, batch 1550, loss[loss=0.1976, ctc_loss=0.1268, cr_loss=0.354, over 16943.00 frames. ], tot_loss[loss=0.1881, ctc_loss=0.1207, cr_loss=0.3369, over 3336665.40 frames. ], batch size: 42, lr: 2.64e-03, grad_scale: 16.0 2024-09-25 18:45:09,187 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=807310.0, ans=0.0 2024-09-25 18:45:25,443 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=10.47 vs. limit=15.0 2024-09-25 18:45:31,574 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=807403.3333333334, ans=0.125 2024-09-25 18:45:45,595 INFO [train.py:1198] (3/4) Epoch 45, batch 1600, loss[loss=0.1993, ctc_loss=0.1306, cr_loss=0.3436, over 17279.00 frames. ], tot_loss[loss=0.1875, ctc_loss=0.1202, cr_loss=0.3361, over 3336623.44 frames. ], batch size: 51, lr: 2.64e-03, grad_scale: 32.0 2024-09-25 18:45:52,267 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=807450.0, ans=0.2 2024-09-25 18:45:52,880 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=8.96 vs. 
limit=15.0 2024-09-25 18:45:55,294 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=807450.0, ans=0.125 2024-09-25 18:46:02,861 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-09-25 18:46:04,389 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=807496.6666666666, ans=0.1 2024-09-25 18:46:36,486 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=807590.0, ans=0.1 2024-09-25 18:46:44,017 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=807590.0, ans=0.2 2024-09-25 18:46:56,354 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.165e+02 1.296e+02 1.378e+02 1.483e+02 2.434e+02, threshold=2.757e+02, percent-clipped=0.0 2024-09-25 18:47:10,855 INFO [train.py:1198] (3/4) Epoch 45, batch 1650, loss[loss=0.221, ctc_loss=0.1459, cr_loss=0.3754, over 14864.00 frames. ], tot_loss[loss=0.1878, ctc_loss=0.1205, cr_loss=0.3366, over 3347512.18 frames. ], batch size: 89, lr: 2.64e-03, grad_scale: 16.0 2024-09-25 18:47:20,892 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=807683.3333333334, ans=0.125 2024-09-25 18:47:28,753 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=807730.0, ans=0.07 2024-09-25 18:47:56,621 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=10.11 vs. limit=15.0 2024-09-25 18:48:20,689 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=13.86 vs. limit=22.5 2024-09-25 18:48:31,085 INFO [train.py:1198] (3/4) Epoch 45, batch 1700, loss[loss=0.2217, ctc_loss=0.1435, cr_loss=0.391, over 17014.00 frames. ], tot_loss[loss=0.1876, ctc_loss=0.1203, cr_loss=0.3366, over 3357712.01 frames. ], batch size: 53, lr: 2.64e-03, grad_scale: 16.0 2024-09-25 18:48:53,728 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=807963.3333333334, ans=0.125 2024-09-25 18:49:00,139 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=807963.3333333334, ans=10.0 2024-09-25 18:49:06,494 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=808010.0, ans=0.2 2024-09-25 18:49:12,839 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=808010.0, ans=0.125 2024-09-25 18:49:13,630 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=7.84 vs. 
limit=15.0 2024-09-25 18:49:15,931 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=808010.0, ans=0.125 2024-09-25 18:49:36,103 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.156e+02 1.276e+02 1.351e+02 1.443e+02 2.943e+02, threshold=2.702e+02, percent-clipped=1.0 2024-09-25 18:49:49,193 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.const_attention_rate, batch_count=808150.0, ans=0.025 2024-09-25 18:49:50,478 INFO [train.py:1198] (3/4) Epoch 45, batch 1750, loss[loss=0.24, ctc_loss=0.1665, cr_loss=0.3674, over 11713.00 frames. ], tot_loss[loss=0.1888, ctc_loss=0.1212, cr_loss=0.3379, over 3348848.76 frames. ], batch size: 123, lr: 2.64e-03, grad_scale: 16.0 2024-09-25 18:50:19,756 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=808196.6666666666, ans=0.0 2024-09-25 18:50:50,594 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=808290.0, ans=0.0 2024-09-25 18:51:09,138 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=808336.6666666666, ans=0.1 2024-09-25 18:51:18,076 INFO [train.py:1198] (3/4) Epoch 45, batch 1800, loss[loss=0.1921, ctc_loss=0.1232, cr_loss=0.3446, over 17150.00 frames. ], tot_loss[loss=0.1891, ctc_loss=0.1214, cr_loss=0.3386, over 3345645.52 frames. ], batch size: 48, lr: 2.64e-03, grad_scale: 16.0 2024-09-25 18:51:31,501 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=808383.3333333334, ans=0.0 2024-09-25 18:51:38,730 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=808430.0, ans=0.2 2024-09-25 18:52:02,690 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=808476.6666666666, ans=0.1 2024-09-25 18:52:13,720 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=808523.3333333334, ans=0.125 2024-09-25 18:52:20,045 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=808523.3333333334, ans=0.1 2024-09-25 18:52:26,149 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.100e+02 1.286e+02 1.376e+02 1.494e+02 2.508e+02, threshold=2.752e+02, percent-clipped=0.0 2024-09-25 18:52:39,386 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=808616.6666666666, ans=0.125 2024-09-25 18:52:40,621 INFO [train.py:1198] (3/4) Epoch 45, batch 1850, loss[loss=0.2186, ctc_loss=0.1457, cr_loss=0.3646, over 11706.00 frames. ], tot_loss[loss=0.1895, ctc_loss=0.1216, cr_loss=0.3392, over 3334219.86 frames. ], batch size: 123, lr: 2.64e-03, grad_scale: 16.0 2024-09-25 18:52:45,601 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=808616.6666666666, ans=0.0 2024-09-25 18:52:49,310 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=5.90 vs. 
limit=12.0 2024-09-25 18:52:50,646 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=808616.6666666666, ans=0.0 2024-09-25 18:52:57,083 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=808663.3333333334, ans=0.125 2024-09-25 18:53:04,989 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=808663.3333333334, ans=0.0 2024-09-25 18:53:26,207 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=11.64 vs. limit=22.5 2024-09-25 18:53:38,543 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=808756.6666666666, ans=0.1 2024-09-25 18:53:49,788 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.const_attention_rate, batch_count=808803.3333333334, ans=0.025 2024-09-25 18:53:54,510 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=808803.3333333334, ans=0.0 2024-09-25 18:54:00,043 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=8.77 vs. limit=22.5 2024-09-25 18:54:00,632 INFO [train.py:1198] (3/4) Epoch 45, batch 1900, loss[loss=0.1932, ctc_loss=0.1245, cr_loss=0.3432, over 17155.00 frames. ], tot_loss[loss=0.1897, ctc_loss=0.1219, cr_loss=0.3393, over 3332109.77 frames. ], batch size: 48, lr: 2.63e-03, grad_scale: 16.0 2024-09-25 18:54:07,327 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=808850.0, ans=0.1 2024-09-25 18:54:07,468 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=808850.0, ans=0.125 2024-09-25 18:54:13,716 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=808850.0, ans=0.1 2024-09-25 18:54:23,457 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=808896.6666666666, ans=0.125 2024-09-25 18:54:52,444 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=3.78 vs. limit=10.0 2024-09-25 18:55:08,249 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=5.90 vs. limit=15.0 2024-09-25 18:55:12,118 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.118e+02 1.309e+02 1.390e+02 1.516e+02 1.973e+02, threshold=2.779e+02, percent-clipped=0.0 2024-09-25 18:55:12,346 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=809036.6666666666, ans=0.125 2024-09-25 18:55:26,422 INFO [train.py:1198] (3/4) Epoch 45, batch 1950, loss[loss=0.1588, ctc_loss=0.102, cr_loss=0.2838, over 17058.00 frames. ], tot_loss[loss=0.1899, ctc_loss=0.1219, cr_loss=0.3398, over 3330530.43 frames. 
], batch size: 39, lr: 2.63e-03, grad_scale: 16.0 2024-09-25 18:55:44,810 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.05 vs. limit=15.0 2024-09-25 18:55:49,262 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=809130.0, ans=0.0 2024-09-25 18:55:50,898 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=809130.0, ans=0.125 2024-09-25 18:56:00,114 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=809176.6666666666, ans=10.0 2024-09-25 18:56:24,297 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=809223.3333333334, ans=0.07 2024-09-25 18:56:49,086 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=809270.0, ans=0.125 2024-09-25 18:56:52,062 INFO [train.py:1198] (3/4) Epoch 45, batch 2000, loss[loss=0.178, ctc_loss=0.1108, cr_loss=0.336, over 17076.00 frames. ], tot_loss[loss=0.1896, ctc_loss=0.1216, cr_loss=0.3396, over 3339203.29 frames. ], batch size: 43, lr: 2.63e-03, grad_scale: 32.0 2024-09-25 18:57:03,467 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=809316.6666666666, ans=0.0 2024-09-25 18:57:27,381 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=809410.0, ans=0.125 2024-09-25 18:57:33,842 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=809410.0, ans=0.125 2024-09-25 18:57:38,829 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=809456.6666666666, ans=0.125 2024-09-25 18:57:57,599 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.152e+02 1.313e+02 1.395e+02 1.516e+02 2.306e+02, threshold=2.789e+02, percent-clipped=0.0 2024-09-25 18:58:12,072 INFO [train.py:1198] (3/4) Epoch 45, batch 2050, loss[loss=0.213, ctc_loss=0.1367, cr_loss=0.3815, over 16896.00 frames. ], tot_loss[loss=0.1893, ctc_loss=0.1213, cr_loss=0.3399, over 3352659.84 frames. ], batch size: 58, lr: 2.63e-03, grad_scale: 32.0 2024-09-25 18:58:19,351 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=15.36 vs. 
limit=22.5 2024-09-25 18:58:42,946 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=809643.3333333334, ans=0.5 2024-09-25 18:58:46,107 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=809643.3333333334, ans=0.1 2024-09-25 18:59:08,450 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-25 18:59:22,787 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-25 18:59:22,918 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=809736.6666666666, ans=0.0 2024-09-25 18:59:32,220 INFO [train.py:1198] (3/4) Epoch 45, batch 2100, loss[loss=0.1838, ctc_loss=0.1157, cr_loss=0.3406, over 17235.00 frames. ], tot_loss[loss=0.1894, ctc_loss=0.1214, cr_loss=0.3396, over 3349839.81 frames. ], batch size: 50, lr: 2.63e-03, grad_scale: 32.0 2024-09-25 18:59:40,463 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=809783.3333333334, ans=0.09899494936611666 2024-09-25 18:59:55,775 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=809830.0, ans=0.125 2024-09-25 18:59:58,260 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=11.90 vs. limit=22.5 2024-09-25 19:00:32,465 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=809923.3333333334, ans=0.0 2024-09-25 19:00:32,469 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=809923.3333333334, ans=0.125 2024-09-25 19:00:35,763 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=809923.3333333334, ans=0.125 2024-09-25 19:00:40,214 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.148e+02 1.255e+02 1.338e+02 1.412e+02 1.754e+02, threshold=2.676e+02, percent-clipped=0.0 2024-09-25 19:00:42,016 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=809970.0, ans=0.125 2024-09-25 19:00:46,979 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=809970.0, ans=0.125 2024-09-25 19:00:56,675 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=4.35 vs. limit=15.0 2024-09-25 19:00:57,181 INFO [train.py:1198] (3/4) Epoch 45, batch 2150, loss[loss=0.1652, ctc_loss=0.1062, cr_loss=0.2954, over 16954.00 frames. ], tot_loss[loss=0.1885, ctc_loss=0.1208, cr_loss=0.3384, over 3352909.08 frames. 
], batch size: 42, lr: 2.63e-03, grad_scale: 16.0 2024-09-25 19:01:05,550 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=810016.6666666666, ans=0.2 2024-09-25 19:01:08,629 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.const_attention_rate, batch_count=810016.6666666666, ans=0.025 2024-09-25 19:01:14,019 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.92 vs. limit=22.5 2024-09-25 19:01:16,435 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=810063.3333333334, ans=0.1 2024-09-25 19:01:18,107 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=810063.3333333334, ans=0.2 2024-09-25 19:01:18,281 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.const_attention_rate, batch_count=810063.3333333334, ans=0.025 2024-09-25 19:01:23,446 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.26 vs. limit=15.0 2024-09-25 19:01:40,336 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=810110.0, ans=0.1 2024-09-25 19:02:00,966 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=810156.6666666666, ans=0.125 2024-09-25 19:02:05,814 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=810203.3333333334, ans=0.1 2024-09-25 19:02:20,042 INFO [train.py:1198] (3/4) Epoch 45, batch 2200, loss[loss=0.1493, ctc_loss=0.09445, cr_loss=0.2743, over 17186.00 frames. ], tot_loss[loss=0.1891, ctc_loss=0.1213, cr_loss=0.3392, over 3350272.82 frames. ], batch size: 41, lr: 2.63e-03, grad_scale: 16.0 2024-09-25 19:02:37,726 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=810296.6666666666, ans=0.1 2024-09-25 19:03:04,188 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.59 vs. limit=22.5 2024-09-25 19:03:27,150 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.170e+02 1.297e+02 1.368e+02 1.487e+02 2.152e+02, threshold=2.736e+02, percent-clipped=0.0 2024-09-25 19:03:40,106 INFO [train.py:1198] (3/4) Epoch 45, batch 2250, loss[loss=0.2132, ctc_loss=0.1353, cr_loss=0.3895, over 17360.00 frames. ], tot_loss[loss=0.1878, ctc_loss=0.1203, cr_loss=0.3374, over 3356441.51 frames. ], batch size: 48, lr: 2.63e-03, grad_scale: 16.0 2024-09-25 19:03:56,285 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.const_attention_rate, batch_count=810530.0, ans=0.025 2024-09-25 19:04:04,689 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.13 vs. 
limit=15.0 2024-09-25 19:04:05,861 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=810530.0, ans=0.125 2024-09-25 19:04:12,434 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=810576.6666666666, ans=0.125 2024-09-25 19:04:21,409 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=10.39 vs. limit=10.0 2024-09-25 19:04:36,571 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.64 vs. limit=15.0 2024-09-25 19:05:02,935 INFO [train.py:1198] (3/4) Epoch 45, batch 2300, loss[loss=0.2036, ctc_loss=0.1313, cr_loss=0.3614, over 17257.00 frames. ], tot_loss[loss=0.1876, ctc_loss=0.1202, cr_loss=0.3372, over 3363599.88 frames. ], batch size: 55, lr: 2.63e-03, grad_scale: 16.0 2024-09-25 19:05:07,812 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=810716.6666666666, ans=0.0 2024-09-25 19:05:15,699 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=810716.6666666666, ans=0.125 2024-09-25 19:05:17,282 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=810763.3333333334, ans=0.025 2024-09-25 19:05:18,715 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=810763.3333333334, ans=0.125 2024-09-25 19:05:33,147 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=810810.0, ans=0.0 2024-09-25 19:06:08,215 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=3.91 vs. limit=12.0 2024-09-25 19:06:12,004 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.109e+02 1.306e+02 1.375e+02 1.455e+02 1.967e+02, threshold=2.751e+02, percent-clipped=0.0 2024-09-25 19:06:27,349 INFO [train.py:1198] (3/4) Epoch 45, batch 2350, loss[loss=0.209, ctc_loss=0.1343, cr_loss=0.3735, over 16834.00 frames. ], tot_loss[loss=0.1875, ctc_loss=0.12, cr_loss=0.3374, over 3367639.04 frames. ], batch size: 58, lr: 2.63e-03, grad_scale: 16.0 2024-09-25 19:06:27,708 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=810950.0, ans=0.0 2024-09-25 19:06:37,052 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=810950.0, ans=0.5 2024-09-25 19:06:48,314 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=810996.6666666666, ans=0.125 2024-09-25 19:07:02,564 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=811043.3333333334, ans=0.125 2024-09-25 19:07:23,304 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.max_positive, batch_count=811090.0, ans=0.95 2024-09-25 19:07:46,978 INFO [train.py:1198] (3/4) Epoch 45, batch 2400, loss[loss=0.2002, ctc_loss=0.1297, cr_loss=0.3524, over 17092.00 frames. ], tot_loss[loss=0.1872, ctc_loss=0.1199, cr_loss=0.3366, over 3369066.97 frames. 
], batch size: 49, lr: 2.63e-03, grad_scale: 16.0 2024-09-25 19:07:47,217 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=811183.3333333334, ans=0.125 2024-09-25 19:07:58,707 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=811183.3333333334, ans=0.125 2024-09-25 19:08:00,113 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=811183.3333333334, ans=0.125 2024-09-25 19:08:30,866 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=811276.6666666666, ans=0.125 2024-09-25 19:08:49,710 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=811370.0, ans=0.125 2024-09-25 19:08:54,458 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=811370.0, ans=0.2 2024-09-25 19:08:55,593 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.054e+02 1.292e+02 1.391e+02 1.494e+02 2.070e+02, threshold=2.781e+02, percent-clipped=0.0 2024-09-25 19:09:07,046 INFO [train.py:1198] (3/4) Epoch 45, batch 2450, loss[loss=0.188, ctc_loss=0.1213, cr_loss=0.3335, over 17051.00 frames. ], tot_loss[loss=0.1865, ctc_loss=0.1193, cr_loss=0.336, over 3374207.63 frames. ], batch size: 51, lr: 2.63e-03, grad_scale: 16.0 2024-09-25 19:09:21,673 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=811463.3333333334, ans=0.0 2024-09-25 19:09:28,574 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.23 vs. limit=6.0 2024-09-25 19:09:57,431 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=811510.0, ans=0.0 2024-09-25 19:10:31,632 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=4.32 vs. limit=12.0 2024-09-25 19:10:32,452 INFO [train.py:1198] (3/4) Epoch 45, batch 2500, loss[loss=0.1734, ctc_loss=0.1071, cr_loss=0.3314, over 17298.00 frames. ], tot_loss[loss=0.1871, ctc_loss=0.1198, cr_loss=0.3367, over 3359880.35 frames. ], batch size: 46, lr: 2.63e-03, grad_scale: 16.0 2024-09-25 19:10:57,855 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=811696.6666666666, ans=0.125 2024-09-25 19:11:30,780 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=811790.0, ans=0.0 2024-09-25 19:11:46,431 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.195e+02 1.307e+02 1.405e+02 1.520e+02 3.997e+02, threshold=2.811e+02, percent-clipped=1.0 2024-09-25 19:11:56,900 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.72 vs. limit=22.5 2024-09-25 19:11:57,472 INFO [train.py:1198] (3/4) Epoch 45, batch 2550, loss[loss=0.1756, ctc_loss=0.113, cr_loss=0.3126, over 17330.00 frames. ], tot_loss[loss=0.1862, ctc_loss=0.1191, cr_loss=0.3354, over 3364807.68 frames. 
], batch size: 51, lr: 2.63e-03, grad_scale: 16.0 2024-09-25 19:12:00,953 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=811883.3333333334, ans=0.125 2024-09-25 19:13:04,047 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=812070.0, ans=0.125 2024-09-25 19:13:18,003 INFO [train.py:1198] (3/4) Epoch 45, batch 2600, loss[loss=0.1955, ctc_loss=0.1277, cr_loss=0.339, over 17225.00 frames. ], tot_loss[loss=0.1872, ctc_loss=0.1198, cr_loss=0.337, over 3359213.54 frames. ], batch size: 47, lr: 2.63e-03, grad_scale: 16.0 2024-09-25 19:13:24,216 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=5.10 vs. limit=15.0 2024-09-25 19:13:26,288 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=812116.6666666666, ans=0.125 2024-09-25 19:13:34,931 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=3.78 vs. limit=10.0 2024-09-25 19:13:39,320 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=812163.3333333334, ans=0.0 2024-09-25 19:14:18,101 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=3.81 vs. limit=15.0 2024-09-25 19:14:26,020 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.75 vs. limit=6.0 2024-09-25 19:14:26,620 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.139e+02 1.298e+02 1.365e+02 1.508e+02 3.211e+02, threshold=2.730e+02, percent-clipped=1.0 2024-09-25 19:14:37,608 INFO [train.py:1198] (3/4) Epoch 45, batch 2650, loss[loss=0.166, ctc_loss=0.1015, cr_loss=0.3222, over 17267.00 frames. ], tot_loss[loss=0.1869, ctc_loss=0.1196, cr_loss=0.3366, over 3367507.63 frames. ], batch size: 42, lr: 2.63e-03, grad_scale: 16.0 2024-09-25 19:15:13,723 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=812443.3333333334, ans=0.1 2024-09-25 19:15:42,327 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.93 vs. limit=6.0 2024-09-25 19:16:05,229 INFO [train.py:1198] (3/4) Epoch 45, batch 2700, loss[loss=0.213, ctc_loss=0.1393, cr_loss=0.3684, over 17287.00 frames. ], tot_loss[loss=0.1866, ctc_loss=0.1194, cr_loss=0.3363, over 3376057.98 frames. ], batch size: 51, lr: 2.63e-03, grad_scale: 16.0 2024-09-25 19:16:05,809 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=11.01 vs. 
limit=12.0 2024-09-25 19:16:11,839 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=812583.3333333334, ans=0.1 2024-09-25 19:16:25,744 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=812630.0, ans=0.0 2024-09-25 19:16:35,691 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=5.06 vs. limit=15.0 2024-09-25 19:16:44,892 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-09-25 19:17:07,093 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=812723.3333333334, ans=0.125 2024-09-25 19:17:16,507 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.145e+02 1.332e+02 1.395e+02 1.480e+02 1.969e+02, threshold=2.789e+02, percent-clipped=0.0 2024-09-25 19:17:18,423 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=812770.0, ans=0.125 2024-09-25 19:17:26,471 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=812816.6666666666, ans=0.125 2024-09-25 19:17:27,703 INFO [train.py:1198] (3/4) Epoch 45, batch 2750, loss[loss=0.1559, ctc_loss=0.09952, cr_loss=0.2821, over 16962.00 frames. ], tot_loss[loss=0.1867, ctc_loss=0.1194, cr_loss=0.3363, over 3374175.90 frames. ], batch size: 42, lr: 2.63e-03, grad_scale: 16.0 2024-09-25 19:17:29,548 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=812816.6666666666, ans=0.0 2024-09-25 19:17:34,681 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.const_attention_rate, batch_count=812816.6666666666, ans=0.025 2024-09-25 19:17:40,663 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=812816.6666666666, ans=0.125 2024-09-25 19:17:48,490 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=812863.3333333334, ans=0.0 2024-09-25 19:17:48,540 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=812863.3333333334, ans=0.125 2024-09-25 19:17:53,443 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=812863.3333333334, ans=0.125 2024-09-25 19:18:09,659 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=812910.0, ans=0.2 2024-09-25 19:18:47,783 INFO [train.py:1198] (3/4) Epoch 45, batch 2800, loss[loss=0.1566, ctc_loss=0.09695, cr_loss=0.2985, over 16294.00 frames. ], tot_loss[loss=0.1876, ctc_loss=0.12, cr_loss=0.3377, over 3366389.61 frames. 
], batch size: 36, lr: 2.63e-03, grad_scale: 32.0 2024-09-25 19:18:49,891 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=813050.0, ans=0.2 2024-09-25 19:19:05,734 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=813096.6666666666, ans=0.1 2024-09-25 19:19:23,352 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=813143.3333333334, ans=0.2 2024-09-25 19:19:38,661 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=813190.0, ans=0.2 2024-09-25 19:20:01,232 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.187e+02 1.322e+02 1.415e+02 1.536e+02 2.357e+02, threshold=2.830e+02, percent-clipped=0.0 2024-09-25 19:20:06,819 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=3.64 vs. limit=10.0 2024-09-25 19:20:08,178 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-25 19:20:12,481 INFO [train.py:1198] (3/4) Epoch 45, batch 2850, loss[loss=0.2005, ctc_loss=0.1284, cr_loss=0.3606, over 16923.00 frames. ], tot_loss[loss=0.1884, ctc_loss=0.1206, cr_loss=0.3389, over 3344091.81 frames. ], batch size: 58, lr: 2.63e-03, grad_scale: 32.0 2024-09-25 19:20:33,081 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=813330.0, ans=0.04949747468305833 2024-09-25 19:20:47,465 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=813376.6666666666, ans=0.1 2024-09-25 19:20:51,321 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.24 vs. limit=10.0 2024-09-25 19:20:55,869 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=9.65 vs. limit=22.5 2024-09-25 19:21:25,602 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.82 vs. limit=15.0 2024-09-25 19:21:28,631 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=813470.0, ans=0.0 2024-09-25 19:21:33,147 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.const_attention_rate, batch_count=813470.0, ans=0.025 2024-09-25 19:21:34,986 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.48 vs. limit=15.0 2024-09-25 19:21:37,530 INFO [train.py:1198] (3/4) Epoch 45, batch 2900, loss[loss=0.1788, ctc_loss=0.1174, cr_loss=0.3072, over 17256.00 frames. ], tot_loss[loss=0.1879, ctc_loss=0.1203, cr_loss=0.3381, over 3354433.84 frames. ], batch size: 42, lr: 2.63e-03, grad_scale: 32.0 2024-09-25 19:21:49,226 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=13.38 vs. 
limit=15.0 2024-09-25 19:21:52,238 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-25 19:22:45,593 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=4.14 vs. limit=12.0 2024-09-25 19:22:46,286 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.101e+02 1.314e+02 1.430e+02 1.564e+02 2.364e+02, threshold=2.859e+02, percent-clipped=0.0 2024-09-25 19:22:54,622 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=813703.3333333334, ans=0.1 2024-09-25 19:22:57,462 INFO [train.py:1198] (3/4) Epoch 45, batch 2950, loss[loss=0.1649, ctc_loss=0.1036, cr_loss=0.3063, over 17266.00 frames. ], tot_loss[loss=0.1882, ctc_loss=0.1206, cr_loss=0.3385, over 3353098.55 frames. ], batch size: 42, lr: 2.63e-03, grad_scale: 32.0 2024-09-25 19:23:25,173 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=813796.6666666666, ans=0.07 2024-09-25 19:24:02,115 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=3.67 vs. limit=15.0 2024-09-25 19:24:16,615 INFO [train.py:1198] (3/4) Epoch 45, batch 3000, loss[loss=0.2277, ctc_loss=0.1459, cr_loss=0.409, over 17025.00 frames. ], tot_loss[loss=0.1881, ctc_loss=0.1205, cr_loss=0.338, over 3361920.98 frames. ], batch size: 52, lr: 2.63e-03, grad_scale: 32.0 2024-09-25 19:24:16,616 INFO [train.py:1221] (3/4) Computing validation loss 2024-09-25 19:24:32,370 INFO [train.py:1230] (3/4) Epoch 45, validation: loss=0.03541, ctc_loss=0.03541, cr_loss=1.054e-14, over 944034.00 frames. 2024-09-25 19:24:32,370 INFO [train.py:1231] (3/4) Maximum memory allocated so far is 21238MB 2024-09-25 19:24:51,800 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.19 vs. limit=15.0 2024-09-25 19:25:00,016 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=7.06 vs. limit=15.0 2024-09-25 19:25:17,838 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=4.26 vs. limit=12.0 2024-09-25 19:25:20,385 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=814123.3333333334, ans=0.0 2024-09-25 19:25:25,033 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=814123.3333333334, ans=0.2 2024-09-25 19:25:44,657 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.170e+02 1.310e+02 1.382e+02 1.500e+02 2.246e+02, threshold=2.763e+02, percent-clipped=0.0 2024-09-25 19:25:55,843 INFO [train.py:1198] (3/4) Epoch 45, batch 3050, loss[loss=0.1578, ctc_loss=0.0993, cr_loss=0.2925, over 16951.00 frames. ], tot_loss[loss=0.1878, ctc_loss=0.1202, cr_loss=0.3377, over 3360946.36 frames. ], batch size: 42, lr: 2.63e-03, grad_scale: 32.0 2024-09-25 19:25:56,839 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=4.51 vs. 
limit=15.0 2024-09-25 19:25:57,588 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=814216.6666666666, ans=0.125 2024-09-25 19:26:05,373 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=814216.6666666666, ans=0.125 2024-09-25 19:26:10,071 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.const_attention_rate, batch_count=814263.3333333334, ans=0.025 2024-09-25 19:26:21,017 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=814263.3333333334, ans=0.125 2024-09-25 19:26:33,414 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=814310.0, ans=0.95 2024-09-25 19:26:56,885 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=814403.3333333334, ans=0.125 2024-09-25 19:27:13,878 INFO [train.py:1198] (3/4) Epoch 45, batch 3100, loss[loss=0.1579, ctc_loss=0.09965, cr_loss=0.2913, over 17087.00 frames. ], tot_loss[loss=0.1875, ctc_loss=0.1202, cr_loss=0.3367, over 3367984.97 frames. ], batch size: 43, lr: 2.63e-03, grad_scale: 32.0 2024-09-25 19:27:22,160 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=814450.0, ans=0.125 2024-09-25 19:27:43,217 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=814496.6666666666, ans=0.125 2024-09-25 19:27:45,393 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.73 vs. limit=12.0 2024-09-25 19:27:55,810 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=814543.3333333334, ans=0.0 2024-09-25 19:28:03,770 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=814590.0, ans=0.125 2024-09-25 19:28:18,730 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=7.04 vs. limit=15.0 2024-09-25 19:28:23,832 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.168e+02 1.327e+02 1.418e+02 1.524e+02 1.856e+02, threshold=2.835e+02, percent-clipped=0.0 2024-09-25 19:28:27,180 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=814636.6666666666, ans=0.0 2024-09-25 19:28:33,293 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=13.47 vs. limit=22.5 2024-09-25 19:28:36,921 INFO [train.py:1198] (3/4) Epoch 45, batch 3150, loss[loss=0.1538, ctc_loss=0.09698, cr_loss=0.284, over 17269.00 frames. ], tot_loss[loss=0.1876, ctc_loss=0.1203, cr_loss=0.3366, over 3352056.87 frames. 
], batch size: 44, lr: 2.63e-03, grad_scale: 32.0 2024-09-25 19:28:43,494 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=814683.3333333334, ans=0.0 2024-09-25 19:29:03,928 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=814730.0, ans=0.0 2024-09-25 19:29:42,558 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=9.89 vs. limit=22.5 2024-09-25 19:29:55,899 INFO [train.py:1198] (3/4) Epoch 45, batch 3200, loss[loss=0.2036, ctc_loss=0.1306, cr_loss=0.3647, over 17095.00 frames. ], tot_loss[loss=0.1879, ctc_loss=0.1205, cr_loss=0.337, over 3349393.20 frames. ], batch size: 49, lr: 2.62e-03, grad_scale: 32.0 2024-09-25 19:30:12,140 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=814963.3333333334, ans=0.1 2024-09-25 19:30:22,944 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=814963.3333333334, ans=0.125 2024-09-25 19:31:03,191 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.085e+02 1.319e+02 1.396e+02 1.488e+02 2.041e+02, threshold=2.793e+02, percent-clipped=0.0 2024-09-25 19:31:11,397 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.min_positive, batch_count=815103.3333333334, ans=0.05 2024-09-25 19:31:14,383 INFO [train.py:1198] (3/4) Epoch 45, batch 3250, loss[loss=0.2067, ctc_loss=0.1308, cr_loss=0.3795, over 17350.00 frames. ], tot_loss[loss=0.1888, ctc_loss=0.1211, cr_loss=0.3387, over 3347111.71 frames. ], batch size: 48, lr: 2.62e-03, grad_scale: 32.0 2024-09-25 19:31:22,604 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=815150.0, ans=0.125 2024-09-25 19:32:33,040 INFO [train.py:1198] (3/4) Epoch 45, batch 3300, loss[loss=0.2084, ctc_loss=0.1375, cr_loss=0.3546, over 12243.00 frames. ], tot_loss[loss=0.1885, ctc_loss=0.1209, cr_loss=0.338, over 3337290.99 frames. ], batch size: 123, lr: 2.62e-03, grad_scale: 32.0 2024-09-25 19:32:39,439 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=815383.3333333334, ans=0.125 2024-09-25 19:33:06,071 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=815476.6666666666, ans=0.125 2024-09-25 19:33:06,629 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=4.68 vs. limit=15.0 2024-09-25 19:33:08,258 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=4.26 vs. limit=12.0 2024-09-25 19:33:15,333 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=815476.6666666666, ans=0.125 2024-09-25 19:33:19,181 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=4.54 vs. 
limit=15.0 2024-09-25 19:33:24,849 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=815523.3333333334, ans=0.2 2024-09-25 19:33:26,485 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.min_positive, batch_count=815523.3333333334, ans=0.025 2024-09-25 19:33:28,024 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=815523.3333333334, ans=0.125 2024-09-25 19:33:29,610 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-25 19:33:39,306 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-25 19:33:40,405 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.116e+02 1.311e+02 1.407e+02 1.511e+02 1.886e+02, threshold=2.815e+02, percent-clipped=0.0 2024-09-25 19:33:50,070 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=815616.6666666666, ans=0.0 2024-09-25 19:33:51,419 INFO [train.py:1198] (3/4) Epoch 45, batch 3350, loss[loss=0.1871, ctc_loss=0.1183, cr_loss=0.3437, over 17167.00 frames. ], tot_loss[loss=0.1879, ctc_loss=0.1204, cr_loss=0.3375, over 3351074.32 frames. ], batch size: 45, lr: 2.62e-03, grad_scale: 32.0 2024-09-25 19:33:53,260 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=815616.6666666666, ans=0.1 2024-09-25 19:34:09,048 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=815663.3333333334, ans=0.125 2024-09-25 19:34:15,643 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=11.43 vs. limit=22.5 2024-09-25 19:34:36,016 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=815710.0, ans=0.125 2024-09-25 19:34:41,250 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=5.06 vs. limit=15.0 2024-09-25 19:34:42,139 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=815756.6666666666, ans=0.125 2024-09-25 19:34:55,598 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=3.94 vs. limit=12.0 2024-09-25 19:35:05,269 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=11.70 vs. limit=22.5 2024-09-25 19:35:10,349 INFO [train.py:1198] (3/4) Epoch 45, batch 3400, loss[loss=0.1634, ctc_loss=0.104, cr_loss=0.2971, over 17110.00 frames. ], tot_loss[loss=0.188, ctc_loss=0.1204, cr_loss=0.3377, over 3349416.90 frames. 
], batch size: 40, lr: 2.62e-03, grad_scale: 32.0 2024-09-25 19:35:10,595 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=815850.0, ans=0.125 2024-09-25 19:35:20,062 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=815850.0, ans=0.125 2024-09-25 19:35:57,541 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=815990.0, ans=0.125 2024-09-25 19:36:02,945 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=815990.0, ans=0.125 2024-09-25 19:36:19,819 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.104e+02 1.333e+02 1.429e+02 1.523e+02 2.294e+02, threshold=2.857e+02, percent-clipped=0.0 2024-09-25 19:36:30,599 INFO [train.py:1198] (3/4) Epoch 45, batch 3450, loss[loss=0.1566, ctc_loss=0.09888, cr_loss=0.2887, over 17128.00 frames. ], tot_loss[loss=0.1881, ctc_loss=0.1205, cr_loss=0.3378, over 3346853.68 frames. ], batch size: 40, lr: 2.62e-03, grad_scale: 32.0 2024-09-25 19:37:17,964 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=816223.3333333334, ans=0.0 2024-09-25 19:37:35,009 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=816270.0, ans=0.125 2024-09-25 19:37:50,542 INFO [train.py:1198] (3/4) Epoch 45, batch 3500, loss[loss=0.1618, ctc_loss=0.09944, cr_loss=0.3119, over 16700.00 frames. ], tot_loss[loss=0.1872, ctc_loss=0.1199, cr_loss=0.3365, over 3340168.36 frames. ], batch size: 37, lr: 2.62e-03, grad_scale: 32.0 2024-09-25 19:37:50,910 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=816316.6666666666, ans=0.125 2024-09-25 19:38:23,665 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=816410.0, ans=0.0 2024-09-25 19:38:34,939 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=816410.0, ans=0.1 2024-09-25 19:38:37,900 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=816456.6666666666, ans=0.1 2024-09-25 19:38:39,407 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=816456.6666666666, ans=0.1 2024-09-25 19:38:50,493 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=816456.6666666666, ans=0.2 2024-09-25 19:38:58,293 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=816503.3333333334, ans=0.125 2024-09-25 19:39:01,034 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.083e+02 1.307e+02 1.374e+02 1.461e+02 1.986e+02, threshold=2.748e+02, percent-clipped=0.0 2024-09-25 19:39:09,093 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=816550.0, ans=0.1 2024-09-25 19:39:10,424 INFO [train.py:1198] (3/4) Epoch 45, batch 3550, loss[loss=0.1837, ctc_loss=0.1183, cr_loss=0.3271, over 17150.00 frames. 
], tot_loss[loss=0.1873, ctc_loss=0.12, cr_loss=0.3364, over 3342481.33 frames. ], batch size: 48, lr: 2.62e-03, grad_scale: 16.0 2024-09-25 19:39:20,369 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=816550.0, ans=0.125 2024-09-25 19:39:59,481 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=816690.0, ans=0.125 2024-09-25 19:40:04,482 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=11.86 vs. limit=22.5 2024-09-25 19:40:25,956 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=816736.6666666666, ans=0.1 2024-09-25 19:40:28,819 INFO [train.py:1198] (3/4) Epoch 45, batch 3600, loss[loss=0.1744, ctc_loss=0.1095, cr_loss=0.3248, over 17225.00 frames. ], tot_loss[loss=0.1872, ctc_loss=0.1199, cr_loss=0.3365, over 3341183.20 frames. ], batch size: 50, lr: 2.62e-03, grad_scale: 32.0 2024-09-25 19:40:30,644 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=816783.3333333334, ans=0.125 2024-09-25 19:40:47,980 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=816830.0, ans=0.0 2024-09-25 19:41:09,851 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=816876.6666666666, ans=0.125 2024-09-25 19:41:14,459 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=816923.3333333334, ans=0.2 2024-09-25 19:41:37,495 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.193e+02 1.298e+02 1.374e+02 1.488e+02 2.136e+02, threshold=2.747e+02, percent-clipped=0.0 2024-09-25 19:41:37,775 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=816970.0, ans=0.0 2024-09-25 19:41:46,856 INFO [train.py:1198] (3/4) Epoch 45, batch 3650, loss[loss=0.1615, ctc_loss=0.1003, cr_loss=0.3062, over 17290.00 frames. ], tot_loss[loss=0.1881, ctc_loss=0.1207, cr_loss=0.3369, over 3312239.26 frames. ], batch size: 46, lr: 2.62e-03, grad_scale: 32.0 2024-09-25 19:41:53,129 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=817016.6666666666, ans=0.125 2024-09-25 19:41:58,558 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.53 vs. limit=22.5 2024-09-25 19:42:48,762 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=817203.3333333334, ans=0.125 2024-09-25 19:43:01,205 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-09-25 19:43:03,665 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=6.46 vs. limit=15.0 2024-09-25 19:43:04,608 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.90 vs. 
limit=15.0 2024-09-25 19:43:05,644 INFO [train.py:1198] (3/4) Epoch 45, batch 3700, loss[loss=0.1695, ctc_loss=0.1054, cr_loss=0.3204, over 16956.00 frames. ], tot_loss[loss=0.1874, ctc_loss=0.1202, cr_loss=0.3359, over 3325467.16 frames. ], batch size: 42, lr: 2.62e-03, grad_scale: 32.0 2024-09-25 19:43:14,327 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.12 vs. limit=15.0 2024-09-25 19:43:16,967 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=817250.0, ans=0.125 2024-09-25 19:43:36,010 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=3.70 vs. limit=10.0 2024-09-25 19:43:46,940 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=11.13 vs. limit=22.5 2024-09-25 19:44:01,485 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=4.82 vs. limit=10.0 2024-09-25 19:44:14,699 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.153e+02 1.280e+02 1.340e+02 1.437e+02 1.807e+02, threshold=2.680e+02, percent-clipped=0.0 2024-09-25 19:44:17,061 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.74 vs. limit=22.5 2024-09-25 19:44:24,091 INFO [train.py:1198] (3/4) Epoch 45, batch 3750, loss[loss=0.1519, ctc_loss=0.09587, cr_loss=0.28, over 17041.00 frames. ], tot_loss[loss=0.1881, ctc_loss=0.1207, cr_loss=0.3368, over 3330091.10 frames. ], batch size: 39, lr: 2.62e-03, grad_scale: 32.0 2024-09-25 19:44:27,521 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=817483.3333333334, ans=0.125 2024-09-25 19:44:51,164 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=817530.0, ans=0.125 2024-09-25 19:45:03,631 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=5.05 vs. limit=15.0 2024-09-25 19:45:31,155 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=817670.0, ans=0.125 2024-09-25 19:45:31,181 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer_ff2.min_abs, batch_count=817670.0, ans=0.1 2024-09-25 19:45:34,866 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.10 vs. limit=15.0 2024-09-25 19:45:43,609 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=817716.6666666666, ans=0.125 2024-09-25 19:45:44,858 INFO [train.py:1198] (3/4) Epoch 45, batch 3800, loss[loss=0.2056, ctc_loss=0.1344, cr_loss=0.3561, over 17001.00 frames. ], tot_loss[loss=0.1893, ctc_loss=0.1217, cr_loss=0.3382, over 3326125.36 frames. 
], batch size: 56, lr: 2.62e-03, grad_scale: 32.0 2024-09-25 19:45:45,197 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=817716.6666666666, ans=0.07 2024-09-25 19:46:23,177 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=817810.0, ans=0.1 2024-09-25 19:46:27,926 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=817810.0, ans=0.125 2024-09-25 19:46:36,632 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=5.99 vs. limit=15.0 2024-09-25 19:46:54,350 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.160e+02 1.363e+02 1.462e+02 1.575e+02 2.339e+02, threshold=2.925e+02, percent-clipped=0.0 2024-09-25 19:46:56,824 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=8.37 vs. limit=15.0 2024-09-25 19:47:03,900 INFO [train.py:1198] (3/4) Epoch 45, batch 3850, loss[loss=0.2058, ctc_loss=0.1375, cr_loss=0.3415, over 12153.00 frames. ], tot_loss[loss=0.1902, ctc_loss=0.1225, cr_loss=0.3388, over 3282372.52 frames. ], batch size: 123, lr: 2.62e-03, grad_scale: 32.0 2024-09-25 19:47:23,873 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=817996.6666666666, ans=0.0 2024-09-25 19:47:55,241 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=818090.0, ans=0.2 2024-09-25 19:48:04,563 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=818090.0, ans=0.125 2024-09-25 19:48:58,579 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=818164.6666666666, ans=0.0 2024-09-25 19:48:59,032 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=5.60 vs. limit=15.0 2024-09-25 19:49:07,326 INFO [train.py:1198] (3/4) Epoch 46, batch 0, loss[loss=0.2264, ctc_loss=0.1442, cr_loss=0.4113, over 16979.00 frames. ], tot_loss[loss=0.2264, ctc_loss=0.1442, cr_loss=0.4113, over 16979.00 frames. ], batch size: 53, lr: 2.59e-03, grad_scale: 32.0 2024-09-25 19:49:07,327 INFO [train.py:1221] (3/4) Computing validation loss 2024-09-25 19:49:20,484 INFO [zipformer.py:1858] (3/4) name=encoder.encoders.5.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([1.9946, 4.3467, 3.8835, 4.1031], device='cuda:3') 2024-09-25 19:49:22,408 INFO [train.py:1230] (3/4) Epoch 46, validation: loss=0.03502, ctc_loss=0.03502, cr_loss=1.054e-14, over 944034.00 frames. 
2024-09-25 19:49:22,409 INFO [train.py:1231] (3/4) Maximum memory allocated so far is 21238MB 2024-09-25 19:49:30,451 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=818164.6666666666, ans=0.0 2024-09-25 19:49:35,274 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=818164.6666666666, ans=0.1 2024-09-25 19:49:40,189 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=818211.3333333334, ans=0.125 2024-09-25 19:49:51,296 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=818211.3333333334, ans=0.1 2024-09-25 19:49:58,037 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=818258.0, ans=0.1 2024-09-25 19:50:04,321 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.const_attention_rate, batch_count=818258.0, ans=0.025 2024-09-25 19:50:14,847 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=4.18 vs. limit=15.0 2024-09-25 19:50:17,269 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=818304.6666666666, ans=0.2 2024-09-25 19:50:40,885 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.140e+02 1.337e+02 1.469e+02 1.679e+02 2.405e+02, threshold=2.939e+02, percent-clipped=0.0 2024-09-25 19:50:44,078 INFO [train.py:1198] (3/4) Epoch 46, batch 50, loss[loss=0.186, ctc_loss=0.1191, cr_loss=0.3342, over 17005.00 frames. ], tot_loss[loss=0.1882, ctc_loss=0.1204, cr_loss=0.3389, over 752992.61 frames. ], batch size: 51, lr: 2.59e-03, grad_scale: 32.0 2024-09-25 19:51:30,175 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=818491.3333333334, ans=0.125 2024-09-25 19:51:32,036 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=818491.3333333334, ans=0.0 2024-09-25 19:51:50,392 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=6.69 vs. limit=15.0 2024-09-25 19:51:53,200 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=818584.6666666666, ans=0.0 2024-09-25 19:52:00,250 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.14 vs. limit=15.0 2024-09-25 19:52:08,992 INFO [train.py:1198] (3/4) Epoch 46, batch 100, loss[loss=0.1719, ctc_loss=0.1092, cr_loss=0.3135, over 17086.00 frames. ], tot_loss[loss=0.1868, ctc_loss=0.1195, cr_loss=0.3364, over 1325531.76 frames. 
], batch size: 43, lr: 2.59e-03, grad_scale: 32.0 2024-09-25 19:52:27,157 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=818678.0, ans=0.0 2024-09-25 19:52:33,577 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=818678.0, ans=0.0 2024-09-25 19:52:38,396 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=818678.0, ans=0.125 2024-09-25 19:53:03,474 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.const_attention_rate, batch_count=818771.3333333334, ans=0.025 2024-09-25 19:53:22,914 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=818818.0, ans=0.125 2024-09-25 19:53:28,956 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.141e+02 1.282e+02 1.339e+02 1.432e+02 3.555e+02, threshold=2.678e+02, percent-clipped=2.0 2024-09-25 19:53:29,376 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=818818.0, ans=0.2 2024-09-25 19:53:29,660 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=10.69 vs. limit=15.0 2024-09-25 19:53:32,248 INFO [train.py:1198] (3/4) Epoch 46, batch 150, loss[loss=0.1794, ctc_loss=0.1137, cr_loss=0.3284, over 17112.00 frames. ], tot_loss[loss=0.1854, ctc_loss=0.1185, cr_loss=0.3345, over 1781274.13 frames. ], batch size: 49, lr: 2.59e-03, grad_scale: 32.0 2024-09-25 19:54:00,864 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.const_attention_rate, batch_count=818911.3333333334, ans=0.025 2024-09-25 19:54:08,808 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=818958.0, ans=0.1 2024-09-25 19:54:11,946 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=818958.0, ans=0.125 2024-09-25 19:54:42,427 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=819051.3333333334, ans=0.125 2024-09-25 19:54:51,562 INFO [train.py:1198] (3/4) Epoch 46, batch 200, loss[loss=0.2118, ctc_loss=0.1362, cr_loss=0.3783, over 16775.00 frames. ], tot_loss[loss=0.1855, ctc_loss=0.1186, cr_loss=0.3345, over 2133408.65 frames. 
], batch size: 61, lr: 2.59e-03, grad_scale: 32.0 2024-09-25 19:55:06,360 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=819144.6666666666, ans=0.0 2024-09-25 19:55:09,472 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=819144.6666666666, ans=0.0 2024-09-25 19:55:20,676 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=819144.6666666666, ans=0.125 2024-09-25 19:55:23,924 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=819191.3333333334, ans=0.125 2024-09-25 19:55:52,014 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=819238.0, ans=0.0 2024-09-25 19:56:13,858 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.146e+02 1.302e+02 1.353e+02 1.428e+02 2.054e+02, threshold=2.706e+02, percent-clipped=0.0 2024-09-25 19:56:17,271 INFO [train.py:1198] (3/4) Epoch 46, batch 250, loss[loss=0.162, ctc_loss=0.1035, cr_loss=0.2925, over 17049.00 frames. ], tot_loss[loss=0.1844, ctc_loss=0.1178, cr_loss=0.333, over 2415906.68 frames. ], batch size: 39, lr: 2.59e-03, grad_scale: 32.0 2024-09-25 19:56:17,509 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=819331.3333333334, ans=0.125 2024-09-25 19:56:39,296 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=6.14 vs. limit=15.0 2024-09-25 19:56:40,007 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=819378.0, ans=0.0 2024-09-25 19:56:46,458 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=819378.0, ans=0.125 2024-09-25 19:56:54,521 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=819424.6666666666, ans=0.2 2024-09-25 19:56:56,571 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.36 vs. limit=6.0 2024-09-25 19:57:15,732 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=819471.3333333334, ans=0.125 2024-09-25 19:57:23,658 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=819518.0, ans=0.0 2024-09-25 19:57:26,944 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=819518.0, ans=0.125 2024-09-25 19:57:29,996 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.const_attention_rate, batch_count=819518.0, ans=0.025 2024-09-25 19:57:33,345 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.min_positive, batch_count=819518.0, ans=0.025 2024-09-25 19:57:33,649 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=9.78 vs. 
limit=12.0 2024-09-25 19:57:40,842 INFO [train.py:1198] (3/4) Epoch 46, batch 300, loss[loss=0.2235, ctc_loss=0.1503, cr_loss=0.366, over 11690.00 frames. ], tot_loss[loss=0.1858, ctc_loss=0.1189, cr_loss=0.3348, over 2624889.82 frames. ], batch size: 123, lr: 2.59e-03, grad_scale: 16.0 2024-09-25 19:57:57,393 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=819611.3333333334, ans=0.0 2024-09-25 19:58:08,110 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.const_attention_rate, batch_count=819611.3333333334, ans=0.025 2024-09-25 19:58:08,230 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=819611.3333333334, ans=0.1 2024-09-25 19:58:25,533 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=819658.0, ans=0.125 2024-09-25 19:58:43,278 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=819704.6666666666, ans=0.125 2024-09-25 19:59:02,222 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.177e+02 1.300e+02 1.374e+02 1.446e+02 2.011e+02, threshold=2.748e+02, percent-clipped=0.0 2024-09-25 19:59:03,754 INFO [train.py:1198] (3/4) Epoch 46, batch 350, loss[loss=0.1904, ctc_loss=0.1212, cr_loss=0.3461, over 17073.00 frames. ], tot_loss[loss=0.1856, ctc_loss=0.1187, cr_loss=0.3344, over 2792947.83 frames. ], batch size: 52, lr: 2.59e-03, grad_scale: 16.0 2024-09-25 19:59:05,547 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=819798.0, ans=0.125 2024-09-25 19:59:13,601 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.min_abs, batch_count=819798.0, ans=0.5 2024-09-25 19:59:37,607 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.min_positive, batch_count=819891.3333333334, ans=0.05 2024-09-25 19:59:38,069 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=4.72 vs. limit=15.0 2024-09-25 19:59:53,496 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=819938.0, ans=0.0 2024-09-25 20:00:02,636 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=819938.0, ans=0.125 2024-09-25 20:00:23,197 INFO [train.py:1198] (3/4) Epoch 46, batch 400, loss[loss=0.1829, ctc_loss=0.1175, cr_loss=0.327, over 17227.00 frames. ], tot_loss[loss=0.1859, ctc_loss=0.1189, cr_loss=0.3348, over 2924938.14 frames. ], batch size: 50, lr: 2.59e-03, grad_scale: 32.0 2024-09-25 20:00:37,666 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.const_attention_rate, batch_count=820031.3333333334, ans=0.025 2024-09-25 20:00:40,782 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=820078.0, ans=0.2 2024-09-25 20:01:01,874 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-25 20:01:27,561 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.53 vs. 
limit=15.0 2024-09-25 20:01:50,263 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.173e+02 1.300e+02 1.386e+02 1.492e+02 1.969e+02, threshold=2.772e+02, percent-clipped=0.0 2024-09-25 20:01:52,001 INFO [train.py:1198] (3/4) Epoch 46, batch 450, loss[loss=0.1782, ctc_loss=0.1137, cr_loss=0.3227, over 17088.00 frames. ], tot_loss[loss=0.1866, ctc_loss=0.1195, cr_loss=0.3355, over 3008636.47 frames. ], batch size: 49, lr: 2.59e-03, grad_scale: 32.0 2024-09-25 20:02:10,263 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=820311.3333333334, ans=0.125 2024-09-25 20:02:13,624 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=10.95 vs. limit=15.0 2024-09-25 20:02:30,958 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=820358.0, ans=0.125 2024-09-25 20:02:35,744 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=820358.0, ans=0.0 2024-09-25 20:02:48,519 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=820404.6666666666, ans=0.0 2024-09-25 20:02:51,687 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.const_attention_rate, batch_count=820404.6666666666, ans=0.025 2024-09-25 20:03:11,938 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=820451.3333333334, ans=0.0 2024-09-25 20:03:14,980 INFO [train.py:1198] (3/4) Epoch 46, batch 500, loss[loss=0.1789, ctc_loss=0.1097, cr_loss=0.3461, over 17013.00 frames. ], tot_loss[loss=0.187, ctc_loss=0.1198, cr_loss=0.3362, over 3083394.89 frames. 
], batch size: 44, lr: 2.59e-03, grad_scale: 32.0 2024-09-25 20:03:16,829 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=820498.0, ans=0.125 2024-09-25 20:03:34,647 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=820544.6666666666, ans=0.125 2024-09-25 20:03:37,757 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=820544.6666666666, ans=0.125 2024-09-25 20:03:58,719 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=820591.3333333334, ans=0.125 2024-09-25 20:04:00,598 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=820591.3333333334, ans=0.2 2024-09-25 20:04:02,095 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=820638.0, ans=0.125 2024-09-25 20:04:03,646 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=820638.0, ans=0.0 2024-09-25 20:04:05,107 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.const_attention_rate, batch_count=820638.0, ans=0.025 2024-09-25 20:04:11,502 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=820638.0, ans=0.0 2024-09-25 20:04:33,508 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.017e+02 1.336e+02 1.423e+02 1.544e+02 3.291e+02, threshold=2.846e+02, percent-clipped=2.0 2024-09-25 20:04:35,182 INFO [train.py:1198] (3/4) Epoch 46, batch 550, loss[loss=0.1969, ctc_loss=0.1276, cr_loss=0.3466, over 17055.00 frames. ], tot_loss[loss=0.1877, ctc_loss=0.1204, cr_loss=0.3366, over 3143567.05 frames. ], batch size: 46, lr: 2.59e-03, grad_scale: 32.0 2024-09-25 20:04:37,059 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=820731.3333333334, ans=0.1 2024-09-25 20:04:53,208 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=820778.0, ans=0.125 2024-09-25 20:04:53,878 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=8.02 vs. limit=15.0 2024-09-25 20:05:22,988 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=820824.6666666666, ans=0.0 2024-09-25 20:05:56,809 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=7.99 vs. limit=15.0 2024-09-25 20:06:00,563 INFO [train.py:1198] (3/4) Epoch 46, batch 600, loss[loss=0.1837, ctc_loss=0.1188, cr_loss=0.3244, over 16952.00 frames. ], tot_loss[loss=0.1866, ctc_loss=0.1197, cr_loss=0.3348, over 3184805.71 frames. 
], batch size: 58, lr: 2.59e-03, grad_scale: 32.0
2024-09-25 20:06:04,147 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=820964.6666666666, ans=0.125
2024-09-25 20:06:08,843 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=820964.6666666666, ans=0.125
2024-09-25 20:06:08,912 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=820964.6666666666, ans=0.04949747468305833
2024-09-25 20:06:29,948 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.58 vs. limit=12.0
2024-09-25 20:07:21,441 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.090e+02 1.282e+02 1.361e+02 1.501e+02 2.285e+02, threshold=2.723e+02, percent-clipped=0.0
2024-09-25 20:07:21,873 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=821198.0, ans=0.2
2024-09-25 20:07:22,576 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.51 vs. limit=12.0
2024-09-25 20:07:23,101 INFO [train.py:1198] (3/4) Epoch 46, batch 650, loss[loss=0.1962, ctc_loss=0.1274, cr_loss=0.3439, over 17009.00 frames. ], tot_loss[loss=0.1872, ctc_loss=0.12, cr_loss=0.3363, over 3218759.04 frames. ], batch size: 53, lr: 2.59e-03, grad_scale: 32.0
2024-09-25 20:07:41,151 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=821244.6666666666, ans=0.0
2024-09-25 20:07:44,384 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=821244.6666666666, ans=0.125
2024-09-25 20:07:59,700 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=821291.3333333334, ans=0.125
2024-09-25 20:08:01,398 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=821291.3333333334, ans=0.0
2024-09-25 20:08:01,525 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=821291.3333333334, ans=0.04949747468305833
2024-09-25 20:08:06,293 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=821291.3333333334, ans=0.125
2024-09-25 20:08:48,496 INFO [train.py:1198] (3/4) Epoch 46, batch 700, loss[loss=0.1631, ctc_loss=0.1018, cr_loss=0.3065, over 16931.00 frames. ], tot_loss[loss=0.1856, ctc_loss=0.1188, cr_loss=0.3339, over 3252073.61 frames. ], batch size: 42, lr: 2.59e-03, grad_scale: 32.0
2024-09-25 20:08:55,242 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=821431.3333333334, ans=0.125
2024-09-25 20:08:56,823 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=821431.3333333334, ans=0.0
2024-09-25 20:09:01,929 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=821431.3333333334, ans=0.1
2024-09-25 20:09:03,334 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=821478.0, ans=0.125
2024-09-25 20:09:06,501 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=821478.0, ans=0.125
2024-09-25 20:09:06,607 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-25 20:09:23,904 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=821524.6666666666, ans=0.125
2024-09-25 20:09:36,617 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=821571.3333333334, ans=0.1
2024-09-25 20:09:46,871 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=5.64 vs. limit=15.0
2024-09-25 20:10:06,894 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.162e+02 1.305e+02 1.403e+02 1.487e+02 1.713e+02, threshold=2.806e+02, percent-clipped=0.0
2024-09-25 20:10:08,543 INFO [train.py:1198] (3/4) Epoch 46, batch 750, loss[loss=0.1901, ctc_loss=0.1223, cr_loss=0.339, over 17226.00 frames. ], tot_loss[loss=0.1865, ctc_loss=0.1195, cr_loss=0.335, over 3272577.07 frames. ], batch size: 50, lr: 2.59e-03, grad_scale: 32.0
2024-09-25 20:10:16,583 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=821664.6666666666, ans=0.125
2024-09-25 20:10:23,523 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=4.12 vs. limit=12.0
2024-09-25 20:10:40,324 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=821711.3333333334, ans=0.0
2024-09-25 20:10:49,860 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.const_attention_rate, batch_count=821758.0, ans=0.025
2024-09-25 20:10:51,493 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=821758.0, ans=0.2
2024-09-25 20:11:07,127 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=10.15 vs. limit=22.5
2024-09-25 20:11:36,192 INFO [train.py:1198] (3/4) Epoch 46, batch 800, loss[loss=0.1409, ctc_loss=0.08759, cr_loss=0.2666, over 17261.00 frames. ], tot_loss[loss=0.1863, ctc_loss=0.1194, cr_loss=0.3347, over 3292015.71 frames. ], batch size: 42, lr: 2.58e-03, grad_scale: 32.0
2024-09-25 20:12:23,231 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=6.03 vs. limit=15.0
2024-09-25 20:12:47,466 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=9.04 vs. limit=15.0
2024-09-25 20:12:57,409 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.const_attention_rate, batch_count=822131.3333333334, ans=0.025
2024-09-25 20:12:58,584 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.090e+02 1.283e+02 1.367e+02 1.457e+02 2.182e+02, threshold=2.735e+02, percent-clipped=0.0
2024-09-25 20:12:58,609 INFO [train.py:1198] (3/4) Epoch 46, batch 850, loss[loss=0.1914, ctc_loss=0.1178, cr_loss=0.3681, over 17208.00 frames. ], tot_loss[loss=0.1865, ctc_loss=0.1195, cr_loss=0.3352, over 3304685.28 frames. ], batch size: 47, lr: 2.58e-03, grad_scale: 16.0
2024-09-25 20:13:00,379 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=822131.3333333334, ans=0.1
2024-09-25 20:13:12,490 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.41 vs. limit=22.5
2024-09-25 20:13:40,266 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=822224.6666666666, ans=0.0
2024-09-25 20:14:02,753 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=822318.0, ans=0.125
2024-09-25 20:14:18,697 INFO [train.py:1198] (3/4) Epoch 46, batch 900, loss[loss=0.2015, ctc_loss=0.1311, cr_loss=0.3515, over 16686.00 frames. ], tot_loss[loss=0.186, ctc_loss=0.119, cr_loss=0.3353, over 3326806.82 frames. ], batch size: 61, lr: 2.58e-03, grad_scale: 16.0
2024-09-25 20:14:49,241 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=822458.0, ans=0.125
2024-09-25 20:14:55,911 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=822458.0, ans=0.125
2024-09-25 20:15:06,960 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=822504.6666666666, ans=0.125
2024-09-25 20:15:41,543 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.142e+02 1.288e+02 1.364e+02 1.452e+02 2.497e+02, threshold=2.728e+02, percent-clipped=0.0
2024-09-25 20:15:41,587 INFO [train.py:1198] (3/4) Epoch 46, batch 950, loss[loss=0.1871, ctc_loss=0.1164, cr_loss=0.3533, over 17010.00 frames. ], tot_loss[loss=0.1868, ctc_loss=0.1196, cr_loss=0.3361, over 3326669.62 frames. ], batch size: 51, lr: 2.58e-03, grad_scale: 16.0
2024-09-25 20:15:43,540 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=822598.0, ans=0.1
2024-09-25 20:15:45,160 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=822598.0, ans=0.2
2024-09-25 20:16:35,492 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=822738.0, ans=0.125
2024-09-25 20:16:37,048 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=822738.0, ans=0.1
2024-09-25 20:16:43,665 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=822738.0, ans=0.125
2024-09-25 20:16:48,776 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.96 vs. limit=6.0
2024-09-25 20:17:07,492 INFO [train.py:1198] (3/4) Epoch 46, batch 1000, loss[loss=0.2278, ctc_loss=0.154, cr_loss=0.369, over 15118.00 frames. ], tot_loss[loss=0.188, ctc_loss=0.1205, cr_loss=0.3379, over 3328448.88 frames. ], batch size: 89, lr: 2.58e-03, grad_scale: 8.0
2024-09-25 20:17:09,349 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=822831.3333333334, ans=0.0
2024-09-25 20:17:10,068 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.76 vs. limit=6.0
2024-09-25 20:17:38,728 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=822924.6666666666, ans=0.05
2024-09-25 20:18:30,595 INFO [train.py:1198] (3/4) Epoch 46, batch 1050, loss[loss=0.151, ctc_loss=0.09459, cr_loss=0.2822, over 17284.00 frames. ], tot_loss[loss=0.1877, ctc_loss=0.1203, cr_loss=0.3372, over 3330643.46 frames. ], batch size: 42, lr: 2.58e-03, grad_scale: 8.0
2024-09-25 20:18:32,116 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.054e+02 1.303e+02 1.373e+02 1.497e+02 1.983e+02, threshold=2.746e+02, percent-clipped=0.0
2024-09-25 20:18:40,374 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=823064.6666666666, ans=0.1
2024-09-25 20:18:40,521 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=823064.6666666666, ans=0.0
2024-09-25 20:18:57,732 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=823111.3333333334, ans=0.0
2024-09-25 20:19:10,716 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2024-09-25 20:19:26,610 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=823204.6666666666, ans=0.125
2024-09-25 20:19:34,822 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=823251.3333333334, ans=0.125
2024-09-25 20:19:41,176 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=823251.3333333334, ans=0.1
2024-09-25 20:19:42,834 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=823251.3333333334, ans=0.07
2024-09-25 20:19:47,598 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=823251.3333333334, ans=0.0
2024-09-25 20:19:50,554 INFO [train.py:1198] (3/4) Epoch 46, batch 1100, loss[loss=0.2157, ctc_loss=0.1404, cr_loss=0.3764, over 16748.00 frames. ], tot_loss[loss=0.1868, ctc_loss=0.1196, cr_loss=0.3363, over 3345880.41 frames. ], batch size: 61, lr: 2.58e-03, grad_scale: 8.0
2024-09-25 20:19:58,862 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=823298.0, ans=0.125
2024-09-25 20:20:07,415 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=7.45 vs. limit=15.0
2024-09-25 20:20:08,531 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=823344.6666666666, ans=0.125
2024-09-25 20:20:23,884 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=823391.3333333334, ans=0.2
2024-09-25 20:20:38,436 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.const_attention_rate, batch_count=823391.3333333334, ans=0.025
2024-09-25 20:20:44,557 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.const_attention_rate, batch_count=823438.0, ans=0.025
2024-09-25 20:21:15,713 INFO [train.py:1198] (3/4) Epoch 46, batch 1150, loss[loss=0.2287, ctc_loss=0.1444, cr_loss=0.4217, over 17204.00 frames. ], tot_loss[loss=0.1865, ctc_loss=0.1193, cr_loss=0.3362, over 3350762.06 frames. ], batch size: 55, lr: 2.58e-03, grad_scale: 8.0
2024-09-25 20:21:16,100 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=823531.3333333334, ans=0.2
2024-09-25 20:21:17,339 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.124e+02 1.324e+02 1.372e+02 1.463e+02 5.655e+02, threshold=2.745e+02, percent-clipped=1.0
2024-09-25 20:21:47,937 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=9.52 vs. limit=15.0
2024-09-25 20:21:50,681 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=823624.6666666666, ans=0.125
2024-09-25 20:22:16,572 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=823671.3333333334, ans=0.125
2024-09-25 20:22:17,532 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=8.24 vs. limit=15.0
2024-09-25 20:22:33,123 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.49 vs. limit=22.5
2024-09-25 20:22:38,928 INFO [train.py:1198] (3/4) Epoch 46, batch 1200, loss[loss=0.1647, ctc_loss=0.1035, cr_loss=0.3064, over 17087.00 frames. ], tot_loss[loss=0.1862, ctc_loss=0.1191, cr_loss=0.3352, over 3346307.33 frames. ], batch size: 39, lr: 2.58e-03, grad_scale: 16.0
2024-09-25 20:23:17,446 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=4.43 vs. limit=10.0
2024-09-25 20:23:31,049 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.const_attention_rate, batch_count=823904.6666666666, ans=0.025
2024-09-25 20:23:46,916 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=823951.3333333334, ans=0.0
2024-09-25 20:24:00,929 INFO [train.py:1198] (3/4) Epoch 46, batch 1250, loss[loss=0.1819, ctc_loss=0.1142, cr_loss=0.3386, over 17300.00 frames. ], tot_loss[loss=0.1868, ctc_loss=0.1195, cr_loss=0.3362, over 3351760.88 frames. ], batch size: 46, lr: 2.58e-03, grad_scale: 16.0
2024-09-25 20:24:02,523 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.119e+02 1.285e+02 1.375e+02 1.486e+02 1.915e+02, threshold=2.751e+02, percent-clipped=0.0
2024-09-25 20:24:06,073 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=823998.0, ans=0.2
2024-09-25 20:24:06,105 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=823998.0, ans=0.2
2024-09-25 20:25:23,680 INFO [train.py:1198] (3/4) Epoch 46, batch 1300, loss[loss=0.2163, ctc_loss=0.143, cr_loss=0.3664, over 14735.00 frames. ], tot_loss[loss=0.1865, ctc_loss=0.1193, cr_loss=0.3361, over 3362312.28 frames. ], batch size: 89, lr: 2.58e-03, grad_scale: 16.0
2024-09-25 20:25:39,892 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=824278.0, ans=0.125
2024-09-25 20:25:49,836 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=824278.0, ans=0.0
2024-09-25 20:26:05,788 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=4.95 vs. limit=12.0
2024-09-25 20:26:14,641 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=824371.3333333334, ans=0.125
2024-09-25 20:26:28,260 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=824371.3333333334, ans=0.2
2024-09-25 20:26:29,821 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=824371.3333333334, ans=0.125
2024-09-25 20:26:48,557 INFO [train.py:1198] (3/4) Epoch 46, batch 1350, loss[loss=0.2145, ctc_loss=0.1395, cr_loss=0.3749, over 16998.00 frames. ], tot_loss[loss=0.1866, ctc_loss=0.1194, cr_loss=0.3361, over 3366959.03 frames. ], batch size: 53, lr: 2.58e-03, grad_scale: 8.0
2024-09-25 20:26:51,683 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.142e+02 1.295e+02 1.386e+02 1.472e+02 1.767e+02, threshold=2.771e+02, percent-clipped=0.0
2024-09-25 20:26:52,003 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=824464.6666666666, ans=0.1
2024-09-25 20:27:20,990 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=824558.0, ans=0.125
2024-09-25 20:27:45,042 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=824604.6666666666, ans=0.125
2024-09-25 20:27:58,747 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=824651.3333333334, ans=0.1
2024-09-25 20:28:10,199 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=824698.0, ans=0.2
2024-09-25 20:28:11,416 INFO [train.py:1198] (3/4) Epoch 46, batch 1400, loss[loss=0.1769, ctc_loss=0.1128, cr_loss=0.3205, over 17241.00 frames. ], tot_loss[loss=0.1872, ctc_loss=0.1199, cr_loss=0.3366, over 3351800.19 frames. ], batch size: 42, lr: 2.58e-03, grad_scale: 8.0
2024-09-25 20:28:18,697 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=8.55 vs. limit=15.0
2024-09-25 20:28:47,206 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-25 20:28:58,517 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=824838.0, ans=0.1
2024-09-25 20:29:08,058 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=824838.0, ans=0.0
2024-09-25 20:29:20,989 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=824884.6666666666, ans=0.125
2024-09-25 20:29:31,894 INFO [train.py:1198] (3/4) Epoch 46, batch 1450, loss[loss=0.2026, ctc_loss=0.1298, cr_loss=0.3643, over 17135.00 frames. ], tot_loss[loss=0.1871, ctc_loss=0.1198, cr_loss=0.3366, over 3362514.70 frames. ], batch size: 48, lr: 2.58e-03, grad_scale: 8.0
2024-09-25 20:29:35,105 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.196e+02 1.313e+02 1.376e+02 1.492e+02 1.838e+02, threshold=2.751e+02, percent-clipped=0.0
2024-09-25 20:29:38,716 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=824931.3333333334, ans=0.2
2024-09-25 20:29:51,625 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=824978.0, ans=0.2
2024-09-25 20:30:20,068 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=825024.6666666666, ans=0.125
2024-09-25 20:30:21,919 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=825071.3333333334, ans=0.0
2024-09-25 20:30:28,279 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=825071.3333333334, ans=0.125
2024-09-25 20:30:34,611 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=825071.3333333334, ans=0.2
2024-09-25 20:30:45,678 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=825118.0, ans=0.125
2024-09-25 20:30:47,371 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=825118.0, ans=0.1
2024-09-25 20:30:54,860 INFO [train.py:1198] (3/4) Epoch 46, batch 1500, loss[loss=0.1678, ctc_loss=0.1057, cr_loss=0.3107, over 17017.00 frames. ], tot_loss[loss=0.1878, ctc_loss=0.1202, cr_loss=0.3376, over 3363662.77 frames. ], batch size: 44, lr: 2.58e-03, grad_scale: 8.0
2024-09-25 20:31:37,381 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=825258.0, ans=0.0
2024-09-25 20:31:53,427 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=825304.6666666666, ans=0.125
2024-09-25 20:31:54,949 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=825304.6666666666, ans=0.125
2024-09-25 20:31:58,317 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.max_positive, batch_count=825304.6666666666, ans=0.95
2024-09-25 20:32:18,241 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=5.63 vs. limit=15.0
2024-09-25 20:32:20,345 INFO [train.py:1198] (3/4) Epoch 46, batch 1550, loss[loss=0.1778, ctc_loss=0.113, cr_loss=0.3241, over 16933.00 frames. ], tot_loss[loss=0.1869, ctc_loss=0.1196, cr_loss=0.3367, over 3370416.11 frames. ], batch size: 42, lr: 2.58e-03, grad_scale: 8.0
2024-09-25 20:32:22,175 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.const_attention_rate, batch_count=825398.0, ans=0.025
2024-09-25 20:32:23,493 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.171e+02 1.290e+02 1.399e+02 1.518e+02 4.516e+02, threshold=2.798e+02, percent-clipped=1.0
2024-09-25 20:32:58,863 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=825491.3333333334, ans=0.1
2024-09-25 20:33:03,712 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=825491.3333333334, ans=0.125
2024-09-25 20:33:27,801 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=825584.6666666666, ans=0.1
2024-09-25 20:33:29,465 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.const_attention_rate, batch_count=825584.6666666666, ans=0.025
2024-09-25 20:33:43,586 INFO [train.py:1198] (3/4) Epoch 46, batch 1600, loss[loss=0.1952, ctc_loss=0.1239, cr_loss=0.3563, over 17098.00 frames. ], tot_loss[loss=0.1863, ctc_loss=0.1192, cr_loss=0.336, over 3374468.46 frames. ], batch size: 49, lr: 2.58e-03, grad_scale: 16.0
2024-09-25 20:34:38,426 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=825771.3333333334, ans=0.1
2024-09-25 20:34:49,813 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=825818.0, ans=0.125
2024-09-25 20:35:04,126 INFO [train.py:1198] (3/4) Epoch 46, batch 1650, loss[loss=0.2248, ctc_loss=0.1499, cr_loss=0.3745, over 11620.00 frames. ], tot_loss[loss=0.1871, ctc_loss=0.1197, cr_loss=0.337, over 3358829.74 frames. ], batch size: 123, lr: 2.58e-03, grad_scale: 16.0
2024-09-25 20:35:07,360 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.199e+02 1.315e+02 1.405e+02 1.548e+02 2.408e+02, threshold=2.810e+02, percent-clipped=0.0
2024-09-25 20:35:07,745 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=825864.6666666666, ans=0.0
2024-09-25 20:35:14,087 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=11.84 vs. limit=22.5
2024-09-25 20:35:54,364 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=826004.6666666666, ans=0.125
2024-09-25 20:35:56,040 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00
2024-09-25 20:36:06,518 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=826004.6666666666, ans=0.2
2024-09-25 20:36:17,742 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=826051.3333333334, ans=0.0
2024-09-25 20:36:32,699 INFO [train.py:1198] (3/4) Epoch 46, batch 1700, loss[loss=0.1792, ctc_loss=0.1148, cr_loss=0.3219, over 16694.00 frames. ], tot_loss[loss=0.1864, ctc_loss=0.1192, cr_loss=0.336, over 3369004.98 frames. ], batch size: 37, lr: 2.58e-03, grad_scale: 16.0
2024-09-25 20:36:58,549 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=826144.6666666666, ans=0.2
2024-09-25 20:37:25,769 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.const_attention_rate, batch_count=826238.0, ans=0.025
2024-09-25 20:37:52,867 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.20 vs. limit=6.0
2024-09-25 20:37:55,311 INFO [train.py:1198] (3/4) Epoch 46, batch 1750, loss[loss=0.1861, ctc_loss=0.1169, cr_loss=0.346, over 17350.00 frames. ], tot_loss[loss=0.1852, ctc_loss=0.1185, cr_loss=0.3336, over 3362965.26 frames. ], batch size: 52, lr: 2.58e-03, grad_scale: 8.0
2024-09-25 20:38:00,307 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.166e+02 1.299e+02 1.380e+02 1.488e+02 2.427e+02, threshold=2.759e+02, percent-clipped=0.0
2024-09-25 20:38:12,069 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=826378.0, ans=0.1
2024-09-25 20:38:21,538 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=826378.0, ans=0.0
2024-09-25 20:38:42,185 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=826471.3333333334, ans=0.125
2024-09-25 20:38:45,445 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=826471.3333333334, ans=0.0
2024-09-25 20:39:02,797 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=826518.0, ans=0.125
2024-09-25 20:39:07,715 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=826518.0, ans=0.125
2024-09-25 20:39:09,371 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=826518.0, ans=0.1
2024-09-25 20:39:12,339 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=826518.0, ans=0.125
2024-09-25 20:39:15,276 INFO [train.py:1198] (3/4) Epoch 46, batch 1800, loss[loss=0.1672, ctc_loss=0.1051, cr_loss=0.3108, over 16939.00 frames. ], tot_loss[loss=0.185, ctc_loss=0.1183, cr_loss=0.3337, over 3361625.80 frames. ], batch size: 42, lr: 2.58e-03, grad_scale: 8.0
2024-09-25 20:39:22,005 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=826564.6666666666, ans=0.125
2024-09-25 20:39:30,057 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-25 20:39:33,262 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=826611.3333333334, ans=0.0
2024-09-25 20:40:21,007 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=826751.3333333334, ans=0.015
2024-09-25 20:40:29,563 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys.whitening_limit, batch_count=826751.3333333334, ans=6.0
2024-09-25 20:40:38,773 INFO [train.py:1198] (3/4) Epoch 46, batch 1850, loss[loss=0.1603, ctc_loss=0.1003, cr_loss=0.3002, over 17206.00 frames. ], tot_loss[loss=0.1853, ctc_loss=0.1185, cr_loss=0.3338, over 3361126.64 frames. ], batch size: 41, lr: 2.58e-03, grad_scale: 8.0
2024-09-25 20:40:43,546 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.111e+02 1.312e+02 1.384e+02 1.504e+02 2.324e+02, threshold=2.767e+02, percent-clipped=0.0
2024-09-25 20:40:59,329 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=826844.6666666666, ans=0.2
2024-09-25 20:41:21,067 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=826891.3333333334, ans=0.0
2024-09-25 20:42:03,935 INFO [train.py:1198] (3/4) Epoch 46, batch 1900, loss[loss=0.1813, ctc_loss=0.118, cr_loss=0.3165, over 15784.00 frames. ], tot_loss[loss=0.1865, ctc_loss=0.1193, cr_loss=0.336, over 3366453.55 frames. ], batch size: 74, lr: 2.58e-03, grad_scale: 8.0
2024-09-25 20:42:21,767 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=827078.0, ans=0.1
2024-09-25 20:42:24,906 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=827078.0, ans=0.07
2024-09-25 20:42:47,103 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=827124.6666666666, ans=0.0
2024-09-25 20:43:26,927 INFO [train.py:1198] (3/4) Epoch 46, batch 1950, loss[loss=0.1938, ctc_loss=0.122, cr_loss=0.3594, over 17005.00 frames. ], tot_loss[loss=0.185, ctc_loss=0.1181, cr_loss=0.3342, over 3376761.92 frames. ], batch size: 53, lr: 2.58e-03, grad_scale: 8.0
2024-09-25 20:43:31,734 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.139e+02 1.295e+02 1.354e+02 1.509e+02 3.056e+02, threshold=2.707e+02, percent-clipped=1.0
2024-09-25 20:43:33,566 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=827264.6666666666, ans=0.2
2024-09-25 20:43:39,945 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=827264.6666666666, ans=0.1
2024-09-25 20:43:46,094 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=827311.3333333334, ans=0.125
2024-09-25 20:43:56,036 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.36 vs. limit=15.0
2024-09-25 20:43:57,646 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=4.94 vs. limit=12.0
2024-09-25 20:44:06,988 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=827358.0, ans=0.0
2024-09-25 20:44:13,218 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=827404.6666666666, ans=0.1
2024-09-25 20:44:28,221 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.96 vs. limit=6.0
2024-09-25 20:44:29,894 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.69 vs. limit=22.5
2024-09-25 20:44:46,533 INFO [train.py:1198] (3/4) Epoch 46, batch 2000, loss[loss=0.1624, ctc_loss=0.1019, cr_loss=0.3026, over 17073.00 frames. ], tot_loss[loss=0.1855, ctc_loss=0.1186, cr_loss=0.3346, over 3372705.32 frames. ], batch size: 40, lr: 2.58e-03, grad_scale: 16.0
2024-09-25 20:45:19,658 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=827591.3333333334, ans=0.125
2024-09-25 20:45:44,868 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=827638.0, ans=0.125
2024-09-25 20:45:49,825 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=827638.0, ans=0.125
2024-09-25 20:45:49,945 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1.whitening_limit, batch_count=827638.0, ans=10.0
2024-09-25 20:45:59,433 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=827684.6666666666, ans=10.0
2024-09-25 20:46:11,126 INFO [train.py:1198] (3/4) Epoch 46, batch 2050, loss[loss=0.166, ctc_loss=0.1072, cr_loss=0.2938, over 17263.00 frames. ], tot_loss[loss=0.1852, ctc_loss=0.1184, cr_loss=0.3338, over 3378570.16 frames. ], batch size: 44, lr: 2.58e-03, grad_scale: 16.0
2024-09-25 20:46:18,715 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.137e+02 1.306e+02 1.402e+02 1.494e+02 2.074e+02, threshold=2.805e+02, percent-clipped=0.0
2024-09-25 20:46:22,479 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.18 vs. limit=10.0
2024-09-25 20:46:31,941 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=827778.0, ans=0.0
2024-09-25 20:46:41,255 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=827778.0, ans=0.125
2024-09-25 20:46:53,998 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=827824.6666666666, ans=0.125
2024-09-25 20:47:07,073 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=827871.3333333334, ans=0.05
2024-09-25 20:47:12,122 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=827871.3333333334, ans=0.125
2024-09-25 20:47:12,137 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=827871.3333333334, ans=0.2
2024-09-25 20:47:29,818 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=827918.0, ans=0.125
2024-09-25 20:47:34,394 INFO [train.py:1198] (3/4) Epoch 46, batch 2100, loss[loss=0.1779, ctc_loss=0.1141, cr_loss=0.319, over 17366.00 frames. ], tot_loss[loss=0.1856, ctc_loss=0.1187, cr_loss=0.3345, over 3365034.01 frames. ], batch size: 48, lr: 2.58e-03, grad_scale: 16.0
2024-09-25 20:47:46,834 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=827964.6666666666, ans=0.125
2024-09-25 20:48:01,091 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=828011.3333333334, ans=0.125
2024-09-25 20:48:16,983 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=828058.0, ans=0.125
2024-09-25 20:48:33,076 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=828104.6666666666, ans=0.125
2024-09-25 20:48:56,846 INFO [train.py:1198] (3/4) Epoch 46, batch 2150, loss[loss=0.1982, ctc_loss=0.1288, cr_loss=0.3469, over 17234.00 frames. ], tot_loss[loss=0.1853, ctc_loss=0.1184, cr_loss=0.3341, over 3364235.23 frames. ], batch size: 50, lr: 2.57e-03, grad_scale: 16.0
2024-09-25 20:49:01,529 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.174e+02 1.324e+02 1.399e+02 1.517e+02 1.900e+02, threshold=2.799e+02, percent-clipped=0.0
2024-09-25 20:49:09,876 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=828198.0, ans=0.125
2024-09-25 20:49:11,453 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=828244.6666666666, ans=0.125
2024-09-25 20:49:12,992 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=828244.6666666666, ans=0.125
2024-09-25 20:49:15,010 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=4.44 vs. limit=12.0
2024-09-25 20:49:21,142 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=828244.6666666666, ans=0.125
2024-09-25 20:49:21,152 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=828244.6666666666, ans=0.125
2024-09-25 20:49:35,620 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=828291.3333333334, ans=0.125
2024-09-25 20:49:58,643 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=4.25 vs. limit=15.0
2024-09-25 20:50:06,019 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=828384.6666666666, ans=0.125
2024-09-25 20:50:11,658 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=828384.6666666666, ans=0.1
2024-09-25 20:50:16,348 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=828384.6666666666, ans=0.0
2024-09-25 20:50:19,242 INFO [train.py:1198] (3/4) Epoch 46, batch 2200, loss[loss=0.159, ctc_loss=0.1009, cr_loss=0.2905, over 15963.00 frames. ], tot_loss[loss=0.185, ctc_loss=0.1183, cr_loss=0.3339, over 3374837.26 frames. ], batch size: 35, lr: 2.57e-03, grad_scale: 16.0
2024-09-25 20:50:21,204 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=828431.3333333334, ans=0.0
2024-09-25 20:50:30,846 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.const_attention_rate, batch_count=828431.3333333334, ans=0.025
2024-09-25 20:50:31,548 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.82 vs. limit=15.0
2024-09-25 20:50:47,121 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=828478.0, ans=0.0
2024-09-25 20:51:17,905 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=828571.3333333334, ans=0.125
2024-09-25 20:51:20,133 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=7.68 vs. limit=15.0
2024-09-25 20:51:44,715 INFO [train.py:1198] (3/4) Epoch 46, batch 2250, loss[loss=0.1452, ctc_loss=0.09188, cr_loss=0.2665, over 17035.00 frames. ], tot_loss[loss=0.1855, ctc_loss=0.1186, cr_loss=0.3346, over 3360341.47 frames. ], batch size: 39, lr: 2.57e-03, grad_scale: 16.0
2024-09-25 20:51:49,437 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.075e+02 1.294e+02 1.364e+02 1.489e+02 2.080e+02, threshold=2.728e+02, percent-clipped=0.0
2024-09-25 20:52:43,637 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=828804.6666666666, ans=0.2
2024-09-25 20:52:53,344 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=828851.3333333334, ans=0.125
2024-09-25 20:53:04,650 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2024-09-25 20:53:07,435 INFO [train.py:1198] (3/4) Epoch 46, batch 2300, loss[loss=0.2035, ctc_loss=0.1322, cr_loss=0.3561, over 17359.00 frames. ], tot_loss[loss=0.1862, ctc_loss=0.1191, cr_loss=0.3356, over 3355908.53 frames. ], batch size: 48, lr: 2.57e-03, grad_scale: 16.0
2024-09-25 20:53:07,822 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=828898.0, ans=0.125
2024-09-25 20:53:09,787 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=12.56 vs. limit=22.5
2024-09-25 20:53:36,442 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=828944.6666666666, ans=0.0
2024-09-25 20:53:36,509 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=828944.6666666666, ans=0.125
2024-09-25 20:53:52,473 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=828991.3333333334, ans=0.125
2024-09-25 20:54:05,758 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=9.35 vs. limit=15.0
2024-09-25 20:54:17,952 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=829084.6666666666, ans=0.1
2024-09-25 20:54:27,183 INFO [train.py:1198] (3/4) Epoch 46, batch 2350, loss[loss=0.1414, ctc_loss=0.08534, cr_loss=0.2802, over 17017.00 frames. ], tot_loss[loss=0.1859, ctc_loss=0.1187, cr_loss=0.3356, over 3359755.12 frames. ], batch size: 39, lr: 2.57e-03, grad_scale: 16.0
2024-09-25 20:54:31,906 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.139e+02 1.291e+02 1.368e+02 1.435e+02 1.902e+02, threshold=2.737e+02, percent-clipped=0.0
2024-09-25 20:54:40,272 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=829131.3333333334, ans=0.125
2024-09-25 20:55:07,055 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=829224.6666666666, ans=0.1
2024-09-25 20:55:09,098 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=15.11 vs. limit=22.5
2024-09-25 20:55:23,321 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=829271.3333333334, ans=0.125
2024-09-25 20:55:24,924 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=829271.3333333334, ans=0.2
2024-09-25 20:55:50,422 INFO [train.py:1198] (3/4) Epoch 46, batch 2400, loss[loss=0.1899, ctc_loss=0.123, cr_loss=0.3347, over 17239.00 frames. ], tot_loss[loss=0.1868, ctc_loss=0.1195, cr_loss=0.3366, over 3347020.74 frames. ], batch size: 55, lr: 2.57e-03, grad_scale: 32.0
2024-09-25 20:55:59,848 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=829364.6666666666, ans=0.025
2024-09-25 20:56:15,141 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=829411.3333333334, ans=0.1
2024-09-25 20:56:31,083 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=829458.0, ans=0.2
2024-09-25 20:56:49,097 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=5.19 vs. limit=15.0
2024-09-25 20:56:52,773 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.57 vs. limit=6.0
2024-09-25 20:57:15,352 INFO [train.py:1198] (3/4) Epoch 46, batch 2450, loss[loss=0.2104, ctc_loss=0.1397, cr_loss=0.3537, over 12503.00 frames. ], tot_loss[loss=0.1874, ctc_loss=0.1199, cr_loss=0.3374, over 3346400.94 frames. ], batch size: 123, lr: 2.57e-03, grad_scale: 32.0
2024-09-25 20:57:20,236 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.115e+02 1.311e+02 1.415e+02 1.511e+02 3.334e+02, threshold=2.830e+02, percent-clipped=1.0
2024-09-25 20:57:22,120 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=829598.0, ans=0.125
2024-09-25 20:57:34,058 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=829644.6666666666, ans=0.125
2024-09-25 20:58:02,961 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=829691.3333333334, ans=0.1
2024-09-25 20:58:06,291 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=829738.0, ans=0.125
2024-09-25 20:58:19,249 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=829738.0, ans=0.2
2024-09-25 20:58:19,408 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=829738.0, ans=0.0
2024-09-25 20:58:29,316 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=5.11 vs. limit=15.0
2024-09-25 20:58:38,039 INFO [train.py:1198] (3/4) Epoch 46, batch 2500, loss[loss=0.1978, ctc_loss=0.129, cr_loss=0.3442, over 17232.00 frames. ], tot_loss[loss=0.1871, ctc_loss=0.1198, cr_loss=0.3369, over 3354008.67 frames. ], batch size: 55, lr: 2.57e-03, grad_scale: 32.0
2024-09-25 20:58:47,982 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=829831.3333333334, ans=0.1
2024-09-25 20:59:01,159 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=3.84 vs. limit=10.0
2024-09-25 20:59:07,270 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=829878.0, ans=0.0
2024-09-25 20:59:08,868 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=829924.6666666666, ans=0.125
2024-09-25 20:59:10,372 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=829924.6666666666, ans=0.125
2024-09-25 20:59:18,586 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=829924.6666666666, ans=0.2
2024-09-25 20:59:19,067 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=7.51 vs. limit=15.0
2024-09-25 20:59:26,701 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=829971.3333333334, ans=0.0
2024-09-25 20:59:31,431 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=829971.3333333334, ans=0.0
2024-09-25 20:59:58,776 INFO [train.py:1198] (3/4) Epoch 46, batch 2550, loss[loss=0.1677, ctc_loss=0.106, cr_loss=0.3084, over 17210.00 frames. ], tot_loss[loss=0.186, ctc_loss=0.119, cr_loss=0.3352, over 3362033.89 frames. ], batch size: 47, lr: 2.57e-03, grad_scale: 32.0
2024-09-25 21:00:03,461 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=830064.6666666666, ans=0.0
2024-09-25 21:00:06,167 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.070e+02 1.313e+02 1.390e+02 1.516e+02 1.832e+02, threshold=2.779e+02, percent-clipped=0.0
2024-09-25 21:00:08,655 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=14.95 vs. limit=22.5
2024-09-25 21:00:22,506 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=830111.3333333334, ans=0.1
2024-09-25 21:00:48,137 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=830204.6666666666, ans=0.1
2024-09-25 21:01:14,607 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=3.24 vs. limit=10.0
2024-09-25 21:01:16,424 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=4.50 vs. limit=10.0
2024-09-25 21:01:26,893 INFO [train.py:1198] (3/4) Epoch 46, batch 2600, loss[loss=0.2274, ctc_loss=0.1498, cr_loss=0.3879, over 14973.00 frames. ], tot_loss[loss=0.1858, ctc_loss=0.1188, cr_loss=0.335, over 3365021.90 frames. ], batch size: 89, lr: 2.57e-03, grad_scale: 32.0
2024-09-25 21:02:01,306 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=830391.3333333334, ans=0.2
2024-09-25 21:02:22,312 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=830438.0, ans=0.0
2024-09-25 21:02:41,287 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-25 21:02:50,772 INFO [train.py:1198] (3/4) Epoch 46, batch 2650, loss[loss=0.1808, ctc_loss=0.1176, cr_loss=0.3161, over 17285.00 frames. ], tot_loss[loss=0.1858, ctc_loss=0.1188, cr_loss=0.3348, over 3370154.02 frames. ], batch size: 49, lr: 2.57e-03, grad_scale: 32.0
2024-09-25 21:02:55,451 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.099e+02 1.287e+02 1.345e+02 1.480e+02 2.035e+02, threshold=2.690e+02, percent-clipped=0.0
2024-09-25 21:03:03,774 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=830531.3333333334, ans=0.125
2024-09-25 21:03:19,670 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=830578.0, ans=0.1
2024-09-25 21:03:43,800 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=830671.3333333334, ans=0.1
2024-09-25 21:03:43,924 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=830671.3333333334, ans=0.5
2024-09-25 21:04:10,492 INFO [train.py:1198] (3/4) Epoch 46, batch 2700, loss[loss=0.1864, ctc_loss=0.1157, cr_loss=0.3533, over 16967.00 frames. ], tot_loss[loss=0.1877, ctc_loss=0.1203, cr_loss=0.3372, over 3348139.31 frames. ], batch size: 42, lr: 2.57e-03, grad_scale: 32.0
2024-09-25 21:04:15,351 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=830764.6666666666, ans=0.125
2024-09-25 21:04:18,477 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=830764.6666666666, ans=0.1
2024-09-25 21:04:20,182 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=830764.6666666666, ans=0.0
2024-09-25 21:04:33,038 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=830811.3333333334, ans=0.125
2024-09-25 21:04:45,604 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=830858.0, ans=0.0
2024-09-25 21:05:32,279 INFO [train.py:1198] (3/4) Epoch 46, batch 2750, loss[loss=0.1766, ctc_loss=0.1141, cr_loss=0.3129, over 17233.00 frames. ], tot_loss[loss=0.1871, ctc_loss=0.1199, cr_loss=0.3362, over 3350537.04 frames. ], batch size: 50, lr: 2.57e-03, grad_scale: 32.0
2024-09-25 21:05:37,114 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.184e+02 1.313e+02 1.406e+02 1.516e+02 1.975e+02, threshold=2.813e+02, percent-clipped=0.0
2024-09-25 21:06:05,045 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=831044.6666666666, ans=0.0
2024-09-25 21:06:06,756 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=831044.6666666666, ans=0.1
2024-09-25 21:06:08,226 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=831091.3333333334, ans=0.0
2024-09-25 21:06:11,432 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=831091.3333333334, ans=0.125
2024-09-25 21:06:45,041 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=831184.6666666666, ans=0.2
2024-09-25 21:06:57,842 INFO [train.py:1198] (3/4) Epoch 46, batch 2800, loss[loss=0.1817, ctc_loss=0.1159, cr_loss=0.329, over 16723.00 frames. ], tot_loss[loss=0.1869, ctc_loss=0.1198, cr_loss=0.3357, over 3344812.96 frames. ], batch size: 37, lr: 2.57e-03, grad_scale: 32.0
2024-09-25 21:07:04,758 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-25 21:07:45,643 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=831324.6666666666, ans=0.125
2024-09-25 21:07:53,654 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=831371.3333333334, ans=0.125
2024-09-25 21:08:20,590 INFO [train.py:1198] (3/4) Epoch 46, batch 2850, loss[loss=0.1941, ctc_loss=0.1238, cr_loss=0.3514, over 17305.00 frames. ], tot_loss[loss=0.1873, ctc_loss=0.12, cr_loss=0.3367, over 3347346.04 frames. ], batch size: 46, lr: 2.57e-03, grad_scale: 32.0
2024-09-25 21:08:26,959 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.083e+02 1.275e+02 1.372e+02 1.492e+02 2.089e+02, threshold=2.744e+02, percent-clipped=0.0
2024-09-25 21:08:29,446 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=8.85 vs. limit=15.0
2024-09-25 21:08:43,259 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=831511.3333333334, ans=0.125
2024-09-25 21:08:59,363 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=831558.0, ans=0.07
2024-09-25 21:09:00,924 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=831558.0, ans=0.0
2024-09-25 21:09:15,646 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=831604.6666666666, ans=0.0
2024-09-25 21:09:22,102 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=831604.6666666666, ans=0.2
2024-09-25 21:09:28,588 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=831651.3333333334, ans=0.125
2024-09-25 21:09:32,324 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=7.72 vs. limit=15.0
2024-09-25 21:09:38,017 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=831651.3333333334, ans=0.05
2024-09-25 21:09:40,972 INFO [train.py:1198] (3/4) Epoch 46, batch 2900, loss[loss=0.195, ctc_loss=0.125, cr_loss=0.3499, over 17223.00 frames. ], tot_loss[loss=0.1881, ctc_loss=0.1205, cr_loss=0.338, over 3346193.53 frames. ], batch size: 50, lr: 2.57e-03, grad_scale: 16.0
2024-09-25 21:10:12,782 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=831744.6666666666, ans=0.125
2024-09-25 21:10:19,613 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.whiten.whitening_limit, batch_count=831791.3333333334, ans=12.0
2024-09-25 21:10:23,989 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=831791.3333333334, ans=0.2
2024-09-25 21:10:30,268 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=831838.0, ans=0.125
2024-09-25 21:10:38,496 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=831838.0, ans=0.0
2024-09-25 21:11:08,574 INFO [train.py:1198] (3/4) Epoch 46, batch 2950, loss[loss=0.1747, ctc_loss=0.1108, cr_loss=0.3195, over 17258.00 frames. ], tot_loss[loss=0.1878, ctc_loss=0.1203, cr_loss=0.3377, over 3353090.78 frames. ], batch size: 42, lr: 2.57e-03, grad_scale: 16.0
2024-09-25 21:11:16,442 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.188e+02 1.273e+02 1.341e+02 1.442e+02 4.466e+02, threshold=2.682e+02, percent-clipped=1.0
2024-09-25 21:11:24,694 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=831978.0, ans=0.125
2024-09-25 21:11:45,551 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=832024.6666666666, ans=0.0
2024-09-25 21:11:45,561 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=832024.6666666666, ans=0.0
2024-09-25 21:11:47,156 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=832024.6666666666, ans=0.0
2024-09-25 21:11:55,276 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=832071.3333333334, ans=0.125
2024-09-25 21:12:03,139 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.const_attention_rate, batch_count=832071.3333333334, ans=0.025
2024-09-25 21:12:21,913 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=832118.0, ans=0.2
2024-09-25 21:12:27,907 INFO [train.py:1198] (3/4) Epoch 46, batch 3000, loss[loss=0.2272, ctc_loss=0.1535, cr_loss=0.3687, over 11887.00 frames. ], tot_loss[loss=0.1872, ctc_loss=0.1198, cr_loss=0.3373, over 3352036.24 frames. ], batch size: 123, lr: 2.57e-03, grad_scale: 16.0
2024-09-25 21:12:27,908 INFO [train.py:1221] (3/4) Computing validation loss
2024-09-25 21:12:43,451 INFO [train.py:1230] (3/4) Epoch 46, validation: loss=0.03583, ctc_loss=0.03583, cr_loss=1.006e-14, over 944034.00 frames.
2024-09-25 21:12:43,452 INFO [train.py:1231] (3/4) Maximum memory allocated so far is 21238MB
2024-09-25 21:13:38,713 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=832304.6666666666, ans=0.09899494936611666
2024-09-25 21:13:38,987 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=14.43 vs. limit=15.0
2024-09-25 21:13:44,963 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=832351.3333333334, ans=0.0
2024-09-25 21:14:01,696 INFO [train.py:1198] (3/4) Epoch 46, batch 3050, loss[loss=0.2358, ctc_loss=0.1618, cr_loss=0.3701, over 12138.00 frames. ], tot_loss[loss=0.1876, ctc_loss=0.12, cr_loss=0.3381, over 3354455.81 frames. ], batch size: 124, lr: 2.57e-03, grad_scale: 16.0
2024-09-25 21:14:01,997 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=832398.0, ans=0.0
2024-09-25 21:14:04,891 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=832398.0, ans=0.015
2024-09-25 21:14:09,459 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.137e+02 1.315e+02 1.415e+02 1.524e+02 2.421e+02, threshold=2.829e+02, percent-clipped=0.0
2024-09-25 21:14:09,728 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=832398.0, ans=0.0
2024-09-25 21:14:12,941 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=832398.0, ans=0.2
2024-09-25 21:14:20,835 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=832444.6666666666, ans=0.0
2024-09-25 21:14:44,297 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=832491.3333333334, ans=0.125
2024-09-25 21:15:03,164 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=832584.6666666666, ans=0.1
2024-09-25 21:15:07,953 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=832584.6666666666, ans=0.1
2024-09-25 21:15:11,153 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=832584.6666666666, ans=0.125
2024-09-25 21:15:20,314 INFO [train.py:1198] (3/4) Epoch 46, batch 3100, loss[loss=0.1441, ctc_loss=0.08976, cr_loss=0.2718, over 17215.00 frames. ], tot_loss[loss=0.1869, ctc_loss=0.1194, cr_loss=0.337, over 3358250.15 frames. ], batch size: 41, lr: 2.57e-03, grad_scale: 16.0
2024-09-25 21:15:39,769 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=4.48 vs. limit=15.0
2024-09-25 21:16:02,099 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=16.05 vs. limit=22.5
2024-09-25 21:16:15,643 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=832771.3333333334, ans=0.1
2024-09-25 21:16:18,784 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=832771.3333333334, ans=0.2
2024-09-25 21:16:33,074 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=832818.0, ans=0.0
2024-09-25 21:16:38,988 INFO [train.py:1198] (3/4) Epoch 46, batch 3150, loss[loss=0.2046, ctc_loss=0.1294, cr_loss=0.3758, over 17255.00 frames. ], tot_loss[loss=0.1863, ctc_loss=0.1191, cr_loss=0.3361, over 3364203.78 frames. ], batch size: 50, lr: 2.57e-03, grad_scale: 16.0
2024-09-25 21:16:40,819 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=832864.6666666666, ans=0.125
2024-09-25 21:16:46,804 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.123e+02 1.285e+02 1.343e+02 1.430e+02 2.452e+02, threshold=2.687e+02, percent-clipped=0.0
2024-09-25 21:17:13,954 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=832958.0, ans=0.0
2024-09-25 21:17:24,610 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=833004.6666666666, ans=0.125
2024-09-25 21:17:29,391 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=833004.6666666666, ans=0.0
2024-09-25 21:17:37,227 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=833004.6666666666, ans=0.1
2024-09-25 21:17:39,170 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=10.65 vs. limit=15.0
2024-09-25 21:17:41,992 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=833051.3333333334, ans=0.125
2024-09-25 21:17:53,481 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=833051.3333333334, ans=0.125
2024-09-25 21:17:59,372 INFO [train.py:1198] (3/4) Epoch 46, batch 3200, loss[loss=0.1652, ctc_loss=0.102, cr_loss=0.3163, over 16946.00 frames. ], tot_loss[loss=0.187, ctc_loss=0.1196, cr_loss=0.337, over 3364861.06 frames. ], batch size: 42, lr: 2.57e-03, grad_scale: 32.0
2024-09-25 21:18:35,499 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=833191.3333333334, ans=0.0
2024-09-25 21:18:45,893 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=5.72 vs. limit=15.0
2024-09-25 21:19:03,653 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=833284.6666666666, ans=0.2
2024-09-25 21:19:11,493 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=833284.6666666666, ans=0.0
2024-09-25 21:19:17,330 INFO [train.py:1198] (3/4) Epoch 46, batch 3250, loss[loss=0.1678, ctc_loss=0.1065, cr_loss=0.3067, over 17027.00 frames. ], tot_loss[loss=0.1883, ctc_loss=0.1206, cr_loss=0.3386, over 3364647.32 frames.
], batch size: 44, lr: 2.57e-03, grad_scale: 32.0 2024-09-25 21:19:19,257 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=833331.3333333334, ans=0.125 2024-09-25 21:19:25,152 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.170e+02 1.305e+02 1.369e+02 1.483e+02 2.240e+02, threshold=2.739e+02, percent-clipped=0.0 2024-09-25 21:19:28,681 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=833331.3333333334, ans=0.0 2024-09-25 21:19:28,718 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=833331.3333333334, ans=0.2 2024-09-25 21:19:33,499 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=833378.0, ans=0.125 2024-09-25 21:19:43,177 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=6.24 vs. limit=15.0 2024-09-25 21:19:52,844 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer_na.min_abs, batch_count=833424.6666666666, ans=0.02 2024-09-25 21:20:06,053 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=833424.6666666666, ans=0.125 2024-09-25 21:20:25,347 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=833518.0, ans=0.2 2024-09-25 21:20:40,790 INFO [train.py:1198] (3/4) Epoch 46, batch 3300, loss[loss=0.1544, ctc_loss=0.09951, cr_loss=0.2742, over 16704.00 frames. ], tot_loss[loss=0.1883, ctc_loss=0.1207, cr_loss=0.3382, over 3358243.35 frames. ], batch size: 37, lr: 2.57e-03, grad_scale: 32.0 2024-09-25 21:20:50,586 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=833564.6666666666, ans=0.1 2024-09-25 21:21:23,503 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=833658.0, ans=0.1 2024-09-25 21:21:39,392 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=7.03 vs. limit=15.0 2024-09-25 21:21:46,749 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=833751.3333333334, ans=0.2 2024-09-25 21:21:48,197 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=833751.3333333334, ans=0.035 2024-09-25 21:21:48,347 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-25 21:21:51,723 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=13.21 vs. limit=15.0 2024-09-25 21:21:58,830 INFO [train.py:1198] (3/4) Epoch 46, batch 3350, loss[loss=0.1965, ctc_loss=0.1245, cr_loss=0.36, over 17017.00 frames. ], tot_loss[loss=0.1882, ctc_loss=0.1206, cr_loss=0.3379, over 3365708.61 frames. 
], batch size: 56, lr: 2.57e-03, grad_scale: 32.0 2024-09-25 21:22:06,615 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.162e+02 1.301e+02 1.404e+02 1.465e+02 2.030e+02, threshold=2.807e+02, percent-clipped=0.0 2024-09-25 21:22:11,926 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=4.54 vs. limit=10.0 2024-09-25 21:22:41,192 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=833891.3333333334, ans=0.95 2024-09-25 21:22:54,234 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=5.32 vs. limit=15.0 2024-09-25 21:22:59,552 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=833984.6666666666, ans=0.2 2024-09-25 21:23:01,252 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-25 21:23:05,859 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=833984.6666666666, ans=0.125 2024-09-25 21:23:16,306 INFO [train.py:1198] (3/4) Epoch 46, batch 3400, loss[loss=0.1825, ctc_loss=0.1165, cr_loss=0.33, over 17025.00 frames. ], tot_loss[loss=0.1879, ctc_loss=0.1204, cr_loss=0.3374, over 3366651.72 frames. ], batch size: 44, lr: 2.57e-03, grad_scale: 32.0 2024-09-25 21:23:20,079 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.78 vs. limit=15.0 2024-09-25 21:23:26,094 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=834031.3333333334, ans=0.5 2024-09-25 21:23:50,031 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=834124.6666666666, ans=0.0 2024-09-25 21:24:33,615 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=834218.0, ans=0.125 2024-09-25 21:24:36,349 INFO [train.py:1198] (3/4) Epoch 46, batch 3450, loss[loss=0.1588, ctc_loss=0.101, cr_loss=0.2888, over 16304.00 frames. ], tot_loss[loss=0.1884, ctc_loss=0.1208, cr_loss=0.3382, over 3352884.17 frames. ], batch size: 36, lr: 2.57e-03, grad_scale: 16.0 2024-09-25 21:24:44,401 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.const_attention_rate, batch_count=834264.6666666666, ans=0.025 2024-09-25 21:24:45,571 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.164e+02 1.317e+02 1.418e+02 1.501e+02 3.351e+02, threshold=2.836e+02, percent-clipped=1.0 2024-09-25 21:24:49,081 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=834264.6666666666, ans=0.125 2024-09-25 21:25:20,881 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=9.16 vs. limit=10.0 2024-09-25 21:25:54,708 INFO [train.py:1198] (3/4) Epoch 46, batch 3500, loss[loss=0.2108, ctc_loss=0.1372, cr_loss=0.3684, over 16998.00 frames. ], tot_loss[loss=0.1881, ctc_loss=0.1205, cr_loss=0.3379, over 3361909.02 frames. 
], batch size: 53, lr: 2.57e-03, grad_scale: 16.0 2024-09-25 21:25:58,022 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=834498.0, ans=0.125 2024-09-25 21:26:12,674 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=8.05 vs. limit=15.0 2024-09-25 21:26:24,640 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=834591.3333333334, ans=0.1 2024-09-25 21:26:35,383 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.const_attention_rate, batch_count=834591.3333333334, ans=0.025 2024-09-25 21:26:35,840 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.74 vs. limit=15.0 2024-09-25 21:26:41,721 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=834638.0, ans=0.125 2024-09-25 21:27:03,781 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=834684.6666666666, ans=0.0 2024-09-25 21:27:06,803 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=834684.6666666666, ans=0.125 2024-09-25 21:27:12,620 INFO [train.py:1198] (3/4) Epoch 46, batch 3550, loss[loss=0.2015, ctc_loss=0.133, cr_loss=0.3424, over 17021.00 frames. ], tot_loss[loss=0.1881, ctc_loss=0.1206, cr_loss=0.3377, over 3358461.91 frames. ], batch size: 56, lr: 2.56e-03, grad_scale: 16.0 2024-09-25 21:27:21,879 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.171e+02 1.283e+02 1.360e+02 1.445e+02 2.258e+02, threshold=2.720e+02, percent-clipped=0.0 2024-09-25 21:27:50,681 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=834824.6666666666, ans=0.035 2024-09-25 21:28:01,360 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=834871.3333333334, ans=0.1 2024-09-25 21:28:01,506 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=834871.3333333334, ans=0.125 2024-09-25 21:28:07,889 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=834871.3333333334, ans=0.0 2024-09-25 21:28:15,644 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=834918.0, ans=0.125 2024-09-25 21:28:32,454 INFO [train.py:1198] (3/4) Epoch 46, batch 3600, loss[loss=0.167, ctc_loss=0.1044, cr_loss=0.3133, over 17103.00 frames. ], tot_loss[loss=0.1869, ctc_loss=0.1196, cr_loss=0.3361, over 3365829.33 frames. ], batch size: 40, lr: 2.56e-03, grad_scale: 32.0 2024-09-25 21:28:59,539 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=835011.3333333334, ans=0.125 2024-09-25 21:28:59,948 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.89 vs. 
limit=15.0 2024-09-25 21:29:02,824 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=835058.0, ans=0.125 2024-09-25 21:29:12,091 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=835058.0, ans=0.125 2024-09-25 21:29:18,952 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=7.41 vs. limit=15.0 2024-09-25 21:29:33,427 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=835104.6666666666, ans=0.1 2024-09-25 21:29:55,384 INFO [train.py:1198] (3/4) Epoch 46, batch 3650, loss[loss=0.1696, ctc_loss=0.1091, cr_loss=0.3027, over 16962.00 frames. ], tot_loss[loss=0.1874, ctc_loss=0.12, cr_loss=0.3372, over 3367478.21 frames. ], batch size: 42, lr: 2.56e-03, grad_scale: 32.0 2024-09-25 21:30:04,585 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.106e+02 1.303e+02 1.373e+02 1.458e+02 2.085e+02, threshold=2.745e+02, percent-clipped=0.0 2024-09-25 21:30:12,773 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=835244.6666666666, ans=0.0 2024-09-25 21:30:46,690 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.76 vs. limit=10.0 2024-09-25 21:30:47,656 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=835338.0, ans=0.07 2024-09-25 21:30:57,048 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=835384.6666666666, ans=0.0 2024-09-25 21:31:14,429 INFO [train.py:1198] (3/4) Epoch 46, batch 3700, loss[loss=0.2388, ctc_loss=0.1553, cr_loss=0.4173, over 16578.00 frames. ], tot_loss[loss=0.1885, ctc_loss=0.1209, cr_loss=0.3383, over 3350849.27 frames. ], batch size: 66, lr: 2.56e-03, grad_scale: 16.0 2024-09-25 21:31:17,772 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=835431.3333333334, ans=0.125 2024-09-25 21:31:20,893 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=835431.3333333334, ans=0.2 2024-09-25 21:31:58,631 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=835524.6666666666, ans=0.125 2024-09-25 21:32:17,214 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=835618.0, ans=0.07 2024-09-25 21:32:17,729 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.60 vs. limit=22.5 2024-09-25 21:32:32,254 INFO [train.py:1198] (3/4) Epoch 46, batch 3750, loss[loss=0.1963, ctc_loss=0.1316, cr_loss=0.3238, over 11840.00 frames. ], tot_loss[loss=0.1887, ctc_loss=0.1211, cr_loss=0.338, over 3334128.74 frames. 
], batch size: 125, lr: 2.56e-03, grad_scale: 16.0 2024-09-25 21:32:34,112 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=835664.6666666666, ans=0.125 2024-09-25 21:32:40,601 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.99 vs. limit=15.0 2024-09-25 21:32:43,198 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.138e+02 1.294e+02 1.384e+02 1.512e+02 2.088e+02, threshold=2.767e+02, percent-clipped=0.0 2024-09-25 21:33:02,078 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=835758.0, ans=0.04949747468305833 2024-09-25 21:33:03,459 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.const_attention_rate, batch_count=835758.0, ans=0.025 2024-09-25 21:33:03,618 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=835758.0, ans=0.0 2024-09-25 21:33:21,525 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.max_abs, batch_count=835804.6666666666, ans=10.0 2024-09-25 21:33:23,279 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=10.07 vs. limit=22.5 2024-09-25 21:33:46,967 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten.whitening_limit, batch_count=835851.3333333334, ans=15.0 2024-09-25 21:33:50,843 INFO [train.py:1198] (3/4) Epoch 46, batch 3800, loss[loss=0.1686, ctc_loss=0.1061, cr_loss=0.3128, over 16640.00 frames. ], tot_loss[loss=0.1892, ctc_loss=0.1215, cr_loss=0.3386, over 3317540.12 frames. ], batch size: 37, lr: 2.56e-03, grad_scale: 16.0 2024-09-25 21:33:51,069 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=835898.0, ans=0.5 2024-09-25 21:33:55,660 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=835898.0, ans=0.125 2024-09-25 21:35:06,109 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=836084.6666666666, ans=0.2 2024-09-25 21:35:09,010 INFO [train.py:1198] (3/4) Epoch 46, batch 3850, loss[loss=0.1861, ctc_loss=0.1214, cr_loss=0.3236, over 12369.00 frames. ], tot_loss[loss=0.1903, ctc_loss=0.1225, cr_loss=0.3392, over 3264542.04 frames. 
], batch size: 123, lr: 2.56e-03, grad_scale: 16.0 2024-09-25 21:35:09,310 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=836131.3333333334, ans=0.125 2024-09-25 21:35:09,351 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=836131.3333333334, ans=0.125 2024-09-25 21:35:18,600 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=836131.3333333334, ans=0.125 2024-09-25 21:35:19,791 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.194e+02 1.341e+02 1.442e+02 1.559e+02 2.635e+02, threshold=2.885e+02, percent-clipped=0.0 2024-09-25 21:35:21,698 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=836131.3333333334, ans=0.0 2024-09-25 21:35:47,432 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=836224.6666666666, ans=0.125 2024-09-25 21:35:48,888 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=836224.6666666666, ans=0.0 2024-09-25 21:35:50,386 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=836224.6666666666, ans=0.125 2024-09-25 21:35:58,430 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=11.06 vs. limit=22.5 2024-09-25 21:36:13,722 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=836318.0, ans=0.0 2024-09-25 21:37:05,584 INFO [train.py:1198] (3/4) Epoch 47, batch 0, loss[loss=0.2406, ctc_loss=0.1585, cr_loss=0.4104, over 15127.00 frames. ], tot_loss[loss=0.2406, ctc_loss=0.1585, cr_loss=0.4104, over 15127.00 frames. ], batch size: 89, lr: 2.53e-03, grad_scale: 32.0 2024-09-25 21:37:05,584 INFO [train.py:1221] (3/4) Computing validation loss 2024-09-25 21:37:13,057 INFO [zipformer.py:1858] (3/4) name=encoder.encoders.1.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([4.7098, 4.5899, 4.1714, 4.6902], device='cuda:3') 2024-09-25 21:37:22,167 INFO [train.py:1230] (3/4) Epoch 47, validation: loss=0.03509, ctc_loss=0.03509, cr_loss=1.062e-14, over 944034.00 frames. 2024-09-25 21:37:22,168 INFO [train.py:1231] (3/4) Maximum memory allocated so far is 21238MB 2024-09-25 21:37:25,571 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=836346.0, ans=0.1 2024-09-25 21:37:25,590 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=836346.0, ans=0.2 2024-09-25 21:38:05,926 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=836439.3333333334, ans=0.125 2024-09-25 21:38:07,333 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=836439.3333333334, ans=0.125 2024-09-25 21:38:44,944 INFO [train.py:1198] (3/4) Epoch 47, batch 50, loss[loss=0.1531, ctc_loss=0.09478, cr_loss=0.2917, over 16672.00 frames. ], tot_loss[loss=0.1841, ctc_loss=0.1178, cr_loss=0.3317, over 764974.10 frames. 
], batch size: 37, lr: 2.53e-03, grad_scale: 32.0 2024-09-25 21:39:02,615 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.187e+02 1.320e+02 1.514e+02 1.646e+02 2.881e+02, threshold=3.028e+02, percent-clipped=0.0 2024-09-25 21:39:24,379 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=7.70 vs. limit=15.0 2024-09-25 21:39:28,539 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=836672.6666666666, ans=0.125 2024-09-25 21:40:05,069 INFO [train.py:1198] (3/4) Epoch 47, batch 100, loss[loss=0.1781, ctc_loss=0.1133, cr_loss=0.3238, over 17057.00 frames. ], tot_loss[loss=0.1872, ctc_loss=0.1199, cr_loss=0.3364, over 1336133.63 frames. ], batch size: 52, lr: 2.53e-03, grad_scale: 32.0 2024-09-25 21:40:10,284 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=836812.6666666666, ans=0.07 2024-09-25 21:40:14,939 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=836812.6666666666, ans=0.2 2024-09-25 21:40:32,718 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=836859.3333333334, ans=0.0 2024-09-25 21:40:34,358 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=836859.3333333334, ans=0.2 2024-09-25 21:40:44,135 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=836906.0, ans=0.0 2024-09-25 21:40:44,720 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.75 vs. limit=15.0 2024-09-25 21:40:45,732 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=836906.0, ans=0.09899494936611666 2024-09-25 21:41:02,033 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=3.75 vs. limit=10.0 2024-09-25 21:41:11,483 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=9.74 vs. limit=15.0 2024-09-25 21:41:25,524 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=837046.0, ans=0.0 2024-09-25 21:41:26,869 INFO [train.py:1198] (3/4) Epoch 47, batch 150, loss[loss=0.1889, ctc_loss=0.1225, cr_loss=0.3316, over 17102.00 frames. ], tot_loss[loss=0.1852, ctc_loss=0.1185, cr_loss=0.3336, over 1787321.40 frames. ], batch size: 49, lr: 2.53e-03, grad_scale: 32.0 2024-09-25 21:41:27,216 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=837046.0, ans=0.2 2024-09-25 21:41:44,261 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.065e+02 1.291e+02 1.353e+02 1.432e+02 2.053e+02, threshold=2.706e+02, percent-clipped=0.0 2024-09-25 21:41:56,304 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=14.24 vs. 
limit=22.5 2024-09-25 21:42:18,243 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-25 21:42:28,632 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=10.04 vs. limit=10.0 2024-09-25 21:42:47,009 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=837232.6666666666, ans=10.0 2024-09-25 21:42:53,062 INFO [train.py:1198] (3/4) Epoch 47, batch 200, loss[loss=0.2072, ctc_loss=0.1363, cr_loss=0.3545, over 15945.00 frames. ], tot_loss[loss=0.1855, ctc_loss=0.1186, cr_loss=0.3346, over 2142032.25 frames. ], batch size: 74, lr: 2.53e-03, grad_scale: 32.0 2024-09-25 21:43:01,247 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=837279.3333333334, ans=0.0 2024-09-25 21:43:05,002 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=15.83 vs. limit=22.5 2024-09-25 21:43:07,717 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.const_attention_rate, batch_count=837326.0, ans=0.025 2024-09-25 21:43:13,871 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=837326.0, ans=0.125 2024-09-25 21:43:45,646 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=837419.3333333334, ans=0.125 2024-09-25 21:43:53,736 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=837419.3333333334, ans=0.125 2024-09-25 21:44:15,405 INFO [train.py:1198] (3/4) Epoch 47, batch 250, loss[loss=0.2012, ctc_loss=0.1297, cr_loss=0.3576, over 17360.00 frames. ], tot_loss[loss=0.1856, ctc_loss=0.1186, cr_loss=0.3351, over 2415292.50 frames. ], batch size: 48, lr: 2.53e-03, grad_scale: 32.0 2024-09-25 21:44:23,745 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=837512.6666666666, ans=0.125 2024-09-25 21:44:32,924 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.146e+02 1.332e+02 1.402e+02 1.492e+02 3.444e+02, threshold=2.804e+02, percent-clipped=1.0 2024-09-25 21:44:36,488 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=837559.3333333334, ans=0.125 2024-09-25 21:44:40,811 INFO [scaling.py:1024] (3/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.44 vs. limit=5.0 2024-09-25 21:45:08,536 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=837652.6666666666, ans=0.2 2024-09-25 21:45:15,479 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=6.59 vs. limit=15.0 2024-09-25 21:45:16,627 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=837652.6666666666, ans=10.0 2024-09-25 21:45:35,452 INFO [train.py:1198] (3/4) Epoch 47, batch 300, loss[loss=0.2411, ctc_loss=0.1577, cr_loss=0.417, over 15190.00 frames. 
], tot_loss[loss=0.1856, ctc_loss=0.1186, cr_loss=0.3347, over 2612280.61 frames. ], batch size: 89, lr: 2.53e-03, grad_scale: 32.0 2024-09-25 21:45:41,742 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.const_attention_rate, batch_count=837746.0, ans=0.025 2024-09-25 21:46:06,252 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=4.97 vs. limit=10.0 2024-09-25 21:46:23,452 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=837839.3333333334, ans=0.125 2024-09-25 21:46:43,327 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=8.95 vs. limit=22.5 2024-09-25 21:46:45,788 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=837932.6666666666, ans=0.0 2024-09-25 21:46:49,149 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=837932.6666666666, ans=0.0 2024-09-25 21:47:01,261 INFO [train.py:1198] (3/4) Epoch 47, batch 350, loss[loss=0.2291, ctc_loss=0.1452, cr_loss=0.4195, over 17045.00 frames. ], tot_loss[loss=0.1862, ctc_loss=0.119, cr_loss=0.3361, over 2781375.61 frames. ], batch size: 52, lr: 2.53e-03, grad_scale: 32.0 2024-09-25 21:47:13,005 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=837979.3333333334, ans=0.125 2024-09-25 21:47:18,802 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.156e+02 1.331e+02 1.404e+02 1.493e+02 2.320e+02, threshold=2.808e+02, percent-clipped=0.0 2024-09-25 21:47:29,839 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=838026.0, ans=0.125 2024-09-25 21:47:36,502 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=838072.6666666666, ans=0.125 2024-09-25 21:47:57,604 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=838119.3333333334, ans=0.0 2024-09-25 21:48:04,327 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=6.09 vs. limit=15.0 2024-09-25 21:48:06,970 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=838166.0, ans=0.04949747468305833 2024-09-25 21:48:16,788 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=838166.0, ans=0.0 2024-09-25 21:48:24,428 INFO [train.py:1198] (3/4) Epoch 47, batch 400, loss[loss=0.1802, ctc_loss=0.1148, cr_loss=0.327, over 17221.00 frames. ], tot_loss[loss=0.1856, ctc_loss=0.1186, cr_loss=0.3349, over 2912489.95 frames. 
], batch size: 47, lr: 2.53e-03, grad_scale: 32.0 2024-09-25 21:48:29,327 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=838212.6666666666, ans=0.2 2024-09-25 21:48:34,373 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten.whitening_limit, batch_count=838212.6666666666, ans=22.5 2024-09-25 21:48:40,192 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.min_positive, batch_count=838212.6666666666, ans=0.05 2024-09-25 21:49:05,736 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=838306.0, ans=0.125 2024-09-25 21:49:17,032 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=838352.6666666666, ans=0.2 2024-09-25 21:49:23,574 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=838352.6666666666, ans=0.125 2024-09-25 21:49:28,434 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=838352.6666666666, ans=0.125 2024-09-25 21:49:30,047 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=838399.3333333334, ans=0.2 2024-09-25 21:49:33,436 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=838399.3333333334, ans=0.0 2024-09-25 21:49:34,828 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=838399.3333333334, ans=0.125 2024-09-25 21:49:47,332 INFO [train.py:1198] (3/4) Epoch 47, batch 450, loss[loss=0.1843, ctc_loss=0.1208, cr_loss=0.3172, over 11574.00 frames. ], tot_loss[loss=0.1863, ctc_loss=0.1192, cr_loss=0.3357, over 3006984.11 frames. ], batch size: 123, lr: 2.53e-03, grad_scale: 32.0 2024-09-25 21:50:04,979 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.186e+02 1.297e+02 1.382e+02 1.499e+02 1.706e+02, threshold=2.764e+02, percent-clipped=0.0 2024-09-25 21:50:05,832 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=5.39 vs. limit=12.0 2024-09-25 21:50:06,025 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.65 vs. limit=6.0 2024-09-25 21:50:14,940 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=838492.6666666666, ans=0.2 2024-09-25 21:50:21,789 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=5.05 vs. limit=15.0 2024-09-25 21:50:34,271 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=838586.0, ans=0.125 2024-09-25 21:50:38,941 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=838586.0, ans=0.0 2024-09-25 21:51:09,782 INFO [train.py:1198] (3/4) Epoch 47, batch 500, loss[loss=0.2147, ctc_loss=0.1394, cr_loss=0.3766, over 15064.00 frames. ], tot_loss[loss=0.1865, ctc_loss=0.1193, cr_loss=0.3361, over 3073293.11 frames. 
], batch size: 89, lr: 2.53e-03, grad_scale: 32.0 2024-09-25 21:51:13,532 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=11.18 vs. limit=15.0 2024-09-25 21:51:16,399 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=838679.3333333334, ans=0.125 2024-09-25 21:51:44,835 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=838772.6666666666, ans=0.125 2024-09-25 21:52:35,635 INFO [train.py:1198] (3/4) Epoch 47, batch 550, loss[loss=0.1717, ctc_loss=0.1079, cr_loss=0.3192, over 17079.00 frames. ], tot_loss[loss=0.1857, ctc_loss=0.1188, cr_loss=0.3346, over 3121671.91 frames. ], batch size: 43, lr: 2.53e-03, grad_scale: 32.0 2024-09-25 21:52:47,241 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=13.65 vs. limit=15.0 2024-09-25 21:52:53,167 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.154e+02 1.330e+02 1.398e+02 1.505e+02 2.211e+02, threshold=2.797e+02, percent-clipped=0.0 2024-09-25 21:52:53,519 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=838959.3333333334, ans=0.1 2024-09-25 21:53:00,101 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=838959.3333333334, ans=0.1 2024-09-25 21:53:13,012 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.78 vs. limit=15.0 2024-09-25 21:53:33,799 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=5.57 vs. limit=15.0 2024-09-25 21:53:37,884 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=839052.6666666666, ans=0.125 2024-09-25 21:53:42,790 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=839099.3333333334, ans=0.0 2024-09-25 21:53:49,197 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=839099.3333333334, ans=0.0 2024-09-25 21:53:54,235 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=839099.3333333334, ans=0.2 2024-09-25 21:53:58,553 INFO [train.py:1198] (3/4) Epoch 47, batch 600, loss[loss=0.1996, ctc_loss=0.1288, cr_loss=0.3542, over 16824.00 frames. ], tot_loss[loss=0.1862, ctc_loss=0.1191, cr_loss=0.3355, over 3175794.25 frames. ], batch size: 61, lr: 2.53e-03, grad_scale: 32.0 2024-09-25 21:54:03,517 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=839146.0, ans=0.2 2024-09-25 21:54:12,171 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=4.47 vs. limit=10.0 2024-09-25 21:54:23,407 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=5.73 vs. 
limit=15.0 2024-09-25 21:54:56,156 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=839286.0, ans=0.125 2024-09-25 21:55:18,539 INFO [train.py:1198] (3/4) Epoch 47, batch 650, loss[loss=0.2015, ctc_loss=0.1306, cr_loss=0.3544, over 16574.00 frames. ], tot_loss[loss=0.186, ctc_loss=0.119, cr_loss=0.335, over 3207979.34 frames. ], batch size: 66, lr: 2.53e-03, grad_scale: 32.0 2024-09-25 21:55:30,106 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=839379.3333333334, ans=0.125 2024-09-25 21:55:36,238 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.127e+02 1.288e+02 1.367e+02 1.485e+02 1.946e+02, threshold=2.733e+02, percent-clipped=0.0 2024-09-25 21:56:41,551 INFO [train.py:1198] (3/4) Epoch 47, batch 700, loss[loss=0.2362, ctc_loss=0.1585, cr_loss=0.3884, over 14878.00 frames. ], tot_loss[loss=0.1862, ctc_loss=0.1191, cr_loss=0.3357, over 3239401.02 frames. ], batch size: 89, lr: 2.53e-03, grad_scale: 16.0 2024-09-25 21:57:25,439 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=839706.0, ans=0.1 2024-09-25 21:57:45,077 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=7.01 vs. limit=15.0 2024-09-25 21:58:01,894 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=839799.3333333334, ans=0.125 2024-09-25 21:58:06,325 INFO [train.py:1198] (3/4) Epoch 47, batch 750, loss[loss=0.1946, ctc_loss=0.1229, cr_loss=0.3589, over 17313.00 frames. ], tot_loss[loss=0.1863, ctc_loss=0.1193, cr_loss=0.3354, over 3257111.53 frames. ], batch size: 51, lr: 2.53e-03, grad_scale: 16.0 2024-09-25 21:58:08,067 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=839846.0, ans=0.125 2024-09-25 21:58:10,452 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=5.49 vs. limit=15.0 2024-09-25 21:58:27,884 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.164e+02 1.312e+02 1.373e+02 1.486e+02 2.253e+02, threshold=2.746e+02, percent-clipped=0.0 2024-09-25 21:58:41,184 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=839939.3333333334, ans=0.1 2024-09-25 21:58:42,834 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=839939.3333333334, ans=0.0 2024-09-25 21:58:44,431 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=839939.3333333334, ans=0.2 2024-09-25 21:59:22,297 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=840032.6666666666, ans=0.07 2024-09-25 21:59:31,442 INFO [train.py:1198] (3/4) Epoch 47, batch 800, loss[loss=0.1747, ctc_loss=0.1084, cr_loss=0.3317, over 17260.00 frames. ], tot_loss[loss=0.1858, ctc_loss=0.1189, cr_loss=0.3344, over 3269196.60 frames. 
], batch size: 44, lr: 2.53e-03, grad_scale: 32.0 2024-09-25 21:59:46,163 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=840126.0, ans=0.95 2024-09-25 22:00:05,158 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=840172.6666666666, ans=0.125 2024-09-25 22:00:43,631 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=3.73 vs. limit=10.0 2024-09-25 22:00:48,651 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=8.59 vs. limit=15.0 2024-09-25 22:00:54,192 INFO [train.py:1198] (3/4) Epoch 47, batch 850, loss[loss=0.2279, ctc_loss=0.1478, cr_loss=0.4004, over 17026.00 frames. ], tot_loss[loss=0.1861, ctc_loss=0.1191, cr_loss=0.3351, over 3292496.68 frames. ], batch size: 52, lr: 2.53e-03, grad_scale: 32.0 2024-09-25 22:01:02,230 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.const_attention_rate, batch_count=840312.6666666666, ans=0.025 2024-09-25 22:01:03,951 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=840312.6666666666, ans=0.0 2024-09-25 22:01:07,167 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=840312.6666666666, ans=0.1 2024-09-25 22:01:13,304 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.171e+02 1.297e+02 1.378e+02 1.477e+02 2.622e+02, threshold=2.756e+02, percent-clipped=0.0 2024-09-25 22:01:13,618 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=840359.3333333334, ans=0.125 2024-09-25 22:01:15,889 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.11 vs. limit=15.0 2024-09-25 22:01:18,503 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.const_attention_rate, batch_count=840359.3333333334, ans=0.025 2024-09-25 22:01:20,559 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=6.12 vs. limit=15.0 2024-09-25 22:01:34,065 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=840406.0, ans=0.1 2024-09-25 22:01:42,607 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.71 vs. limit=6.0 2024-09-25 22:01:44,239 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=4.27 vs. limit=12.0 2024-09-25 22:02:02,339 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=840499.3333333334, ans=0.1 2024-09-25 22:02:19,134 INFO [train.py:1198] (3/4) Epoch 47, batch 900, loss[loss=0.1968, ctc_loss=0.1272, cr_loss=0.3478, over 17104.00 frames. ], tot_loss[loss=0.1856, ctc_loss=0.1187, cr_loss=0.3344, over 3312018.39 frames. 
], batch size: 49, lr: 2.53e-03, grad_scale: 32.0 2024-09-25 22:02:52,971 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=840639.3333333334, ans=0.05 2024-09-25 22:02:59,292 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=840639.3333333334, ans=0.1 2024-09-25 22:03:35,152 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=840732.6666666666, ans=0.1 2024-09-25 22:03:38,322 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.const_attention_rate, batch_count=840732.6666666666, ans=0.025 2024-09-25 22:03:41,093 INFO [train.py:1198] (3/4) Epoch 47, batch 950, loss[loss=0.1663, ctc_loss=0.1049, cr_loss=0.3067, over 17033.00 frames. ], tot_loss[loss=0.1854, ctc_loss=0.1186, cr_loss=0.3343, over 3317859.71 frames. ], batch size: 51, lr: 2.53e-03, grad_scale: 32.0 2024-09-25 22:03:43,201 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=840779.3333333334, ans=0.0 2024-09-25 22:03:45,098 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=4.39 vs. limit=15.0 2024-09-25 22:03:58,890 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=840826.0, ans=0.125 2024-09-25 22:03:58,954 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=840826.0, ans=0.2 2024-09-25 22:04:00,131 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.146e+02 1.323e+02 1.399e+02 1.542e+02 2.460e+02, threshold=2.799e+02, percent-clipped=0.0 2024-09-25 22:04:17,887 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.const_attention_rate, batch_count=840872.6666666666, ans=0.025 2024-09-25 22:04:18,171 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=11.40 vs. limit=22.5 2024-09-25 22:04:22,531 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=840872.6666666666, ans=0.125 2024-09-25 22:04:48,101 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=840966.0, ans=0.0 2024-09-25 22:04:57,817 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=840966.0, ans=0.2 2024-09-25 22:04:59,322 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=841012.6666666666, ans=0.125 2024-09-25 22:05:00,653 INFO [train.py:1198] (3/4) Epoch 47, batch 1000, loss[loss=0.1913, ctc_loss=0.1238, cr_loss=0.3375, over 16665.00 frames. ], tot_loss[loss=0.1861, ctc_loss=0.119, cr_loss=0.3354, over 3322008.14 frames. 
], batch size: 61, lr: 2.53e-03, grad_scale: 32.0
2024-09-25 22:05:13,860 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=841012.6666666666, ans=0.125
2024-09-25 22:05:20,069 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=841059.3333333334, ans=0.125
2024-09-25 22:05:23,086 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=841059.3333333334, ans=0.0
2024-09-25 22:05:41,415 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.81 vs. limit=6.0
2024-09-25 22:06:11,071 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=5.83 vs. limit=12.0
2024-09-25 22:06:23,152 INFO [train.py:1198] (3/4) Epoch 47, batch 1050, loss[loss=0.1775, ctc_loss=0.1145, cr_loss=0.3147, over 16973.00 frames. ], tot_loss[loss=0.1857, ctc_loss=0.1187, cr_loss=0.3351, over 3332847.96 frames. ], batch size: 42, lr: 2.53e-03, grad_scale: 32.0
2024-09-25 22:06:42,388 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.129e+02 1.322e+02 1.409e+02 1.520e+02 2.848e+02, threshold=2.818e+02, percent-clipped=1.0
2024-09-25 22:06:53,371 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=841292.6666666666, ans=0.0
2024-09-25 22:07:03,064 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=841339.3333333334, ans=0.07
2024-09-25 22:07:13,394 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=841339.3333333334, ans=0.0
2024-09-25 22:07:32,489 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=841432.6666666666, ans=0.0
2024-09-25 22:07:39,219 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=5.11 vs. limit=15.0
2024-09-25 22:07:40,279 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=841432.6666666666, ans=0.125
2024-09-25 22:07:47,937 INFO [train.py:1198] (3/4) Epoch 47, batch 1100, loss[loss=0.1431, ctc_loss=0.08904, cr_loss=0.2702, over 17102.00 frames. ], tot_loss[loss=0.186, ctc_loss=0.1188, cr_loss=0.3358, over 3339447.96 frames. ], batch size: 40, lr: 2.53e-03, grad_scale: 32.0
2024-09-25 22:07:56,225 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=841479.3333333334, ans=0.125
2024-09-25 22:08:09,101 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=841526.0, ans=0.125
2024-09-25 22:08:21,222 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=841572.6666666666, ans=0.0
2024-09-25 22:08:23,478 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=8.07 vs. limit=15.0
2024-09-25 22:08:32,651 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=841572.6666666666, ans=0.2
2024-09-25 22:08:45,511 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=841619.3333333334, ans=0.125
2024-09-25 22:09:10,495 INFO [train.py:1198] (3/4) Epoch 47, batch 1150, loss[loss=0.2261, ctc_loss=0.1478, cr_loss=0.3912, over 16998.00 frames. ], tot_loss[loss=0.1864, ctc_loss=0.1192, cr_loss=0.3363, over 3344367.17 frames. ], batch size: 53, lr: 2.53e-03, grad_scale: 32.0
2024-09-25 22:09:29,554 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.082e+02 1.291e+02 1.365e+02 1.481e+02 2.112e+02, threshold=2.730e+02, percent-clipped=0.0
2024-09-25 22:09:50,466 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=841806.0, ans=0.125
2024-09-25 22:09:50,508 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=841806.0, ans=0.125
2024-09-25 22:10:08,385 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=841852.6666666666, ans=0.125
2024-09-25 22:10:14,853 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=841899.3333333334, ans=0.125
2024-09-25 22:10:16,375 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=841899.3333333334, ans=0.125
2024-09-25 22:10:19,549 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=841899.3333333334, ans=0.0
2024-09-25 22:10:19,596 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=841899.3333333334, ans=0.125
2024-09-25 22:10:32,845 INFO [train.py:1198] (3/4) Epoch 47, batch 1200, loss[loss=0.2073, ctc_loss=0.1338, cr_loss=0.3675, over 17011.00 frames. ], tot_loss[loss=0.1859, ctc_loss=0.1188, cr_loss=0.3354, over 3347926.38 frames. ], batch size: 53, lr: 2.53e-03, grad_scale: 32.0
2024-09-25 22:10:45,858 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=841946.0, ans=0.125
2024-09-25 22:10:49,340 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.93 vs. limit=10.0
2024-09-25 22:10:57,474 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.67 vs. limit=15.0
2024-09-25 22:11:34,679 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=14.04 vs. limit=15.0
2024-09-25 22:11:50,927 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=4.39 vs. limit=10.0
2024-09-25 22:11:55,783 INFO [train.py:1198] (3/4) Epoch 47, batch 1250, loss[loss=0.186, ctc_loss=0.1185, cr_loss=0.3376, over 17084.00 frames. ], tot_loss[loss=0.1859, ctc_loss=0.1188, cr_loss=0.3356, over 3354776.26 frames. ], batch size: 49, lr: 2.53e-03, grad_scale: 32.0
2024-09-25 22:11:56,080 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=842179.3333333334, ans=0.1
2024-09-25 22:11:57,562 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=842179.3333333334, ans=0.125
2024-09-25 22:12:17,774 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.161e+02 1.299e+02 1.369e+02 1.454e+02 1.923e+02, threshold=2.738e+02, percent-clipped=0.0
2024-09-25 22:12:37,744 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=9.73 vs. limit=12.0
2024-09-25 22:12:56,457 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.min_positive, batch_count=842319.3333333334, ans=0.05
2024-09-25 22:13:20,998 INFO [train.py:1198] (3/4) Epoch 47, batch 1300, loss[loss=0.1898, ctc_loss=0.1217, cr_loss=0.3403, over 17157.00 frames. ], tot_loss[loss=0.186, ctc_loss=0.1189, cr_loss=0.3353, over 3361464.23 frames. ], batch size: 45, lr: 2.53e-03, grad_scale: 16.0
2024-09-25 22:13:38,911 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=842459.3333333334, ans=0.0
2024-09-25 22:13:56,731 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=842506.0, ans=0.125
2024-09-25 22:13:58,546 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=7.03 vs. limit=15.0
2024-09-25 22:14:05,385 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=4.40 vs. limit=15.0
2024-09-25 22:14:28,175 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=842599.3333333334, ans=0.0
2024-09-25 22:14:30,231 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=7.46 vs. limit=15.0
2024-09-25 22:14:40,808 INFO [train.py:1198] (3/4) Epoch 47, batch 1350, loss[loss=0.2272, ctc_loss=0.1505, cr_loss=0.3831, over 16594.00 frames. ], tot_loss[loss=0.1875, ctc_loss=0.1201, cr_loss=0.3371, over 3352136.69 frames. ], batch size: 66, lr: 2.53e-03, grad_scale: 16.0
2024-09-25 22:14:41,078 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=842646.0, ans=0.1
2024-09-25 22:15:01,642 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.184e+02 1.286e+02 1.352e+02 1.458e+02 2.037e+02, threshold=2.703e+02, percent-clipped=0.0
2024-09-25 22:15:06,961 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=842692.6666666666, ans=0.125
2024-09-25 22:15:10,851 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=4.65 vs. limit=15.0
2024-09-25 22:15:11,706 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=842739.3333333334, ans=0.0
2024-09-25 22:15:13,358 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=842739.3333333334, ans=0.125
2024-09-25 22:15:52,533 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=842832.6666666666, ans=0.0
2024-09-25 22:15:53,900 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=842832.6666666666, ans=0.1
2024-09-25 22:16:03,150 INFO [train.py:1198] (3/4) Epoch 47, batch 1400, loss[loss=0.1838, ctc_loss=0.1162, cr_loss=0.3382, over 17289.00 frames. ], tot_loss[loss=0.1881, ctc_loss=0.1205, cr_loss=0.3379, over 3354888.80 frames. ], batch size: 46, lr: 2.52e-03, grad_scale: 16.0
2024-09-25 22:16:03,939 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=11.01 vs. limit=22.5
2024-09-25 22:16:05,033 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=842879.3333333334, ans=0.2
2024-09-25 22:16:10,492 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.62 vs. limit=22.5
2024-09-25 22:16:22,617 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=842926.0, ans=0.1
2024-09-25 22:16:27,528 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=842926.0, ans=0.0
2024-09-25 22:16:27,609 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=842926.0, ans=0.2
2024-09-25 22:16:33,866 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=842972.6666666666, ans=0.2
2024-09-25 22:17:02,692 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=843019.3333333334, ans=0.2
2024-09-25 22:17:23,454 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=843066.0, ans=0.2
2024-09-25 22:17:27,857 INFO [train.py:1198] (3/4) Epoch 47, batch 1450, loss[loss=0.2285, ctc_loss=0.1472, cr_loss=0.4065, over 17047.00 frames. ], tot_loss[loss=0.1887, ctc_loss=0.1209, cr_loss=0.3387, over 3360655.25 frames. ], batch size: 52, lr: 2.52e-03, grad_scale: 16.0
2024-09-25 22:17:28,434 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.16 vs. limit=10.0
2024-09-25 22:17:35,039 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=5.05 vs. limit=15.0
2024-09-25 22:17:48,452 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.105e+02 1.274e+02 1.355e+02 1.464e+02 2.562e+02, threshold=2.711e+02, percent-clipped=0.0
2024-09-25 22:17:48,682 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=843159.3333333334, ans=0.1
2024-09-25 22:18:11,102 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=8.13 vs. limit=15.0
2024-09-25 22:18:12,143 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=843206.0, ans=0.0
2024-09-25 22:18:29,691 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=843252.6666666666, ans=0.2
2024-09-25 22:18:31,435 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=843252.6666666666, ans=0.125
2024-09-25 22:18:41,274 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=9.30 vs. limit=12.0
2024-09-25 22:18:47,314 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=843299.3333333334, ans=0.0
2024-09-25 22:18:50,193 INFO [train.py:1198] (3/4) Epoch 47, batch 1500, loss[loss=0.1758, ctc_loss=0.1093, cr_loss=0.3321, over 17221.00 frames. ], tot_loss[loss=0.1872, ctc_loss=0.1199, cr_loss=0.3368, over 3362729.09 frames. ], batch size: 47, lr: 2.52e-03, grad_scale: 16.0
2024-09-25 22:19:01,602 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=843346.0, ans=0.1
2024-09-25 22:19:03,200 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=843346.0, ans=0.2
2024-09-25 22:19:27,337 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.max_abs, batch_count=843439.3333333334, ans=10.0
2024-09-25 22:19:28,767 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=843439.3333333334, ans=0.125
2024-09-25 22:19:32,718 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=7.27 vs. limit=15.0
2024-09-25 22:19:33,735 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=843439.3333333334, ans=0.0
2024-09-25 22:19:41,987 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=843486.0, ans=0.125
2024-09-25 22:19:45,666 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=3.35 vs. limit=15.0
2024-09-25 22:20:10,579 INFO [train.py:1198] (3/4) Epoch 47, batch 1550, loss[loss=0.19, ctc_loss=0.1215, cr_loss=0.3426, over 17007.00 frames. ], tot_loss[loss=0.187, ctc_loss=0.1197, cr_loss=0.3367, over 3359907.91 frames. ], batch size: 56, lr: 2.52e-03, grad_scale: 16.0
2024-09-25 22:20:34,259 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.106e+02 1.301e+02 1.382e+02 1.468e+02 1.851e+02, threshold=2.764e+02, percent-clipped=0.0
2024-09-25 22:21:19,967 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=843766.0, ans=0.1
2024-09-25 22:21:27,961 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=843766.0, ans=0.125
2024-09-25 22:21:34,160 INFO [train.py:1198] (3/4) Epoch 47, batch 1600, loss[loss=0.1954, ctc_loss=0.1244, cr_loss=0.3551, over 17161.00 frames. ], tot_loss[loss=0.1874, ctc_loss=0.1199, cr_loss=0.3375, over 3361879.90 frames. ], batch size: 45, lr: 2.52e-03, grad_scale: 32.0
2024-09-25 22:21:34,567 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=843812.6666666666, ans=0.125
2024-09-25 22:21:44,076 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=843812.6666666666, ans=0.0
2024-09-25 22:21:47,213 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=843812.6666666666, ans=0.09899494936611666
2024-09-25 22:22:53,234 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=843999.3333333334, ans=0.0
2024-09-25 22:22:56,545 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=843999.3333333334, ans=0.125
2024-09-25 22:22:59,306 INFO [train.py:1198] (3/4) Epoch 47, batch 1650, loss[loss=0.2426, ctc_loss=0.1617, cr_loss=0.4045, over 15185.00 frames. ], tot_loss[loss=0.1877, ctc_loss=0.1202, cr_loss=0.3374, over 3345449.99 frames. ], batch size: 89, lr: 2.52e-03, grad_scale: 32.0
2024-09-25 22:23:21,286 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=844092.6666666666, ans=0.125
2024-09-25 22:23:22,602 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.136e+02 1.287e+02 1.363e+02 1.436e+02 1.851e+02, threshold=2.725e+02, percent-clipped=0.0
2024-09-25 22:23:22,921 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=844092.6666666666, ans=0.0
2024-09-25 22:23:27,643 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.const_attention_rate, batch_count=844092.6666666666, ans=0.025
2024-09-25 22:23:32,618 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=844139.3333333334, ans=0.125
2024-09-25 22:24:00,116 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=844186.0, ans=0.125
2024-09-25 22:24:01,831 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=844186.0, ans=0.125
2024-09-25 22:24:05,032 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=844232.6666666666, ans=0.0
2024-09-25 22:24:10,165 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.37 vs. limit=15.0
2024-09-25 22:24:21,990 INFO [train.py:1198] (3/4) Epoch 47, batch 1700, loss[loss=0.2274, ctc_loss=0.146, cr_loss=0.407, over 17216.00 frames. ], tot_loss[loss=0.1886, ctc_loss=0.1208, cr_loss=0.3386, over 3348367.02 frames. ], batch size: 55, lr: 2.52e-03, grad_scale: 16.0
2024-09-25 22:24:32,071 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.95 vs. limit=6.0
2024-09-25 22:24:43,032 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=844326.0, ans=0.09899494936611666
2024-09-25 22:25:31,667 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=844466.0, ans=0.0
2024-09-25 22:25:44,107 INFO [train.py:1198] (3/4) Epoch 47, batch 1750, loss[loss=0.1826, ctc_loss=0.1169, cr_loss=0.3285, over 17312.00 frames. ], tot_loss[loss=0.1877, ctc_loss=0.1202, cr_loss=0.3375, over 3350171.09 frames. ], batch size: 49, lr: 2.52e-03, grad_scale: 16.0
2024-09-25 22:26:03,431 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=844559.3333333334, ans=0.1
2024-09-25 22:26:06,157 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.120e+02 1.334e+02 1.394e+02 1.508e+02 1.965e+02, threshold=2.789e+02, percent-clipped=0.0
2024-09-25 22:26:27,885 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.77 vs. limit=10.0
2024-09-25 22:26:41,829 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=844652.6666666666, ans=0.0
2024-09-25 22:26:50,663 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=844699.3333333334, ans=0.125
2024-09-25 22:27:08,727 INFO [train.py:1198] (3/4) Epoch 47, batch 1800, loss[loss=0.1828, ctc_loss=0.1176, cr_loss=0.3258, over 17213.00 frames. ], tot_loss[loss=0.1888, ctc_loss=0.121, cr_loss=0.339, over 3353038.13 frames. ], batch size: 47, lr: 2.52e-03, grad_scale: 16.0
2024-09-25 22:27:28,062 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=844792.6666666666, ans=0.0
2024-09-25 22:27:39,311 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=844839.3333333334, ans=0.125
2024-09-25 22:28:18,845 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=844932.6666666666, ans=0.125
2024-09-25 22:28:23,627 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=844932.6666666666, ans=0.125
2024-09-25 22:28:31,340 INFO [train.py:1198] (3/4) Epoch 47, batch 1850, loss[loss=0.1943, ctc_loss=0.125, cr_loss=0.3463, over 17254.00 frames. ], tot_loss[loss=0.1883, ctc_loss=0.1206, cr_loss=0.3382, over 3346538.08 frames. ], batch size: 44, lr: 2.52e-03, grad_scale: 16.0
2024-09-25 22:28:44,369 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=844979.3333333334, ans=0.125
2024-09-25 22:28:53,365 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.135e+02 1.318e+02 1.395e+02 1.509e+02 1.851e+02, threshold=2.789e+02, percent-clipped=0.0
2024-09-25 22:29:24,132 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=845119.3333333334, ans=0.125
2024-09-25 22:29:27,424 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=845119.3333333334, ans=0.125
2024-09-25 22:29:30,576 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=845119.3333333334, ans=0.125
2024-09-25 22:29:34,189 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.98 vs. limit=22.5
2024-09-25 22:29:36,754 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=845166.0, ans=0.2
2024-09-25 22:29:50,500 INFO [train.py:1198] (3/4) Epoch 47, batch 1900, loss[loss=0.1787, ctc_loss=0.1158, cr_loss=0.3144, over 17299.00 frames. ], tot_loss[loss=0.1885, ctc_loss=0.1206, cr_loss=0.3391, over 3348621.18 frames. ], batch size: 46, lr: 2.52e-03, grad_scale: 16.0
2024-09-25 22:30:15,489 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=8.41 vs. limit=15.0
2024-09-25 22:30:25,373 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=11.05 vs. limit=12.0
2024-09-25 22:30:33,184 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.const_attention_rate, batch_count=845306.0, ans=0.025
2024-09-25 22:30:55,442 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=845399.3333333334, ans=0.0
2024-09-25 22:31:12,214 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=5.68 vs. limit=15.0
2024-09-25 22:31:12,816 INFO [train.py:1198] (3/4) Epoch 47, batch 1950, loss[loss=0.1943, ctc_loss=0.1245, cr_loss=0.3494, over 16941.00 frames. ], tot_loss[loss=0.1883, ctc_loss=0.1206, cr_loss=0.3385, over 3334628.54 frames. ], batch size: 42, lr: 2.52e-03, grad_scale: 16.0
2024-09-25 22:31:24,348 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=845446.0, ans=0.0
2024-09-25 22:31:29,028 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=845492.6666666666, ans=0.0
2024-09-25 22:31:29,055 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=845492.6666666666, ans=0.125
2024-09-25 22:31:35,175 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.125e+02 1.279e+02 1.368e+02 1.443e+02 1.975e+02, threshold=2.736e+02, percent-clipped=0.0
2024-09-25 22:31:45,818 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=845539.3333333334, ans=0.1
2024-09-25 22:32:13,314 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=5.12 vs. limit=12.0
2024-09-25 22:32:16,317 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-25 22:32:38,572 INFO [train.py:1198] (3/4) Epoch 47, batch 2000, loss[loss=0.1906, ctc_loss=0.1196, cr_loss=0.3549, over 17306.00 frames. ], tot_loss[loss=0.1878, ctc_loss=0.1202, cr_loss=0.338, over 3351818.35 frames. ], batch size: 51, lr: 2.52e-03, grad_scale: 32.0
2024-09-25 22:32:55,308 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=845726.0, ans=0.0
2024-09-25 22:33:12,901 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=8.89 vs. limit=15.0
2024-09-25 22:33:13,989 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=845772.6666666666, ans=0.125
2024-09-25 22:33:18,843 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=845772.6666666666, ans=0.125
2024-09-25 22:33:33,088 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=845819.3333333334, ans=0.0
2024-09-25 22:33:57,378 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.76 vs. limit=6.0
2024-09-25 22:34:01,511 INFO [train.py:1198] (3/4) Epoch 47, batch 2050, loss[loss=0.1739, ctc_loss=0.1103, cr_loss=0.3182, over 17024.00 frames. ], tot_loss[loss=0.1877, ctc_loss=0.1202, cr_loss=0.3377, over 3342538.70 frames. ], batch size: 56, lr: 2.52e-03, grad_scale: 16.0
2024-09-25 22:34:25,349 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.136e+02 1.271e+02 1.350e+02 1.448e+02 1.978e+02, threshold=2.700e+02, percent-clipped=0.0
2024-09-25 22:34:38,541 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=846006.0, ans=0.125
2024-09-25 22:34:40,170 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=846006.0, ans=0.125
2024-09-25 22:34:56,780 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.83 vs. limit=10.0
2024-09-25 22:34:59,330 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=846052.6666666666, ans=0.125
2024-09-25 22:35:12,132 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=846099.3333333334, ans=0.0
2024-09-25 22:35:23,977 INFO [train.py:1198] (3/4) Epoch 47, batch 2100, loss[loss=0.1413, ctc_loss=0.0865, cr_loss=0.2738, over 16265.00 frames. ], tot_loss[loss=0.1873, ctc_loss=0.1198, cr_loss=0.3378, over 3345706.43 frames. ], batch size: 36, lr: 2.52e-03, grad_scale: 8.0
2024-09-25 22:35:25,798 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=846146.0, ans=0.0
2024-09-25 22:36:04,337 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=846239.3333333334, ans=0.2
2024-09-25 22:36:09,047 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=846239.3333333334, ans=0.025
2024-09-25 22:36:11,404 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=4.80 vs. limit=15.0
2024-09-25 22:36:32,989 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=846332.6666666666, ans=0.125
2024-09-25 22:36:46,574 INFO [train.py:1198] (3/4) Epoch 47, batch 2150, loss[loss=0.1579, ctc_loss=0.09892, cr_loss=0.2948, over 17202.00 frames. ], tot_loss[loss=0.187, ctc_loss=0.1196, cr_loss=0.3369, over 3355983.03 frames. ], batch size: 41, lr: 2.52e-03, grad_scale: 8.0
2024-09-25 22:36:46,830 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=846379.3333333334, ans=0.125
2024-09-25 22:37:14,615 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.166e+02 1.323e+02 1.402e+02 1.477e+02 1.932e+02, threshold=2.805e+02, percent-clipped=0.0
2024-09-25 22:37:15,020 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=846426.0, ans=0.0
2024-09-25 22:37:24,590 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=846472.6666666666, ans=0.125
2024-09-25 22:37:46,345 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=3.14 vs. limit=15.0
2024-09-25 22:38:09,281 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=846566.0, ans=0.125
2024-09-25 22:38:12,286 INFO [train.py:1198] (3/4) Epoch 47, batch 2200, loss[loss=0.2067, ctc_loss=0.135, cr_loss=0.3584, over 17172.00 frames. ], tot_loss[loss=0.1867, ctc_loss=0.1195, cr_loss=0.336, over 3354064.99 frames. ], batch size: 45, lr: 2.52e-03, grad_scale: 8.0
2024-09-25 22:38:20,582 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=846612.6666666666, ans=0.1
2024-09-25 22:38:54,473 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=846706.0, ans=0.0
2024-09-25 22:39:13,738 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=9.01 vs. limit=15.0
2024-09-25 22:39:24,609 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=846799.3333333334, ans=0.125
2024-09-25 22:39:32,144 INFO [train.py:1198] (3/4) Epoch 47, batch 2250, loss[loss=0.1524, ctc_loss=0.09472, cr_loss=0.2884, over 17036.00 frames. ], tot_loss[loss=0.1868, ctc_loss=0.1195, cr_loss=0.3364, over 3348230.30 frames. ], batch size: 39, lr: 2.52e-03, grad_scale: 8.0
2024-09-25 22:39:32,425 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=846846.0, ans=0.1
2024-09-25 22:39:58,070 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.141e+02 1.312e+02 1.382e+02 1.535e+02 2.267e+02, threshold=2.763e+02, percent-clipped=0.0
2024-09-25 22:40:41,108 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=847032.6666666666, ans=0.125
2024-09-25 22:40:45,772 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=847032.6666666666, ans=0.125
2024-09-25 22:40:55,217 INFO [train.py:1198] (3/4) Epoch 47, batch 2300, loss[loss=0.2051, ctc_loss=0.1297, cr_loss=0.3769, over 17233.00 frames. ], tot_loss[loss=0.1868, ctc_loss=0.1195, cr_loss=0.3368, over 3352571.77 frames. ], batch size: 55, lr: 2.52e-03, grad_scale: 8.0
2024-09-25 22:42:13,265 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=847266.0, ans=0.125
2024-09-25 22:42:20,987 INFO [train.py:1198] (3/4) Epoch 47, batch 2350, loss[loss=0.2026, ctc_loss=0.1319, cr_loss=0.3535, over 16044.00 frames. ], tot_loss[loss=0.1876, ctc_loss=0.12, cr_loss=0.3381, over 3357591.09 frames. ], batch size: 74, lr: 2.52e-03, grad_scale: 8.0
2024-09-25 22:42:43,930 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=847359.3333333334, ans=0.0
2024-09-25 22:42:46,751 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.178e+02 1.289e+02 1.363e+02 1.471e+02 2.156e+02, threshold=2.725e+02, percent-clipped=0.0
2024-09-25 22:42:47,069 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=847359.3333333334, ans=0.2
2024-09-25 22:43:02,580 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=847406.0, ans=0.1
2024-09-25 22:43:11,412 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.13 vs. limit=22.5
2024-09-25 22:43:44,377 INFO [train.py:1198] (3/4) Epoch 47, batch 2400, loss[loss=0.189, ctc_loss=0.1217, cr_loss=0.3364, over 17163.00 frames. ], tot_loss[loss=0.1871, ctc_loss=0.1196, cr_loss=0.3374, over 3366338.42 frames. ], batch size: 45, lr: 2.52e-03, grad_scale: 16.0
2024-09-25 22:43:51,154 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=847546.0, ans=0.0
2024-09-25 22:43:52,723 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=847546.0, ans=0.0
2024-09-25 22:44:23,239 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=847639.3333333334, ans=0.2
2024-09-25 22:44:57,060 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=847732.6666666666, ans=0.0
2024-09-25 22:45:04,360 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=847732.6666666666, ans=0.0
2024-09-25 22:45:07,170 INFO [train.py:1198] (3/4) Epoch 47, batch 2450, loss[loss=0.1845, ctc_loss=0.1188, cr_loss=0.3285, over 16951.00 frames. ], tot_loss[loss=0.1863, ctc_loss=0.1191, cr_loss=0.336, over 3370079.36 frames. ], batch size: 42, lr: 2.52e-03, grad_scale: 16.0
2024-09-25 22:45:32,661 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.112e+02 1.309e+02 1.414e+02 1.505e+02 2.738e+02, threshold=2.828e+02, percent-clipped=1.0
2024-09-25 22:45:47,457 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=847872.6666666666, ans=0.125
2024-09-25 22:46:26,015 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=848012.6666666666, ans=0.1
2024-09-25 22:46:27,294 INFO [train.py:1198] (3/4) Epoch 47, batch 2500, loss[loss=0.2026, ctc_loss=0.1307, cr_loss=0.3592, over 17021.00 frames. ], tot_loss[loss=0.187, ctc_loss=0.1196, cr_loss=0.337, over 3375198.61 frames. ], batch size: 51, lr: 2.52e-03, grad_scale: 16.0
2024-09-25 22:46:49,522 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=848059.3333333334, ans=0.07
2024-09-25 22:47:04,945 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-25 22:47:55,381 INFO [train.py:1198] (3/4) Epoch 47, batch 2550, loss[loss=0.1946, ctc_loss=0.1247, cr_loss=0.3494, over 17155.00 frames. ], tot_loss[loss=0.1873, ctc_loss=0.1198, cr_loss=0.3377, over 3373861.41 frames. ], batch size: 45, lr: 2.52e-03, grad_scale: 16.0
2024-09-25 22:48:03,891 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=848246.0, ans=0.2
2024-09-25 22:48:20,934 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.155e+02 1.321e+02 1.406e+02 1.538e+02 2.240e+02, threshold=2.812e+02, percent-clipped=0.0
2024-09-25 22:48:22,817 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=848292.6666666666, ans=0.0
2024-09-25 22:48:34,167 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=848339.3333333334, ans=0.125
2024-09-25 22:48:40,691 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=848339.3333333334, ans=0.0
2024-09-25 22:48:41,494 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=4.66 vs. limit=12.0
2024-09-25 22:48:58,456 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=848432.6666666666, ans=0.0
2024-09-25 22:49:03,328 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.const_attention_rate, batch_count=848432.6666666666, ans=0.025
2024-09-25 22:49:14,736 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.75 vs. limit=6.0
2024-09-25 22:49:15,636 INFO [train.py:1198] (3/4) Epoch 47, batch 2600, loss[loss=0.1438, ctc_loss=0.08872, cr_loss=0.2756, over 16942.00 frames. ], tot_loss[loss=0.1874, ctc_loss=0.1199, cr_loss=0.3375, over 3369001.99 frames. ], batch size: 42, lr: 2.52e-03, grad_scale: 16.0
2024-09-25 22:49:16,372 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=11.38 vs. limit=22.5
2024-09-25 22:49:24,749 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=5.73 vs. limit=15.0
2024-09-25 22:49:41,809 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=848526.0, ans=0.05
2024-09-25 22:50:01,665 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=848572.6666666666, ans=0.125
2024-09-25 22:50:38,210 INFO [train.py:1198] (3/4) Epoch 47, batch 2650, loss[loss=0.1927, ctc_loss=0.1266, cr_loss=0.3305, over 17158.00 frames. ], tot_loss[loss=0.1866, ctc_loss=0.1193, cr_loss=0.3361, over 3362206.81 frames. ], batch size: 48, lr: 2.52e-03, grad_scale: 16.0
2024-09-25 22:50:43,482 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=848712.6666666666, ans=0.1
2024-09-25 22:50:59,446 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.const_attention_rate, batch_count=848759.3333333334, ans=0.025
2024-09-25 22:51:03,873 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.172e+02 1.306e+02 1.404e+02 1.475e+02 2.254e+02, threshold=2.808e+02, percent-clipped=0.0
2024-09-25 22:51:08,791 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=848806.0, ans=0.125
2024-09-25 22:52:03,699 INFO [train.py:1198] (3/4) Epoch 47, batch 2700, loss[loss=0.1588, ctc_loss=0.1, cr_loss=0.2938, over 17177.00 frames. ], tot_loss[loss=0.1859, ctc_loss=0.1188, cr_loss=0.3354, over 3360156.60 frames. ], batch size: 41, lr: 2.52e-03, grad_scale: 16.0
2024-09-25 22:52:10,268 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=848946.0, ans=0.2
2024-09-25 22:52:29,523 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=848992.6666666666, ans=0.125
2024-09-25 22:53:02,040 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=849086.0, ans=0.125
2024-09-25 22:53:25,685 INFO [train.py:1198] (3/4) Epoch 47, batch 2750, loss[loss=0.2017, ctc_loss=0.1312, cr_loss=0.3521, over 15849.00 frames. ], tot_loss[loss=0.186, ctc_loss=0.1189, cr_loss=0.3356, over 3356761.60 frames. ], batch size: 74, lr: 2.52e-03, grad_scale: 16.0
2024-09-25 22:53:29,239 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=849179.3333333334, ans=0.2
2024-09-25 22:53:32,467 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=849179.3333333334, ans=0.0
2024-09-25 22:53:37,161 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.const_attention_rate, batch_count=849179.3333333334, ans=0.025
2024-09-25 22:53:38,840 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=849179.3333333334, ans=0.07
2024-09-25 22:53:51,198 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.120e+02 1.348e+02 1.413e+02 1.514e+02 2.410e+02, threshold=2.827e+02, percent-clipped=0.0
2024-09-25 22:53:56,488 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=849272.6666666666, ans=0.125
2024-09-25 22:53:56,521 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=849272.6666666666, ans=0.2
2024-09-25 22:54:10,780 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=849272.6666666666, ans=0.0
2024-09-25 22:54:26,886 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=849319.3333333334, ans=0.07
2024-09-25 22:54:41,570 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=849366.0, ans=0.1
2024-09-25 22:54:45,977 INFO [train.py:1198] (3/4) Epoch 47, batch 2800, loss[loss=0.1832, ctc_loss=0.1167, cr_loss=0.3324, over 16979.00 frames. ], tot_loss[loss=0.185, ctc_loss=0.1181, cr_loss=0.3343, over 3359712.00 frames. ], batch size: 53, lr: 2.51e-03, grad_scale: 32.0
2024-09-25 22:55:00,010 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=849412.6666666666, ans=0.0
2024-09-25 22:55:22,418 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=11.00 vs. limit=12.0
2024-09-25 22:56:02,624 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2024-09-25 22:56:08,517 INFO [train.py:1198] (3/4) Epoch 47, batch 2850, loss[loss=0.1856, ctc_loss=0.1199, cr_loss=0.3288, over 17215.00 frames. ], tot_loss[loss=0.1849, ctc_loss=0.1181, cr_loss=0.3339, over 3364226.39 frames. ], batch size: 47, lr: 2.51e-03, grad_scale: 16.0
2024-09-25 22:56:08,877 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=849646.0, ans=0.125
2024-09-25 22:56:33,706 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=849692.6666666666, ans=0.125
2024-09-25 22:56:33,709 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=849692.6666666666, ans=0.0
2024-09-25 22:56:38,076 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.151e+02 1.322e+02 1.395e+02 1.477e+02 2.122e+02, threshold=2.790e+02, percent-clipped=0.0
2024-09-25 22:56:41,247 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=849739.3333333334, ans=0.125
2024-09-25 22:57:06,374 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=849786.0, ans=0.0
2024-09-25 22:57:13,155 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.89 vs. limit=6.0
2024-09-25 22:57:33,016 INFO [train.py:1198] (3/4) Epoch 47, batch 2900, loss[loss=0.187, ctc_loss=0.1159, cr_loss=0.3556, over 17231.00 frames. ], tot_loss[loss=0.1849, ctc_loss=0.1182, cr_loss=0.3336, over 3362772.40 frames. ], batch size: 47, lr: 2.51e-03, grad_scale: 8.0
2024-09-25 22:57:33,304 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.const_attention_rate, batch_count=849879.3333333334, ans=0.025
2024-09-25 22:57:36,419 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=849879.3333333334, ans=0.125
2024-09-25 22:57:45,466 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=849879.3333333334, ans=0.0
2024-09-25 22:57:53,440 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.min_positive, batch_count=849926.0, ans=0.05
2024-09-25 22:58:00,434 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.27 vs. limit=10.0
2024-09-25 22:58:28,854 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=850019.3333333334, ans=0.125
2024-09-25 22:58:55,590 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.65 vs. limit=6.0
2024-09-25 22:58:56,182 INFO [train.py:1198] (3/4) Epoch 47, batch 2950, loss[loss=0.17, ctc_loss=0.1083, cr_loss=0.3084, over 16954.00 frames. ], tot_loss[loss=0.1859, ctc_loss=0.1189, cr_loss=0.3349, over 3367601.87 frames. ], batch size: 42, lr: 2.51e-03, grad_scale: 8.0
2024-09-25 22:58:58,346 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=10.70 vs. limit=15.0
2024-09-25 22:59:09,239 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=850112.6666666666, ans=0.125
2024-09-25 22:59:21,088 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.88 vs. limit=15.0
2024-09-25 22:59:24,952 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.161e+02 1.302e+02 1.410e+02 1.499e+02 2.139e+02, threshold=2.820e+02, percent-clipped=0.0
2024-09-25 22:59:27,004 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=850206.0, ans=0.125
2024-09-25 22:59:30,125 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=850206.0, ans=0.125
2024-09-25 22:59:35,673 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.20 vs. limit=6.0
2024-09-25 23:00:13,500 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.max_abs, batch_count=850299.3333333334, ans=10.0
2024-09-25 23:00:17,882 INFO [train.py:1198] (3/4) Epoch 47, batch 3000, loss[loss=0.1776, ctc_loss=0.1135, cr_loss=0.3205, over 17299.00 frames. ], tot_loss[loss=0.1865, ctc_loss=0.1193, cr_loss=0.3358, over 3364266.14 frames. ], batch size: 46, lr: 2.51e-03, grad_scale: 8.0
2024-09-25 23:00:17,882 INFO [train.py:1221] (3/4) Computing validation loss
2024-09-25 23:00:26,650 INFO [zipformer.py:1858] (3/4) name=encoder.encoders.4.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([2.7009, 3.6258, 3.4931, 2.9255], device='cuda:3')
2024-09-25 23:00:26,715 INFO [zipformer.py:1858] (3/4) name=encoder.encoders.5.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([1.5981, 3.6626, 3.0631, 3.2906], device='cuda:3')
2024-09-25 23:00:33,699 INFO [train.py:1230] (3/4) Epoch 47, validation: loss=0.0348, ctc_loss=0.0348, cr_loss=1.036e-14, over 944034.00 frames.
2024-09-25 23:00:33,699 INFO [train.py:1231] (3/4) Maximum memory allocated so far is 21238MB
2024-09-25 23:00:45,236 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=850346.0, ans=0.0
2024-09-25 23:01:07,296 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=850439.3333333334, ans=0.125
2024-09-25 23:01:12,019 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=850439.3333333334, ans=0.125
2024-09-25 23:01:32,873 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.80 vs. limit=22.5
2024-09-25 23:01:52,183 INFO [train.py:1198] (3/4) Epoch 47, batch 3050, loss[loss=0.1687, ctc_loss=0.1059, cr_loss=0.3141, over 17314.00 frames. ], tot_loss[loss=0.1868, ctc_loss=0.1196, cr_loss=0.3363, over 3369648.64 frames. ], batch size: 46, lr: 2.51e-03, grad_scale: 8.0
2024-09-25 23:02:00,868 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=6.32 vs. limit=15.0
2024-09-25 23:02:15,127 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=4.43 vs. limit=15.0
2024-09-25 23:02:20,305 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.194e+02 1.305e+02 1.413e+02 1.506e+02 4.087e+02, threshold=2.825e+02, percent-clipped=1.0
2024-09-25 23:02:30,166 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=850672.6666666666, ans=0.1
2024-09-25 23:02:31,740 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=850672.6666666666, ans=0.125
2024-09-25 23:02:31,758 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=850672.6666666666, ans=0.1
2024-09-25 23:02:49,113 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=850719.3333333334, ans=0.125
2024-09-25 23:02:56,237 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=850766.0, ans=0.0
2024-09-25 23:03:07,768 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.84 vs. limit=10.0
2024-09-25 23:03:13,094 INFO [train.py:1198] (3/4) Epoch 47, batch 3100, loss[loss=0.1888, ctc_loss=0.1186, cr_loss=0.351, over 17077.00 frames. ], tot_loss[loss=0.1867, ctc_loss=0.1193, cr_loss=0.3367, over 3363839.54 frames. ], batch size: 43, lr: 2.51e-03, grad_scale: 8.0
2024-09-25 23:03:25,899 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=850812.6666666666, ans=0.125
2024-09-25 23:03:42,066 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=850859.3333333334, ans=0.125
2024-09-25 23:03:43,785 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=850859.3333333334, ans=0.1
2024-09-25 23:03:45,326 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=850906.0, ans=0.1
2024-09-25 23:03:51,419 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-25 23:03:55,999 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=850906.0, ans=0.125
2024-09-25 23:04:00,948 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=850952.6666666666, ans=0.125
2024-09-25 23:04:13,296 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=850952.6666666666, ans=0.125
2024-09-25 23:04:13,318 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=850952.6666666666, ans=0.125
2024-09-25 23:04:17,154 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.17 vs. limit=6.0
2024-09-25 23:04:33,495 INFO [train.py:1198] (3/4) Epoch 47, batch 3150, loss[loss=0.1675, ctc_loss=0.1047, cr_loss=0.3144, over 16932.00 frames. ], tot_loss[loss=0.1862, ctc_loss=0.1189, cr_loss=0.3363, over 3372243.26 frames. ], batch size: 42, lr: 2.51e-03, grad_scale: 8.0
2024-09-25 23:04:38,218 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=851046.0, ans=0.125
2024-09-25 23:05:01,410 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.168e+02 1.375e+02 1.455e+02 1.638e+02 2.123e+02, threshold=2.910e+02, percent-clipped=0.0
2024-09-25 23:05:03,262 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=851139.3333333334, ans=0.1
2024-09-25 23:05:04,919 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=851139.3333333334, ans=0.2
2024-09-25 23:05:17,342 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=851139.3333333334, ans=0.1
2024-09-25 23:05:18,705 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=851186.0, ans=0.0
2024-09-25 23:05:22,440 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=4.63 vs. limit=12.0
2024-09-25 23:05:25,270 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.const_attention_rate, batch_count=851186.0, ans=0.025
2024-09-25 23:05:25,973 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=5.90 vs. limit=12.0
2024-09-25 23:05:31,420 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=851186.0, ans=0.125
2024-09-25 23:05:36,213 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=851232.6666666666, ans=0.1
2024-09-25 23:05:38,025 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.const_attention_rate, batch_count=851232.6666666666, ans=0.025
2024-09-25 23:05:54,053 INFO [train.py:1198] (3/4) Epoch 47, batch 3200, loss[loss=0.2132, ctc_loss=0.1352, cr_loss=0.39, over 17015.00 frames. ], tot_loss[loss=0.1867, ctc_loss=0.1193, cr_loss=0.3372, over 3373620.77 frames. ], batch size: 52, lr: 2.51e-03, grad_scale: 16.0
2024-09-25 23:06:12,399 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=13.41 vs. limit=22.5
2024-09-25 23:06:23,087 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=14.12 vs. limit=15.0
2024-09-25 23:06:43,008 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=851419.3333333334, ans=0.125
2024-09-25 23:07:12,128 INFO [train.py:1198] (3/4) Epoch 47, batch 3250, loss[loss=0.1995, ctc_loss=0.1269, cr_loss=0.363, over 17163.00 frames. ], tot_loss[loss=0.1863, ctc_loss=0.119, cr_loss=0.3364, over 3366940.33 frames. ], batch size: 45, lr: 2.51e-03, grad_scale: 16.0
2024-09-25 23:07:15,576 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=851512.6666666666, ans=0.125
2024-09-25 23:07:15,671 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=851512.6666666666, ans=0.2
2024-09-25 23:07:20,488 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=851512.6666666666, ans=0.125
2024-09-25 23:07:21,825 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=851512.6666666666, ans=0.125
2024-09-25 23:07:40,357 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.111e+02 1.291e+02 1.374e+02 1.460e+02 1.837e+02, threshold=2.749e+02, percent-clipped=0.0
2024-09-25 23:07:53,627 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=3.82 vs. limit=12.0
2024-09-25 23:07:54,660 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-09-25 23:08:30,209 INFO [train.py:1198] (3/4) Epoch 47, batch 3300, loss[loss=0.1583, ctc_loss=0.09807, cr_loss=0.3012, over 17280.00 frames. ], tot_loss[loss=0.186, ctc_loss=0.1188, cr_loss=0.336, over 3373838.48 frames. ], batch size: 42, lr: 2.51e-03, grad_scale: 16.0
2024-09-25 23:08:41,326 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=851746.0, ans=0.125
2024-09-25 23:09:12,694 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=851839.3333333334, ans=0.125
2024-09-25 23:09:48,393 INFO [train.py:1198] (3/4) Epoch 47, batch 3350, loss[loss=0.1835, ctc_loss=0.1188, cr_loss=0.3233, over 17162.00 frames. ], tot_loss[loss=0.186, ctc_loss=0.1189, cr_loss=0.3356, over 3365333.08 frames. ], batch size: 48, lr: 2.51e-03, grad_scale: 8.0
2024-09-25 23:09:56,502 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=851979.3333333334, ans=0.125
2024-09-25 23:10:11,022 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=852026.0, ans=0.0
2024-09-25 23:10:20,046 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.101e+02 1.314e+02 1.397e+02 1.497e+02 2.232e+02, threshold=2.793e+02, percent-clipped=0.0
2024-09-25 23:10:31,271 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-25 23:10:33,328 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=6.45 vs. limit=10.0
2024-09-25 23:11:04,338 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=852166.0, ans=0.035
2024-09-25 23:11:04,414 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=852166.0, ans=0.125
2024-09-25 23:11:08,898 INFO [train.py:1198] (3/4) Epoch 47, batch 3400, loss[loss=0.1822, ctc_loss=0.1158, cr_loss=0.3319, over 17216.00 frames. ], tot_loss[loss=0.186, ctc_loss=0.1188, cr_loss=0.3358, over 3364833.05 frames. ], batch size: 47, lr: 2.51e-03, grad_scale: 8.0
2024-09-25 23:11:28,002 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=852259.3333333334, ans=0.125
2024-09-25 23:11:57,198 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=4.05 vs. limit=12.0
2024-09-25 23:12:14,820 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=852399.3333333334, ans=0.2
2024-09-25 23:12:17,999 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=852399.3333333334, ans=0.0
2024-09-25 23:12:22,674 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=852399.3333333334, ans=0.125
2024-09-25 23:12:27,294 INFO [train.py:1198] (3/4) Epoch 47, batch 3450, loss[loss=0.1662, ctc_loss=0.1045, cr_loss=0.3082, over 17250.00 frames. ], tot_loss[loss=0.1857, ctc_loss=0.1186, cr_loss=0.3354, over 3362676.11 frames. ], batch size: 44, lr: 2.51e-03, grad_scale: 8.0
2024-09-25 23:12:30,846 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=852446.0, ans=0.125
2024-09-25 23:12:56,967 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.186e+02 1.319e+02 1.407e+02 1.479e+02 2.069e+02, threshold=2.813e+02, percent-clipped=0.0
2024-09-25 23:12:57,657 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=4.26 vs. limit=15.0
2024-09-25 23:12:58,761 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=852539.3333333334, ans=0.125
2024-09-25 23:13:11,198 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=852539.3333333334, ans=0.125
2024-09-25 23:13:23,017 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=4.31 vs. limit=12.0
2024-09-25 23:13:49,281 INFO [train.py:1198] (3/4) Epoch 47, batch 3500, loss[loss=0.2071, ctc_loss=0.1332, cr_loss=0.3698, over 17292.00 frames. ], tot_loss[loss=0.1868, ctc_loss=0.1195, cr_loss=0.3369, over 3361842.96 frames. ], batch size: 51, lr: 2.51e-03, grad_scale: 8.0
2024-09-25 23:13:52,633 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=852679.3333333334, ans=0.125
2024-09-25 23:14:08,028 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=852726.0, ans=0.0
2024-09-25 23:14:12,908 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=852726.0, ans=0.1
2024-09-25 23:14:17,593 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=852726.0, ans=0.0
2024-09-25 23:14:24,007 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=852772.6666666666, ans=0.0
2024-09-25 23:14:31,625 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=852772.6666666666, ans=0.1
2024-09-25 23:14:56,512 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=852866.0, ans=0.125
2024-09-25 23:14:56,551 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=852866.0, ans=0.125
2024-09-25 23:15:00,064 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=7.75 vs. limit=15.0
2024-09-25 23:15:02,685 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=852866.0, ans=0.035
2024-09-25 23:15:07,127 INFO [train.py:1198] (3/4) Epoch 47, batch 3550, loss[loss=0.1602, ctc_loss=0.1011, cr_loss=0.2953, over 17104.00 frames. ], tot_loss[loss=0.1869, ctc_loss=0.1195, cr_loss=0.3368, over 3350403.62 frames. ], batch size: 43, lr: 2.51e-03, grad_scale: 8.0
2024-09-25 23:15:10,385 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=852912.6666666666, ans=0.125
2024-09-25 23:15:19,727 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=852912.6666666666, ans=0.125
2024-09-25 23:15:31,243 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=852959.3333333334, ans=0.0
2024-09-25 23:15:38,866 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.176e+02 1.284e+02 1.338e+02 1.473e+02 3.313e+02, threshold=2.677e+02, percent-clipped=1.0
2024-09-25 23:15:42,359 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=853006.0, ans=0.125
2024-09-25 23:15:43,930 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=853006.0, ans=0.125
2024-09-25 23:15:45,665 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=853006.0, ans=0.0
2024-09-25 23:16:03,620 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.02 vs. limit=15.0
2024-09-25 23:16:09,275 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=853052.6666666666, ans=0.0
2024-09-25 23:16:19,934 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=853099.3333333334, ans=0.125
2024-09-25 23:16:23,560 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.39 vs. limit=15.0
2024-09-25 23:16:27,426 INFO [train.py:1198] (3/4) Epoch 47, batch 3600, loss[loss=0.1815, ctc_loss=0.1184, cr_loss=0.3152, over 16682.00 frames. ], tot_loss[loss=0.1872, ctc_loss=0.1198, cr_loss=0.3372, over 3356819.29 frames. ], batch size: 61, lr: 2.51e-03, grad_scale: 16.0
2024-09-25 23:16:29,217 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=853146.0, ans=0.125
2024-09-25 23:16:47,825 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=853192.6666666666, ans=0.125
2024-09-25 23:17:14,109 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=853286.0, ans=0.125
2024-09-25 23:17:20,505 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=853286.0, ans=0.125
2024-09-25 23:17:25,121 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=853286.0, ans=0.2
2024-09-25 23:17:37,697 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.const_attention_rate, batch_count=853332.6666666666, ans=0.025
2024-09-25 23:17:45,127 INFO [train.py:1198] (3/4) Epoch 47, batch 3650, loss[loss=0.1835, ctc_loss=0.1177, cr_loss=0.3289, over 17246.00 frames. ], tot_loss[loss=0.187, ctc_loss=0.1196, cr_loss=0.337, over 3349303.09 frames.
], batch size: 44, lr: 2.51e-03, grad_scale: 16.0 2024-09-25 23:17:52,204 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=7.87 vs. limit=15.0 2024-09-25 23:17:53,747 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=5.58 vs. limit=15.0 2024-09-25 23:17:57,880 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.const_attention_rate, batch_count=853379.3333333334, ans=0.025 2024-09-25 23:18:14,972 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.133e+02 1.311e+02 1.426e+02 1.502e+02 2.686e+02, threshold=2.852e+02, percent-clipped=1.0 2024-09-25 23:18:43,005 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=4.39 vs. limit=10.0 2024-09-25 23:19:04,233 INFO [train.py:1198] (3/4) Epoch 47, batch 3700, loss[loss=0.1788, ctc_loss=0.1139, cr_loss=0.3247, over 17025.00 frames. ], tot_loss[loss=0.1868, ctc_loss=0.1195, cr_loss=0.3364, over 3360833.15 frames. ], batch size: 51, lr: 2.51e-03, grad_scale: 16.0 2024-09-25 23:19:19,346 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=4.27 vs. limit=15.0 2024-09-25 23:19:24,959 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=8.09 vs. limit=15.0 2024-09-25 23:19:29,022 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=853659.3333333334, ans=0.1 2024-09-25 23:19:35,387 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=853706.0, ans=0.0 2024-09-25 23:19:36,845 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=853706.0, ans=0.125 2024-09-25 23:19:44,529 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=853706.0, ans=0.2 2024-09-25 23:20:23,764 INFO [train.py:1198] (3/4) Epoch 47, batch 3750, loss[loss=0.1538, ctc_loss=0.09509, cr_loss=0.2933, over 16770.00 frames. ], tot_loss[loss=0.187, ctc_loss=0.1197, cr_loss=0.3366, over 3344474.61 frames. ], batch size: 37, lr: 2.51e-03, grad_scale: 16.0 2024-09-25 23:20:49,242 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=853892.6666666666, ans=0.125 2024-09-25 23:20:53,580 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.156e+02 1.332e+02 1.393e+02 1.534e+02 1.821e+02, threshold=2.786e+02, percent-clipped=0.0 2024-09-25 23:21:14,698 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-09-25 23:21:16,943 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=4.31 vs. limit=12.0 2024-09-25 23:21:23,032 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=10.34 vs. 
limit=15.0 2024-09-25 23:21:41,858 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=854079.3333333334, ans=0.2 2024-09-25 23:21:43,157 INFO [train.py:1198] (3/4) Epoch 47, batch 3800, loss[loss=0.1924, ctc_loss=0.1233, cr_loss=0.3453, over 14991.00 frames. ], tot_loss[loss=0.1859, ctc_loss=0.1189, cr_loss=0.3354, over 3348896.26 frames. ], batch size: 89, lr: 2.51e-03, grad_scale: 16.0 2024-09-25 23:21:48,139 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=854079.3333333334, ans=0.125 2024-09-25 23:21:56,890 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.whiten.whitening_limit, batch_count=854079.3333333334, ans=12.0 2024-09-25 23:22:00,884 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=854126.0, ans=0.125 2024-09-25 23:22:07,344 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=854126.0, ans=0.0 2024-09-25 23:22:10,440 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=854126.0, ans=0.0 2024-09-25 23:22:10,446 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=854126.0, ans=0.125 2024-09-25 23:22:21,550 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=854172.6666666666, ans=0.0 2024-09-25 23:23:03,555 INFO [train.py:1198] (3/4) Epoch 47, batch 3850, loss[loss=0.2075, ctc_loss=0.1387, cr_loss=0.3437, over 11859.00 frames. ], tot_loss[loss=0.1854, ctc_loss=0.1186, cr_loss=0.3338, over 3314493.56 frames. ], batch size: 123, lr: 2.51e-03, grad_scale: 16.0 2024-09-25 23:23:10,130 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.const_attention_rate, batch_count=854312.6666666666, ans=0.025 2024-09-25 23:23:10,247 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=854312.6666666666, ans=0.0 2024-09-25 23:23:32,701 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.186e+02 1.338e+02 1.449e+02 1.615e+02 3.869e+02, threshold=2.899e+02, percent-clipped=2.0 2024-09-25 23:23:42,149 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=854406.0, ans=0.125 2024-09-25 23:23:58,744 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=854452.6666666666, ans=0.2 2024-09-25 23:24:58,532 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=854527.3333333334, ans=0.125 2024-09-25 23:25:01,184 INFO [train.py:1198] (3/4) Epoch 48, batch 0, loss[loss=0.1874, ctc_loss=0.1181, cr_loss=0.3465, over 17211.00 frames. ], tot_loss[loss=0.1874, ctc_loss=0.1181, cr_loss=0.3465, over 17211.00 frames. ], batch size: 47, lr: 2.48e-03, grad_scale: 32.0 2024-09-25 23:25:01,185 INFO [train.py:1221] (3/4) Computing validation loss 2024-09-25 23:25:16,464 INFO [train.py:1230] (3/4) Epoch 48, validation: loss=0.0347, ctc_loss=0.0347, cr_loss=1.045e-14, over 944034.00 frames. 
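Note on the loss fields in these records: each tot_loss decomposes as ctc_loss plus a scaled consistency-regularization (cr) term, and the scale can be read off the logged numbers themselves as 0.2 (e.g. 0.1193 + 0.2 * 0.3372 ≈ 0.1867 at epoch 47, batch 3200, and 0.1181 + 0.2 * 0.3465 ≈ 0.1874 at epoch 48, batch 0). The validation records show cr_loss near 1e-14, presumably because the two forward passes coincide when no augmentation is applied. The grad_scale field is the AMP loss scale; it moves between powers of two (8.0, 16.0, 32.0 in this stretch), consistent with the usual halve-on-overflow, periodically-double behavior. A minimal sketch in plain Python checks the inferred 0.2 scale against values copied from the records; the constant and helper name are reconstructions for illustration, not taken from the training code:

    import math

    # Inferred from the records above: tot_loss = ctc_loss + CR_SCALE * cr_loss,
    # with CR_SCALE = 0.2 read off the numbers (a reconstruction, not a quote
    # from the training code).
    CR_SCALE = 0.2

    def combined_loss(ctc: float, cr: float) -> float:
        return ctc + CR_SCALE * cr

    # (tot_loss, ctc_loss, cr_loss) triples copied from records in this log.
    logged = [
        (0.1867, 0.1193, 0.3372),     # epoch 47, batch 3200, tot_loss
        (0.1863, 0.1190, 0.3364),     # epoch 47, batch 3250, tot_loss
        (0.1874, 0.1181, 0.3465),     # epoch 48, batch 0
        (0.0347, 0.0347, 1.045e-14),  # epoch 48, validation
    ]
    for tot, ctc, cr in logged:
        assert math.isclose(tot, combined_loss(ctc, cr), abs_tol=5e-4)

Every (tot_loss, ctc_loss, cr_loss) triple in this excerpt appears to satisfy the same relation to within the rounding of the four printed significant digits.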
2024-09-25 23:25:16,465 INFO [train.py:1231] (3/4) Maximum memory allocated so far is 21238MB 2024-09-25 23:25:41,891 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.83 vs. limit=15.0 2024-09-25 23:25:52,992 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=4.30 vs. limit=15.0 2024-09-25 23:26:36,581 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.04 vs. limit=6.0 2024-09-25 23:26:38,952 INFO [train.py:1198] (3/4) Epoch 48, batch 50, loss[loss=0.1825, ctc_loss=0.1159, cr_loss=0.3332, over 17349.00 frames. ], tot_loss[loss=0.1868, ctc_loss=0.1191, cr_loss=0.3386, over 764192.08 frames. ], batch size: 48, lr: 2.48e-03, grad_scale: 32.0 2024-09-25 23:26:53,766 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=854807.3333333334, ans=0.0 2024-09-25 23:27:16,938 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.118e+02 1.326e+02 1.422e+02 1.657e+02 2.406e+02, threshold=2.844e+02, percent-clipped=0.0 2024-09-25 23:27:31,700 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-25 23:27:43,340 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=854947.3333333334, ans=0.0 2024-09-25 23:27:46,806 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=854947.3333333334, ans=0.125 2024-09-25 23:27:51,692 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=854947.3333333334, ans=0.0 2024-09-25 23:27:59,314 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=854947.3333333334, ans=10.0 2024-09-25 23:28:02,164 INFO [train.py:1198] (3/4) Epoch 48, batch 100, loss[loss=0.1562, ctc_loss=0.09865, cr_loss=0.2876, over 17169.00 frames. ], tot_loss[loss=0.1849, ctc_loss=0.1178, cr_loss=0.3354, over 1331192.54 frames. ], batch size: 41, lr: 2.48e-03, grad_scale: 16.0 2024-09-25 23:28:08,705 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.const_attention_rate, batch_count=854994.0, ans=0.025 2024-09-25 23:28:14,553 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.81 vs. 
limit=6.0 2024-09-25 23:28:21,688 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=855040.6666666666, ans=0.1 2024-09-25 23:28:34,586 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=855087.3333333334, ans=0.125 2024-09-25 23:29:04,718 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=855134.0, ans=0.125 2024-09-25 23:29:12,901 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=855180.6666666666, ans=0.2 2024-09-25 23:29:13,231 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.71 vs. limit=6.0 2024-09-25 23:29:17,587 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=855180.6666666666, ans=0.0 2024-09-25 23:29:25,172 INFO [train.py:1198] (3/4) Epoch 48, batch 150, loss[loss=0.1664, ctc_loss=0.1039, cr_loss=0.3124, over 17268.00 frames. ], tot_loss[loss=0.1854, ctc_loss=0.1183, cr_loss=0.3354, over 1781857.60 frames. ], batch size: 42, lr: 2.48e-03, grad_scale: 16.0 2024-09-25 23:30:05,999 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.151e+02 1.318e+02 1.406e+02 1.496e+02 3.141e+02, threshold=2.812e+02, percent-clipped=1.0 2024-09-25 23:30:25,812 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=855367.3333333334, ans=0.1 2024-09-25 23:30:47,888 INFO [train.py:1198] (3/4) Epoch 48, batch 200, loss[loss=0.1986, ctc_loss=0.1274, cr_loss=0.3559, over 16528.00 frames. ], tot_loss[loss=0.1864, ctc_loss=0.1191, cr_loss=0.3366, over 2127221.12 frames. ], batch size: 66, lr: 2.48e-03, grad_scale: 16.0 2024-09-25 23:31:08,536 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=855507.3333333334, ans=0.125 2024-09-25 23:31:08,551 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=855507.3333333334, ans=0.125 2024-09-25 23:31:21,164 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=855554.0, ans=0.1 2024-09-25 23:31:21,208 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=855554.0, ans=0.0 2024-09-25 23:31:53,804 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=12.75 vs. limit=22.5 2024-09-25 23:31:58,115 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=855647.3333333334, ans=0.1 2024-09-25 23:31:59,769 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=855647.3333333334, ans=0.2 2024-09-25 23:32:09,445 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=855694.0, ans=0.0 2024-09-25 23:32:10,689 INFO [train.py:1198] (3/4) Epoch 48, batch 250, loss[loss=0.2321, ctc_loss=0.153, cr_loss=0.3955, over 16547.00 frames. 
], tot_loss[loss=0.1853, ctc_loss=0.1183, cr_loss=0.335, over 2406400.98 frames. ], batch size: 66, lr: 2.48e-03, grad_scale: 16.0 2024-09-25 23:32:20,784 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=855694.0, ans=0.125 2024-09-25 23:32:36,581 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-09-25 23:32:49,018 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.111e+02 1.307e+02 1.374e+02 1.472e+02 2.103e+02, threshold=2.747e+02, percent-clipped=0.0 2024-09-25 23:32:55,578 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.const_attention_rate, batch_count=855787.3333333334, ans=0.025 2024-09-25 23:33:18,353 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=855880.6666666666, ans=0.1 2024-09-25 23:33:33,797 INFO [train.py:1198] (3/4) Epoch 48, batch 300, loss[loss=0.1923, ctc_loss=0.1224, cr_loss=0.3496, over 17213.00 frames. ], tot_loss[loss=0.1858, ctc_loss=0.1186, cr_loss=0.3361, over 2621730.97 frames. ], batch size: 47, lr: 2.48e-03, grad_scale: 16.0 2024-09-25 23:33:35,597 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=855927.3333333334, ans=0.0 2024-09-25 23:34:09,153 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=8.37 vs. limit=15.0 2024-09-25 23:34:21,476 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=856020.6666666666, ans=0.0 2024-09-25 23:34:34,268 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=856067.3333333334, ans=0.125 2024-09-25 23:34:34,428 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=856067.3333333334, ans=0.0 2024-09-25 23:34:50,072 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=856114.0, ans=0.125 2024-09-25 23:34:58,492 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=4.10 vs. limit=15.0 2024-09-25 23:34:59,420 INFO [train.py:1198] (3/4) Epoch 48, batch 350, loss[loss=0.2051, ctc_loss=0.1365, cr_loss=0.343, over 17024.00 frames. ], tot_loss[loss=0.1848, ctc_loss=0.1179, cr_loss=0.3346, over 2791894.23 frames. 
], batch size: 56, lr: 2.48e-03, grad_scale: 16.0 2024-09-25 23:35:06,131 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=856160.6666666666, ans=0.125 2024-09-25 23:35:10,886 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=856160.6666666666, ans=0.0 2024-09-25 23:35:18,749 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=856207.3333333334, ans=0.125 2024-09-25 23:35:21,967 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=856207.3333333334, ans=0.0 2024-09-25 23:35:37,610 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.143e+02 1.298e+02 1.363e+02 1.438e+02 2.006e+02, threshold=2.725e+02, percent-clipped=0.0 2024-09-25 23:35:37,939 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=856254.0, ans=0.125 2024-09-25 23:36:12,304 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=8.76 vs. limit=15.0 2024-09-25 23:36:22,267 INFO [train.py:1198] (3/4) Epoch 48, batch 400, loss[loss=0.173, ctc_loss=0.1099, cr_loss=0.3159, over 17013.00 frames. ], tot_loss[loss=0.185, ctc_loss=0.1181, cr_loss=0.3345, over 2927841.26 frames. ], batch size: 51, lr: 2.48e-03, grad_scale: 32.0 2024-09-25 23:36:30,807 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=856394.0, ans=0.0 2024-09-25 23:36:39,305 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=3.88 vs. limit=15.0 2024-09-25 23:36:54,658 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=856487.3333333334, ans=0.125 2024-09-25 23:36:58,130 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=856487.3333333334, ans=0.0 2024-09-25 23:37:13,791 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=856534.0, ans=0.125 2024-09-25 23:37:36,176 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.const_attention_rate, batch_count=856580.6666666666, ans=0.025 2024-09-25 23:37:42,216 INFO [train.py:1198] (3/4) Epoch 48, batch 450, loss[loss=0.1958, ctc_loss=0.1227, cr_loss=0.3657, over 17304.00 frames. ], tot_loss[loss=0.185, ctc_loss=0.118, cr_loss=0.3348, over 3027646.37 frames. ], batch size: 46, lr: 2.48e-03, grad_scale: 32.0 2024-09-25 23:37:51,443 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.76 vs. limit=6.0 2024-09-25 23:38:00,102 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=3.83 vs. 
limit=10.0 2024-09-25 23:38:14,140 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=856674.0, ans=0.0 2024-09-25 23:38:19,054 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=856720.6666666666, ans=0.09899494936611666 2024-09-25 23:38:23,496 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.150e+02 1.314e+02 1.437e+02 1.529e+02 1.990e+02, threshold=2.873e+02, percent-clipped=0.0 2024-09-25 23:39:07,893 INFO [train.py:1198] (3/4) Epoch 48, batch 500, loss[loss=0.2116, ctc_loss=0.1361, cr_loss=0.3777, over 17208.00 frames. ], tot_loss[loss=0.1858, ctc_loss=0.1186, cr_loss=0.3358, over 3100330.59 frames. ], batch size: 47, lr: 2.48e-03, grad_scale: 32.0 2024-09-25 23:39:15,319 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=5.16 vs. limit=15.0 2024-09-25 23:39:21,111 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=856860.6666666666, ans=0.125 2024-09-25 23:39:31,897 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=856907.3333333334, ans=0.125 2024-09-25 23:40:00,577 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=857000.6666666666, ans=0.0 2024-09-25 23:40:19,802 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=857047.3333333334, ans=0.125 2024-09-25 23:40:30,796 INFO [train.py:1198] (3/4) Epoch 48, batch 550, loss[loss=0.1688, ctc_loss=0.1071, cr_loss=0.3084, over 17297.00 frames. ], tot_loss[loss=0.1857, ctc_loss=0.1187, cr_loss=0.3349, over 3154970.35 frames. ], batch size: 46, lr: 2.48e-03, grad_scale: 32.0 2024-09-25 23:41:11,883 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.169e+02 1.319e+02 1.428e+02 1.533e+02 1.901e+02, threshold=2.856e+02, percent-clipped=0.0 2024-09-25 23:41:17,167 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=857187.3333333334, ans=0.125 2024-09-25 23:41:30,095 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=857234.0, ans=0.1 2024-09-25 23:41:52,573 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=857327.3333333334, ans=0.025 2024-09-25 23:41:53,738 INFO [train.py:1198] (3/4) Epoch 48, batch 600, loss[loss=0.1979, ctc_loss=0.1286, cr_loss=0.3464, over 17219.00 frames. ], tot_loss[loss=0.1861, ctc_loss=0.119, cr_loss=0.3357, over 3204115.00 frames. 
], batch size: 47, lr: 2.48e-03, grad_scale: 32.0 2024-09-25 23:42:03,786 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=857327.3333333334, ans=0.125 2024-09-25 23:42:17,916 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=857374.0, ans=0.125 2024-09-25 23:42:22,821 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=857374.0, ans=0.0 2024-09-25 23:42:28,040 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=9.46 vs. limit=15.0 2024-09-25 23:42:45,273 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=857467.3333333334, ans=0.125 2024-09-25 23:42:50,215 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=857467.3333333334, ans=0.125 2024-09-25 23:43:16,492 INFO [train.py:1198] (3/4) Epoch 48, batch 650, loss[loss=0.1601, ctc_loss=0.09873, cr_loss=0.3066, over 17103.00 frames. ], tot_loss[loss=0.1861, ctc_loss=0.1189, cr_loss=0.336, over 3239542.46 frames. ], batch size: 40, lr: 2.48e-03, grad_scale: 16.0 2024-09-25 23:43:39,180 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=857607.3333333334, ans=0.2 2024-09-25 23:43:44,333 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten.whitening_limit, batch_count=857607.3333333334, ans=15.0 2024-09-25 23:43:47,036 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=857654.0, ans=0.0 2024-09-25 23:43:47,575 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=8.36 vs. limit=15.0 2024-09-25 23:43:48,708 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=857654.0, ans=0.0 2024-09-25 23:43:59,399 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.154e+02 1.322e+02 1.414e+02 1.494e+02 2.236e+02, threshold=2.829e+02, percent-clipped=0.0 2024-09-25 23:44:02,741 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=857654.0, ans=0.125 2024-09-25 23:44:25,238 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=3.91 vs. limit=15.0 2024-09-25 23:44:40,139 INFO [train.py:1198] (3/4) Epoch 48, batch 700, loss[loss=0.2092, ctc_loss=0.1334, cr_loss=0.3791, over 17054.00 frames. ], tot_loss[loss=0.1863, ctc_loss=0.1191, cr_loss=0.3362, over 3268095.26 frames. 
], batch size: 46, lr: 2.48e-03, grad_scale: 16.0 2024-09-25 23:44:57,460 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=857840.6666666666, ans=0.125 2024-09-25 23:45:21,402 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=857887.3333333334, ans=0.0 2024-09-25 23:45:40,414 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=857934.0, ans=0.0 2024-09-25 23:45:58,064 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=857980.6666666666, ans=0.125 2024-09-25 23:45:58,094 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=857980.6666666666, ans=0.0 2024-09-25 23:45:59,682 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=857980.6666666666, ans=0.09899494936611666 2024-09-25 23:46:05,009 INFO [train.py:1198] (3/4) Epoch 48, batch 750, loss[loss=0.1774, ctc_loss=0.1105, cr_loss=0.3348, over 17286.00 frames. ], tot_loss[loss=0.1862, ctc_loss=0.119, cr_loss=0.336, over 3290532.07 frames. ], batch size: 42, lr: 2.48e-03, grad_scale: 16.0 2024-09-25 23:46:20,008 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=858074.0, ans=0.025 2024-09-25 23:46:31,000 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=858074.0, ans=0.125 2024-09-25 23:46:44,963 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.152e+02 1.340e+02 1.448e+02 1.538e+02 3.559e+02, threshold=2.895e+02, percent-clipped=1.0 2024-09-25 23:47:07,876 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=858214.0, ans=0.0 2024-09-25 23:47:23,921 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=858260.6666666666, ans=0.2 2024-09-25 23:47:25,137 INFO [train.py:1198] (3/4) Epoch 48, batch 800, loss[loss=0.2272, ctc_loss=0.1493, cr_loss=0.3896, over 15058.00 frames. ], tot_loss[loss=0.1853, ctc_loss=0.1183, cr_loss=0.335, over 3311001.47 frames. ], batch size: 89, lr: 2.48e-03, grad_scale: 32.0 2024-09-25 23:47:44,767 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=858307.3333333334, ans=0.125 2024-09-25 23:47:48,161 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=858307.3333333334, ans=0.0 2024-09-25 23:48:09,936 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=858354.0, ans=0.125 2024-09-25 23:48:35,283 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=858447.3333333334, ans=0.0 2024-09-25 23:48:47,867 INFO [train.py:1198] (3/4) Epoch 48, batch 850, loss[loss=0.1802, ctc_loss=0.1151, cr_loss=0.3257, over 17357.00 frames. ], tot_loss[loss=0.1855, ctc_loss=0.1185, cr_loss=0.3353, over 3319668.61 frames. 
], batch size: 48, lr: 2.48e-03, grad_scale: 32.0 2024-09-25 23:49:03,858 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=858494.0, ans=0.0 2024-09-25 23:49:03,898 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=858494.0, ans=0.125 2024-09-25 23:49:12,013 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=858540.6666666666, ans=0.125 2024-09-25 23:49:30,828 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.140e+02 1.305e+02 1.378e+02 1.479e+02 2.667e+02, threshold=2.756e+02, percent-clipped=0.0 2024-09-25 23:49:47,926 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=858634.0, ans=0.125 2024-09-25 23:50:15,701 INFO [train.py:1198] (3/4) Epoch 48, batch 900, loss[loss=0.2236, ctc_loss=0.1472, cr_loss=0.3819, over 15122.00 frames. ], tot_loss[loss=0.1851, ctc_loss=0.1181, cr_loss=0.3347, over 3331583.73 frames. ], batch size: 89, lr: 2.47e-03, grad_scale: 32.0 2024-09-25 23:50:23,900 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.const_attention_rate, batch_count=858727.3333333334, ans=0.025 2024-09-25 23:50:41,239 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-25 23:50:47,459 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=858820.6666666666, ans=0.125 2024-09-25 23:51:07,662 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=858867.3333333334, ans=0.125 2024-09-25 23:51:11,692 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=9.64 vs. limit=15.0 2024-09-25 23:51:37,631 INFO [train.py:1198] (3/4) Epoch 48, batch 950, loss[loss=0.182, ctc_loss=0.1171, cr_loss=0.3244, over 17018.00 frames. ], tot_loss[loss=0.1849, ctc_loss=0.1181, cr_loss=0.3338, over 3332710.42 frames. ], batch size: 51, lr: 2.47e-03, grad_scale: 16.0 2024-09-25 23:52:19,389 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.144e+02 1.296e+02 1.387e+02 1.508e+02 2.071e+02, threshold=2.775e+02, percent-clipped=0.0 2024-09-25 23:52:19,664 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=859054.0, ans=0.125 2024-09-25 23:52:41,802 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=859147.3333333334, ans=0.035 2024-09-25 23:53:00,137 INFO [train.py:1198] (3/4) Epoch 48, batch 1000, loss[loss=0.1766, ctc_loss=0.1147, cr_loss=0.3097, over 16038.00 frames. ], tot_loss[loss=0.1867, ctc_loss=0.1194, cr_loss=0.3361, over 3326851.01 frames. 
], batch size: 74, lr: 2.47e-03, grad_scale: 16.0 2024-09-25 23:53:49,875 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=859334.0, ans=0.125 2024-09-25 23:54:03,866 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=859334.0, ans=0.125 2024-09-25 23:54:04,321 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=13.79 vs. limit=22.5 2024-09-25 23:54:06,785 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=859380.6666666666, ans=0.0 2024-09-25 23:54:14,941 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=859380.6666666666, ans=0.1 2024-09-25 23:54:22,551 INFO [train.py:1198] (3/4) Epoch 48, batch 1050, loss[loss=0.1495, ctc_loss=0.09305, cr_loss=0.2823, over 16952.00 frames. ], tot_loss[loss=0.1875, ctc_loss=0.12, cr_loss=0.3374, over 3327515.89 frames. ], batch size: 42, lr: 2.47e-03, grad_scale: 16.0 2024-09-25 23:54:54,082 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=859474.0, ans=0.2 2024-09-25 23:55:00,686 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=859520.6666666666, ans=0.2 2024-09-25 23:55:03,875 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=859520.6666666666, ans=0.125 2024-09-25 23:55:05,410 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=859520.6666666666, ans=0.125 2024-09-25 23:55:06,816 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.127e+02 1.301e+02 1.364e+02 1.474e+02 5.159e+02, threshold=2.728e+02, percent-clipped=1.0 2024-09-25 23:55:29,603 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=859614.0, ans=0.1 2024-09-25 23:55:29,909 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=12.94 vs. limit=22.5 2024-09-25 23:55:45,304 INFO [train.py:1198] (3/4) Epoch 48, batch 1100, loss[loss=0.2113, ctc_loss=0.1373, cr_loss=0.3703, over 15125.00 frames. ], tot_loss[loss=0.1879, ctc_loss=0.1203, cr_loss=0.3384, over 3336051.80 frames. ], batch size: 88, lr: 2.47e-03, grad_scale: 16.0 2024-09-25 23:55:52,329 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=4.82 vs. limit=15.0 2024-09-25 23:56:16,137 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=14.69 vs. limit=22.5 2024-09-25 23:57:04,847 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=859847.3333333334, ans=0.125 2024-09-25 23:57:07,820 INFO [train.py:1198] (3/4) Epoch 48, batch 1150, loss[loss=0.2086, ctc_loss=0.1351, cr_loss=0.3677, over 17022.00 frames. ], tot_loss[loss=0.1885, ctc_loss=0.1207, cr_loss=0.339, over 3339336.03 frames. 
], batch size: 51, lr: 2.47e-03, grad_scale: 16.0 2024-09-25 23:57:14,274 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=859894.0, ans=0.125 2024-09-25 23:57:16,708 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.25 vs. limit=10.0 2024-09-25 23:57:32,020 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=859940.6666666666, ans=0.125 2024-09-25 23:57:35,675 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=4.24 vs. limit=15.0 2024-09-25 23:57:43,348 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=859987.3333333334, ans=0.2 2024-09-25 23:57:44,739 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=859987.3333333334, ans=0.0 2024-09-25 23:57:52,002 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.124e+02 1.298e+02 1.394e+02 1.477e+02 1.999e+02, threshold=2.788e+02, percent-clipped=0.0 2024-09-25 23:57:57,264 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=11.65 vs. limit=12.0 2024-09-25 23:58:21,637 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=3.87 vs. limit=12.0 2024-09-25 23:58:25,878 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=860080.6666666666, ans=0.125 2024-09-25 23:58:30,258 INFO [train.py:1198] (3/4) Epoch 48, batch 1200, loss[loss=0.1913, ctc_loss=0.1229, cr_loss=0.3421, over 17289.00 frames. ], tot_loss[loss=0.1889, ctc_loss=0.1209, cr_loss=0.3397, over 3340929.95 frames. ], batch size: 49, lr: 2.47e-03, grad_scale: 32.0 2024-09-25 23:58:59,113 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.const_attention_rate, batch_count=860174.0, ans=0.025 2024-09-25 23:59:03,752 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=860220.6666666666, ans=0.0 2024-09-25 23:59:05,408 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=860220.6666666666, ans=0.0 2024-09-25 23:59:21,505 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=860267.3333333334, ans=0.125 2024-09-25 23:59:27,908 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=860267.3333333334, ans=0.125 2024-09-25 23:59:31,051 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=860267.3333333334, ans=0.125 2024-09-25 23:59:38,132 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=4.63 vs. limit=15.0 2024-09-25 23:59:47,147 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=8.24 vs. 
limit=15.0 2024-09-25 23:59:53,122 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=860314.0, ans=0.1 2024-09-25 23:59:53,804 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.36 vs. limit=22.5 2024-09-25 23:59:56,069 INFO [train.py:1198] (3/4) Epoch 48, batch 1250, loss[loss=0.1971, ctc_loss=0.1262, cr_loss=0.3542, over 17355.00 frames. ], tot_loss[loss=0.1868, ctc_loss=0.1195, cr_loss=0.3363, over 3345878.39 frames. ], batch size: 48, lr: 2.47e-03, grad_scale: 16.0 2024-09-26 00:00:07,430 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-26 00:00:08,876 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=860360.6666666666, ans=0.125 2024-09-26 00:00:28,307 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=860454.0, ans=0.1 2024-09-26 00:00:38,115 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=860454.0, ans=0.125 2024-09-26 00:00:39,299 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.147e+02 1.302e+02 1.385e+02 1.504e+02 2.588e+02, threshold=2.770e+02, percent-clipped=0.0 2024-09-26 00:00:51,920 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.64 vs. limit=6.0 2024-09-26 00:01:05,413 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=860547.3333333334, ans=0.0 2024-09-26 00:01:19,897 INFO [train.py:1198] (3/4) Epoch 48, batch 1300, loss[loss=0.1651, ctc_loss=0.1053, cr_loss=0.2992, over 17137.00 frames. ], tot_loss[loss=0.1861, ctc_loss=0.1189, cr_loss=0.3356, over 3353712.48 frames. ], batch size: 40, lr: 2.47e-03, grad_scale: 16.0 2024-09-26 00:01:20,314 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=860594.0, ans=0.0 2024-09-26 00:01:26,633 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=860594.0, ans=0.125 2024-09-26 00:01:43,047 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=860640.6666666666, ans=0.0 2024-09-26 00:01:57,812 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=860687.3333333334, ans=0.125 2024-09-26 00:01:59,387 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=860687.3333333334, ans=0.125 2024-09-26 00:02:09,123 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=860734.0, ans=0.125 2024-09-26 00:02:32,512 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=9.39 vs. limit=15.0 2024-09-26 00:02:41,004 INFO [train.py:1198] (3/4) Epoch 48, batch 1350, loss[loss=0.2005, ctc_loss=0.1293, cr_loss=0.3557, over 17023.00 frames. 
], tot_loss[loss=0.1872, ctc_loss=0.1198, cr_loss=0.3367, over 3359918.79 frames. ], batch size: 52, lr: 2.47e-03, grad_scale: 16.0 2024-09-26 00:02:44,713 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=860827.3333333334, ans=0.2 2024-09-26 00:02:55,259 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=860827.3333333334, ans=0.0 2024-09-26 00:02:59,858 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=860874.0, ans=0.0 2024-09-26 00:03:26,899 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.141e+02 1.284e+02 1.347e+02 1.442e+02 2.015e+02, threshold=2.695e+02, percent-clipped=0.0 2024-09-26 00:03:30,531 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=860967.3333333334, ans=0.0 2024-09-26 00:03:47,769 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=5.48 vs. limit=15.0 2024-09-26 00:04:07,155 INFO [train.py:1198] (3/4) Epoch 48, batch 1400, loss[loss=0.1742, ctc_loss=0.1078, cr_loss=0.3322, over 16948.00 frames. ], tot_loss[loss=0.1858, ctc_loss=0.1189, cr_loss=0.3344, over 3354227.19 frames. ], batch size: 42, lr: 2.47e-03, grad_scale: 16.0 2024-09-26 00:04:10,571 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=861060.6666666666, ans=0.2 2024-09-26 00:04:34,817 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=861107.3333333334, ans=0.0 2024-09-26 00:05:04,747 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=861200.6666666666, ans=0.125 2024-09-26 00:05:04,759 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=861200.6666666666, ans=0.1 2024-09-26 00:05:12,955 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=861247.3333333334, ans=0.0 2024-09-26 00:05:20,968 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=861247.3333333334, ans=0.2 2024-09-26 00:05:20,980 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=861247.3333333334, ans=0.0 2024-09-26 00:05:27,301 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=861247.3333333334, ans=0.125 2024-09-26 00:05:30,166 INFO [train.py:1198] (3/4) Epoch 48, batch 1450, loss[loss=0.1842, ctc_loss=0.1198, cr_loss=0.3217, over 16737.00 frames. ], tot_loss[loss=0.1855, ctc_loss=0.1186, cr_loss=0.3345, over 3365190.73 frames. 
], batch size: 61, lr: 2.47e-03, grad_scale: 16.0 2024-09-26 00:05:36,951 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=861294.0, ans=0.125 2024-09-26 00:05:43,493 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=861294.0, ans=0.0 2024-09-26 00:06:00,939 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=861387.3333333334, ans=0.125 2024-09-26 00:06:11,546 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=861387.3333333334, ans=0.125 2024-09-26 00:06:15,971 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.169e+02 1.346e+02 1.409e+02 1.541e+02 2.437e+02, threshold=2.817e+02, percent-clipped=0.0 2024-09-26 00:06:22,770 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=861434.0, ans=0.0 2024-09-26 00:06:26,033 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=861434.0, ans=0.025 2024-09-26 00:06:52,971 INFO [train.py:1198] (3/4) Epoch 48, batch 1500, loss[loss=0.1493, ctc_loss=0.09297, cr_loss=0.2818, over 17103.00 frames. ], tot_loss[loss=0.1852, ctc_loss=0.1184, cr_loss=0.3341, over 3368212.39 frames. ], batch size: 40, lr: 2.47e-03, grad_scale: 16.0 2024-09-26 00:06:56,480 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=861527.3333333334, ans=0.0 2024-09-26 00:07:16,989 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=861574.0, ans=10.0 2024-09-26 00:07:17,067 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-09-26 00:08:15,404 INFO [train.py:1198] (3/4) Epoch 48, batch 1550, loss[loss=0.1886, ctc_loss=0.1211, cr_loss=0.3376, over 16950.00 frames. ], tot_loss[loss=0.1859, ctc_loss=0.1188, cr_loss=0.3357, over 3362770.83 frames. ], batch size: 58, lr: 2.47e-03, grad_scale: 16.0 2024-09-26 00:08:36,257 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.const_attention_rate, batch_count=861807.3333333334, ans=0.025 2024-09-26 00:08:52,217 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=861854.0, ans=0.04949747468305833 2024-09-26 00:08:58,347 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=5.34 vs. limit=15.0 2024-09-26 00:09:00,539 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.173e+02 1.301e+02 1.350e+02 1.480e+02 2.232e+02, threshold=2.701e+02, percent-clipped=0.0 2024-09-26 00:09:15,317 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=861900.6666666666, ans=0.0 2024-09-26 00:09:37,075 INFO [train.py:1198] (3/4) Epoch 48, batch 1600, loss[loss=0.1729, ctc_loss=0.108, cr_loss=0.3243, over 17168.00 frames. ], tot_loss[loss=0.1853, ctc_loss=0.1183, cr_loss=0.3347, over 3363604.39 frames. 
], batch size: 45, lr: 2.47e-03, grad_scale: 32.0 2024-09-26 00:10:08,003 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=6.23 vs. limit=15.0 2024-09-26 00:10:36,069 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=862134.0, ans=0.2 2024-09-26 00:10:39,320 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=862134.0, ans=0.0 2024-09-26 00:11:01,994 INFO [train.py:1198] (3/4) Epoch 48, batch 1650, loss[loss=0.1705, ctc_loss=0.1084, cr_loss=0.3107, over 17154.00 frames. ], tot_loss[loss=0.186, ctc_loss=0.1189, cr_loss=0.3355, over 3357114.17 frames. ], batch size: 45, lr: 2.47e-03, grad_scale: 32.0 2024-09-26 00:11:39,979 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=8.25 vs. limit=15.0 2024-09-26 00:11:47,277 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.142e+02 1.306e+02 1.368e+02 1.453e+02 1.769e+02, threshold=2.736e+02, percent-clipped=0.0 2024-09-26 00:12:05,615 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=862414.0, ans=0.1 2024-09-26 00:12:07,112 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=862414.0, ans=0.125 2024-09-26 00:12:19,763 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=862414.0, ans=0.125 2024-09-26 00:12:22,676 INFO [train.py:1198] (3/4) Epoch 48, batch 1700, loss[loss=0.1833, ctc_loss=0.1169, cr_loss=0.3321, over 17141.00 frames. ], tot_loss[loss=0.1863, ctc_loss=0.1191, cr_loss=0.336, over 3352876.49 frames. ], batch size: 48, lr: 2.47e-03, grad_scale: 16.0 2024-09-26 00:12:23,059 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=862460.6666666666, ans=0.125 2024-09-26 00:12:34,977 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.57 vs. limit=22.5 2024-09-26 00:12:39,474 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=4.29 vs. limit=15.0 2024-09-26 00:12:40,498 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=862507.3333333334, ans=0.125 2024-09-26 00:12:47,799 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=862507.3333333334, ans=0.0 2024-09-26 00:12:51,182 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-09-26 00:13:07,769 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=8.48 vs. 
limit=15.0 2024-09-26 00:13:08,836 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=862554.0, ans=0.0 2024-09-26 00:13:37,822 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=862647.3333333334, ans=0.125 2024-09-26 00:13:42,658 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-26 00:13:45,464 INFO [train.py:1198] (3/4) Epoch 48, batch 1750, loss[loss=0.1576, ctc_loss=0.09762, cr_loss=0.3001, over 17285.00 frames. ], tot_loss[loss=0.1869, ctc_loss=0.1195, cr_loss=0.3369, over 3356388.20 frames. ], batch size: 42, lr: 2.47e-03, grad_scale: 16.0 2024-09-26 00:13:56,785 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.97 vs. limit=6.0 2024-09-26 00:14:02,951 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten.whitening_limit, batch_count=862740.6666666666, ans=22.5 2024-09-26 00:14:09,271 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=862740.6666666666, ans=0.1 2024-09-26 00:14:32,513 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.170e+02 1.295e+02 1.382e+02 1.510e+02 2.646e+02, threshold=2.763e+02, percent-clipped=0.0 2024-09-26 00:14:34,454 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=862834.0, ans=0.125 2024-09-26 00:15:02,992 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=4.03 vs. limit=12.0 2024-09-26 00:15:07,040 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=862880.6666666666, ans=0.0 2024-09-26 00:15:07,110 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=862880.6666666666, ans=0.125 2024-09-26 00:15:09,853 INFO [train.py:1198] (3/4) Epoch 48, batch 1800, loss[loss=0.1924, ctc_loss=0.1246, cr_loss=0.3387, over 16843.00 frames. ], tot_loss[loss=0.1869, ctc_loss=0.1195, cr_loss=0.3373, over 3352801.92 frames. ], batch size: 58, lr: 2.47e-03, grad_scale: 16.0 2024-09-26 00:16:11,833 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=863067.3333333334, ans=0.2 2024-09-26 00:16:27,484 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=863114.0, ans=0.0 2024-09-26 00:16:31,981 INFO [train.py:1198] (3/4) Epoch 48, batch 1850, loss[loss=0.1851, ctc_loss=0.1183, cr_loss=0.334, over 17131.00 frames. ], tot_loss[loss=0.1881, ctc_loss=0.1204, cr_loss=0.3384, over 3356778.25 frames. 
], batch size: 48, lr: 2.47e-03, grad_scale: 16.0 2024-09-26 00:16:51,413 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=863207.3333333334, ans=0.125 2024-09-26 00:16:59,471 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=863207.3333333334, ans=0.1 2024-09-26 00:17:02,882 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=863254.0, ans=0.125 2024-09-26 00:17:16,787 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.173e+02 1.322e+02 1.421e+02 1.543e+02 2.119e+02, threshold=2.843e+02, percent-clipped=0.0 2024-09-26 00:17:33,580 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=863300.6666666666, ans=0.1 2024-09-26 00:17:54,933 INFO [train.py:1198] (3/4) Epoch 48, batch 1900, loss[loss=0.2366, ctc_loss=0.1636, cr_loss=0.3651, over 11942.00 frames. ], tot_loss[loss=0.1877, ctc_loss=0.1201, cr_loss=0.3376, over 3347866.24 frames. ], batch size: 123, lr: 2.47e-03, grad_scale: 16.0 2024-09-26 00:17:56,874 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=863394.0, ans=0.2 2024-09-26 00:18:20,919 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-26 00:18:25,891 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=863487.3333333334, ans=0.125 2024-09-26 00:18:33,775 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=863487.3333333334, ans=0.0 2024-09-26 00:18:55,396 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=863534.0, ans=0.0 2024-09-26 00:19:14,714 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.const_attention_rate, batch_count=863580.6666666666, ans=0.025 2024-09-26 00:19:17,622 INFO [train.py:1198] (3/4) Epoch 48, batch 1950, loss[loss=0.197, ctc_loss=0.1286, cr_loss=0.3416, over 15985.00 frames. ], tot_loss[loss=0.1872, ctc_loss=0.1198, cr_loss=0.3369, over 3346505.95 frames. 
], batch size: 74, lr: 2.47e-03, grad_scale: 16.0 2024-09-26 00:20:00,542 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.const_attention_rate, batch_count=863720.6666666666, ans=0.025 2024-09-26 00:20:05,198 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.164e+02 1.335e+02 1.400e+02 1.522e+02 2.815e+02, threshold=2.800e+02, percent-clipped=0.0 2024-09-26 00:20:05,655 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=863720.6666666666, ans=0.0 2024-09-26 00:20:13,468 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=863767.3333333334, ans=0.0 2024-09-26 00:20:18,409 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=863767.3333333334, ans=0.125 2024-09-26 00:20:21,591 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=863767.3333333334, ans=0.0 2024-09-26 00:20:26,338 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=863814.0, ans=0.125 2024-09-26 00:20:40,169 INFO [train.py:1198] (3/4) Epoch 48, batch 2000, loss[loss=0.1452, ctc_loss=0.08976, cr_loss=0.2772, over 17161.00 frames. ], tot_loss[loss=0.187, ctc_loss=0.1197, cr_loss=0.3364, over 3347906.24 frames. ], batch size: 41, lr: 2.47e-03, grad_scale: 32.0 2024-09-26 00:21:35,802 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=864000.6666666666, ans=0.0 2024-09-26 00:22:02,511 INFO [train.py:1198] (3/4) Epoch 48, batch 2050, loss[loss=0.1753, ctc_loss=0.11, cr_loss=0.3265, over 17169.00 frames. ], tot_loss[loss=0.1861, ctc_loss=0.1191, cr_loss=0.3351, over 3357211.91 frames. ], batch size: 41, lr: 2.47e-03, grad_scale: 32.0 2024-09-26 00:22:51,183 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.098e+02 1.300e+02 1.355e+02 1.459e+02 2.186e+02, threshold=2.709e+02, percent-clipped=0.0 2024-09-26 00:23:14,048 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=864280.6666666666, ans=0.1 2024-09-26 00:23:20,960 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=4.73 vs. limit=12.0 2024-09-26 00:23:25,057 INFO [train.py:1198] (3/4) Epoch 48, batch 2100, loss[loss=0.1864, ctc_loss=0.1197, cr_loss=0.3337, over 17092.00 frames. ], tot_loss[loss=0.1881, ctc_loss=0.1205, cr_loss=0.3378, over 3353218.52 frames. ], batch size: 49, lr: 2.47e-03, grad_scale: 16.0 2024-09-26 00:23:33,586 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=864327.3333333334, ans=0.0 2024-09-26 00:24:04,202 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=3.72 vs. 
limit=10.0 2024-09-26 00:24:14,933 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=864467.3333333334, ans=0.125 2024-09-26 00:24:36,612 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=864514.0, ans=0.0 2024-09-26 00:24:46,397 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=864514.0, ans=0.125 2024-09-26 00:24:50,877 INFO [train.py:1198] (3/4) Epoch 48, batch 2150, loss[loss=0.1543, ctc_loss=0.09735, cr_loss=0.2846, over 17009.00 frames. ], tot_loss[loss=0.1877, ctc_loss=0.1203, cr_loss=0.3369, over 3330041.18 frames. ], batch size: 39, lr: 2.47e-03, grad_scale: 16.0 2024-09-26 00:25:04,593 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=5.23 vs. limit=15.0 2024-09-26 00:25:10,428 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=864607.3333333334, ans=0.2 2024-09-26 00:25:10,448 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=864607.3333333334, ans=0.125 2024-09-26 00:25:23,302 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=864654.0, ans=0.125 2024-09-26 00:25:37,468 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.112e+02 1.295e+02 1.362e+02 1.467e+02 1.863e+02, threshold=2.724e+02, percent-clipped=0.0 2024-09-26 00:25:58,115 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=864747.3333333334, ans=0.2 2024-09-26 00:26:06,881 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=3.87 vs. limit=15.0 2024-09-26 00:26:13,771 INFO [train.py:1198] (3/4) Epoch 48, batch 2200, loss[loss=0.2107, ctc_loss=0.1359, cr_loss=0.3743, over 15797.00 frames. ], tot_loss[loss=0.1879, ctc_loss=0.1204, cr_loss=0.3375, over 3334053.05 frames. ], batch size: 74, lr: 2.47e-03, grad_scale: 16.0 2024-09-26 00:26:30,616 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=11.99 vs. limit=15.0 2024-09-26 00:26:42,916 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=864840.6666666666, ans=0.125 2024-09-26 00:26:49,196 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=864887.3333333334, ans=0.2 2024-09-26 00:26:52,405 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=864887.3333333334, ans=0.125 2024-09-26 00:27:08,516 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=864934.0, ans=0.125 2024-09-26 00:27:36,736 INFO [train.py:1198] (3/4) Epoch 48, batch 2250, loss[loss=0.1854, ctc_loss=0.1176, cr_loss=0.3395, over 17058.00 frames. ], tot_loss[loss=0.1875, ctc_loss=0.1201, cr_loss=0.337, over 3337539.38 frames. 
], batch size: 46, lr: 2.47e-03, grad_scale: 16.0 2024-09-26 00:27:48,254 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=865027.3333333334, ans=0.125 2024-09-26 00:28:01,104 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=865074.0, ans=0.0 2024-09-26 00:28:04,402 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=865074.0, ans=0.1 2024-09-26 00:28:07,614 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=865120.6666666666, ans=0.125 2024-09-26 00:28:22,314 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=865120.6666666666, ans=0.125 2024-09-26 00:28:23,492 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.181e+02 1.300e+02 1.376e+02 1.452e+02 2.028e+02, threshold=2.752e+02, percent-clipped=0.0 2024-09-26 00:28:28,566 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=865167.3333333334, ans=0.125 2024-09-26 00:29:00,163 INFO [train.py:1198] (3/4) Epoch 48, batch 2300, loss[loss=0.1558, ctc_loss=0.09679, cr_loss=0.2948, over 16942.00 frames. ], tot_loss[loss=0.1866, ctc_loss=0.1194, cr_loss=0.3357, over 3352570.65 frames. ], batch size: 42, lr: 2.47e-03, grad_scale: 16.0 2024-09-26 00:29:07,009 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=14.00 vs. limit=15.0 2024-09-26 00:29:13,154 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=865260.6666666666, ans=0.125 2024-09-26 00:29:51,204 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=865400.6666666666, ans=0.125 2024-09-26 00:30:19,097 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=7.70 vs. limit=15.0 2024-09-26 00:30:22,866 INFO [train.py:1198] (3/4) Epoch 48, batch 2350, loss[loss=0.2049, ctc_loss=0.1328, cr_loss=0.3608, over 16031.00 frames. ], tot_loss[loss=0.1853, ctc_loss=0.1184, cr_loss=0.3343, over 3358213.20 frames. ], batch size: 74, lr: 2.47e-03, grad_scale: 16.0 2024-09-26 00:30:26,126 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=865494.0, ans=0.125 2024-09-26 00:30:29,502 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=865494.0, ans=0.125 2024-09-26 00:30:34,193 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=865494.0, ans=0.125 2024-09-26 00:30:51,509 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=11.74 vs. 
limit=15.0 2024-09-26 00:31:04,052 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=865587.3333333334, ans=0.125 2024-09-26 00:31:11,656 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.174e+02 1.301e+02 1.386e+02 1.481e+02 2.528e+02, threshold=2.772e+02, percent-clipped=0.0 2024-09-26 00:31:26,211 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=865634.0, ans=0.09899494936611666 2024-09-26 00:31:34,018 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=865680.6666666666, ans=0.0 2024-09-26 00:31:44,708 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=10.93 vs. limit=10.0 2024-09-26 00:31:44,987 INFO [train.py:1198] (3/4) Epoch 48, batch 2400, loss[loss=0.1874, ctc_loss=0.1195, cr_loss=0.3398, over 17045.00 frames. ], tot_loss[loss=0.1849, ctc_loss=0.118, cr_loss=0.3343, over 3368014.41 frames. ], batch size: 39, lr: 2.46e-03, grad_scale: 32.0 2024-09-26 00:32:48,711 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=865867.3333333334, ans=0.125 2024-09-26 00:32:48,718 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.min_abs, batch_count=865867.3333333334, ans=0.5 2024-09-26 00:33:07,678 INFO [train.py:1198] (3/4) Epoch 48, batch 2450, loss[loss=0.1974, ctc_loss=0.1268, cr_loss=0.3531, over 17302.00 frames. ], tot_loss[loss=0.1857, ctc_loss=0.1186, cr_loss=0.3351, over 3354991.61 frames. ], batch size: 46, lr: 2.46e-03, grad_scale: 32.0 2024-09-26 00:33:37,953 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=866007.3333333334, ans=0.125 2024-09-26 00:33:58,228 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.118e+02 1.328e+02 1.396e+02 1.507e+02 2.681e+02, threshold=2.792e+02, percent-clipped=0.0 2024-09-26 00:34:16,736 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=7.74 vs. limit=15.0 2024-09-26 00:34:32,989 INFO [train.py:1198] (3/4) Epoch 48, batch 2500, loss[loss=0.1697, ctc_loss=0.1091, cr_loss=0.3029, over 17030.00 frames. ], tot_loss[loss=0.1862, ctc_loss=0.119, cr_loss=0.336, over 3359952.19 frames. ], batch size: 44, lr: 2.46e-03, grad_scale: 16.0 2024-09-26 00:34:33,245 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=866194.0, ans=0.125 2024-09-26 00:34:38,167 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=866194.0, ans=0.0 2024-09-26 00:34:39,709 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=866194.0, ans=0.125 2024-09-26 00:35:06,827 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=866287.3333333334, ans=0.2 2024-09-26 00:35:56,125 INFO [train.py:1198] (3/4) Epoch 48, batch 2550, loss[loss=0.2071, ctc_loss=0.1316, cr_loss=0.3776, over 17215.00 frames. ], tot_loss[loss=0.1854, ctc_loss=0.1184, cr_loss=0.3348, over 3363561.77 frames. 
], batch size: 55, lr: 2.46e-03, grad_scale: 16.0 2024-09-26 00:36:31,446 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=866520.6666666666, ans=0.125 2024-09-26 00:36:33,000 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=866520.6666666666, ans=0.2 2024-09-26 00:36:34,549 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=866520.6666666666, ans=0.125 2024-09-26 00:36:43,905 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.101e+02 1.316e+02 1.412e+02 1.521e+02 2.388e+02, threshold=2.824e+02, percent-clipped=0.0 2024-09-26 00:36:44,371 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=866567.3333333334, ans=0.0 2024-09-26 00:36:58,943 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.const_attention_rate, batch_count=866614.0, ans=0.025 2024-09-26 00:37:16,414 INFO [train.py:1198] (3/4) Epoch 48, batch 2600, loss[loss=0.1823, ctc_loss=0.116, cr_loss=0.3315, over 17304.00 frames. ], tot_loss[loss=0.1848, ctc_loss=0.1181, cr_loss=0.3337, over 3365731.36 frames. ], batch size: 46, lr: 2.46e-03, grad_scale: 16.0 2024-09-26 00:37:16,678 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=866660.6666666666, ans=0.125 2024-09-26 00:37:19,872 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=866660.6666666666, ans=0.125 2024-09-26 00:37:38,660 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-09-26 00:37:56,237 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=866754.0, ans=0.0 2024-09-26 00:38:21,670 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=866847.3333333334, ans=0.125 2024-09-26 00:38:31,502 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=866847.3333333334, ans=0.125 2024-09-26 00:38:41,786 INFO [train.py:1198] (3/4) Epoch 48, batch 2650, loss[loss=0.1659, ctc_loss=0.1022, cr_loss=0.3188, over 17109.00 frames. ], tot_loss[loss=0.1857, ctc_loss=0.1186, cr_loss=0.3352, over 3356738.74 frames. 
], batch size: 40, lr: 2.46e-03, grad_scale: 16.0 2024-09-26 00:38:57,906 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=866940.6666666666, ans=0.125 2024-09-26 00:39:06,044 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=866940.6666666666, ans=0.0 2024-09-26 00:39:32,301 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.147e+02 1.306e+02 1.381e+02 1.471e+02 2.187e+02, threshold=2.762e+02, percent-clipped=0.0 2024-09-26 00:39:35,745 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=867034.0, ans=0.1 2024-09-26 00:39:39,135 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-26 00:39:53,571 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=867080.6666666666, ans=0.125 2024-09-26 00:40:00,137 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=867080.6666666666, ans=0.0 2024-09-26 00:40:04,504 INFO [train.py:1198] (3/4) Epoch 48, batch 2700, loss[loss=0.1889, ctc_loss=0.1205, cr_loss=0.3421, over 17369.00 frames. ], tot_loss[loss=0.1855, ctc_loss=0.1185, cr_loss=0.3348, over 3354558.67 frames. ], batch size: 48, lr: 2.46e-03, grad_scale: 16.0 2024-09-26 00:40:06,521 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=867127.3333333334, ans=0.125 2024-09-26 00:40:12,775 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=867127.3333333334, ans=0.2 2024-09-26 00:40:19,698 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=8.33 vs. limit=15.0 2024-09-26 00:40:27,108 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=867174.0, ans=0.0 2024-09-26 00:40:50,823 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=867220.6666666666, ans=0.125 2024-09-26 00:41:12,525 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.56 vs. limit=6.0 2024-09-26 00:41:13,490 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=867314.0, ans=0.0 2024-09-26 00:41:27,334 INFO [train.py:1198] (3/4) Epoch 48, batch 2750, loss[loss=0.1804, ctc_loss=0.1156, cr_loss=0.3244, over 17197.00 frames. ], tot_loss[loss=0.1856, ctc_loss=0.1186, cr_loss=0.3348, over 3341389.96 frames. ], batch size: 50, lr: 2.46e-03, grad_scale: 16.0 2024-09-26 00:41:27,942 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=11.87 vs. limit=22.5 2024-09-26 00:41:42,319 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=10.13 vs. 
limit=15.0 2024-09-26 00:41:48,251 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.const_attention_rate, batch_count=867407.3333333334, ans=0.025 2024-09-26 00:41:50,189 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=10.95 vs. limit=15.0 2024-09-26 00:42:15,064 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.121e+02 1.302e+02 1.400e+02 1.492e+02 2.005e+02, threshold=2.800e+02, percent-clipped=0.0 2024-09-26 00:42:16,958 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=867500.6666666666, ans=0.0 2024-09-26 00:42:31,313 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=11.02 vs. limit=22.5 2024-09-26 00:42:39,430 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten.whitening_limit, batch_count=867547.3333333334, ans=15.0 2024-09-26 00:42:49,904 INFO [train.py:1198] (3/4) Epoch 48, batch 2800, loss[loss=0.1585, ctc_loss=0.0982, cr_loss=0.3014, over 17265.00 frames. ], tot_loss[loss=0.1857, ctc_loss=0.1187, cr_loss=0.3354, over 3348387.26 frames. ], batch size: 42, lr: 2.46e-03, grad_scale: 32.0 2024-09-26 00:42:56,679 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.const_attention_rate, batch_count=867594.0, ans=0.025 2024-09-26 00:43:01,502 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=867594.0, ans=0.2 2024-09-26 00:43:15,850 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=867640.6666666666, ans=0.125 2024-09-26 00:43:29,851 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=867687.3333333334, ans=0.125 2024-09-26 00:43:37,505 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=867687.3333333334, ans=0.1 2024-09-26 00:43:40,623 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=867734.0, ans=0.1 2024-09-26 00:43:41,117 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.46 vs. limit=15.0 2024-09-26 00:44:00,377 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten.whitening_limit, batch_count=867780.6666666666, ans=22.5 2024-09-26 00:44:12,429 INFO [train.py:1198] (3/4) Epoch 48, batch 2850, loss[loss=0.1563, ctc_loss=0.09971, cr_loss=0.2832, over 17213.00 frames. ], tot_loss[loss=0.1865, ctc_loss=0.1192, cr_loss=0.3364, over 3352459.47 frames. 
], batch size: 41, lr: 2.46e-03, grad_scale: 32.0 2024-09-26 00:44:12,744 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=867827.3333333334, ans=0.09899494936611666 2024-09-26 00:44:39,462 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=867874.0, ans=0.0 2024-09-26 00:45:02,970 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.151e+02 1.286e+02 1.342e+02 1.424e+02 1.734e+02, threshold=2.685e+02, percent-clipped=0.0 2024-09-26 00:45:18,230 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=868014.0, ans=0.125 2024-09-26 00:45:24,546 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=868014.0, ans=0.0 2024-09-26 00:45:37,912 INFO [train.py:1198] (3/4) Epoch 48, batch 2900, loss[loss=0.1946, ctc_loss=0.1234, cr_loss=0.3563, over 17147.00 frames. ], tot_loss[loss=0.186, ctc_loss=0.1189, cr_loss=0.3358, over 3354641.84 frames. ], batch size: 48, lr: 2.46e-03, grad_scale: 32.0 2024-09-26 00:46:00,463 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=868107.3333333334, ans=0.0 2024-09-26 00:46:11,826 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=868154.0, ans=0.125 2024-09-26 00:46:13,997 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=3.90 vs. limit=15.0 2024-09-26 00:46:19,771 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=868154.0, ans=0.5 2024-09-26 00:46:32,190 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=868200.6666666666, ans=0.2 2024-09-26 00:46:53,330 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=868247.3333333334, ans=0.125 2024-09-26 00:46:57,810 INFO [train.py:1198] (3/4) Epoch 48, batch 2950, loss[loss=0.2097, ctc_loss=0.1364, cr_loss=0.3664, over 17231.00 frames. ], tot_loss[loss=0.1859, ctc_loss=0.1187, cr_loss=0.3362, over 3360494.65 frames. ], batch size: 50, lr: 2.46e-03, grad_scale: 32.0 2024-09-26 00:47:00,315 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=12.07 vs. limit=22.5 2024-09-26 00:47:29,702 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=868340.6666666666, ans=0.125 2024-09-26 00:47:50,012 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.142e+02 1.284e+02 1.365e+02 1.462e+02 2.440e+02, threshold=2.731e+02, percent-clipped=0.0 2024-09-26 00:47:53,711 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=868434.0, ans=0.125 2024-09-26 00:48:20,380 INFO [train.py:1198] (3/4) Epoch 48, batch 3000, loss[loss=0.2105, ctc_loss=0.1356, cr_loss=0.3744, over 16938.00 frames. ], tot_loss[loss=0.1857, ctc_loss=0.1186, cr_loss=0.3358, over 3356196.17 frames. 
], batch size: 58, lr: 2.46e-03, grad_scale: 16.0 2024-09-26 00:48:20,380 INFO [train.py:1221] (3/4) Computing validation loss 2024-09-26 00:48:33,844 INFO [zipformer.py:1858] (3/4) name=encoder.encoders.2.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([4.6683, 4.5065, 4.2950, 4.2167], device='cuda:3') 2024-09-26 00:48:38,776 INFO [train.py:1230] (3/4) Epoch 48, validation: loss=0.03527, ctc_loss=0.03527, cr_loss=1.067e-14, over 944034.00 frames. 2024-09-26 00:48:38,777 INFO [train.py:1231] (3/4) Maximum memory allocated so far is 21238MB 2024-09-26 00:48:47,036 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=14.01 vs. limit=15.0 2024-09-26 00:48:58,221 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=12.23 vs. limit=22.5 2024-09-26 00:49:00,883 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=868574.0, ans=0.125 2024-09-26 00:49:57,147 INFO [train.py:1198] (3/4) Epoch 48, batch 3050, loss[loss=0.1337, ctc_loss=0.08281, cr_loss=0.2543, over 16267.00 frames. ], tot_loss[loss=0.1845, ctc_loss=0.1177, cr_loss=0.334, over 3356226.89 frames. ], batch size: 36, lr: 2.46e-03, grad_scale: 16.0 2024-09-26 00:50:12,915 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=868807.3333333334, ans=0.125 2024-09-26 00:50:19,198 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=868807.3333333334, ans=0.125 2024-09-26 00:50:35,988 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=868854.0, ans=0.0 2024-09-26 00:50:42,255 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=868854.0, ans=0.0 2024-09-26 00:50:47,153 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=868900.6666666666, ans=0.0 2024-09-26 00:50:48,357 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.116e+02 1.292e+02 1.394e+02 1.474e+02 1.856e+02, threshold=2.789e+02, percent-clipped=0.0 2024-09-26 00:51:17,333 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=10.27 vs. limit=15.0 2024-09-26 00:51:18,423 INFO [train.py:1198] (3/4) Epoch 48, batch 3100, loss[loss=0.1604, ctc_loss=0.1013, cr_loss=0.2957, over 16375.00 frames. ], tot_loss[loss=0.1848, ctc_loss=0.1178, cr_loss=0.3348, over 3357407.82 frames. ], batch size: 36, lr: 2.46e-03, grad_scale: 16.0 2024-09-26 00:51:18,757 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=868994.0, ans=0.125 2024-09-26 00:51:37,669 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=10.34 vs. 
limit=15.0 2024-09-26 00:51:47,966 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=869087.3333333334, ans=0.0 2024-09-26 00:51:49,409 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=869087.3333333334, ans=0.0 2024-09-26 00:51:59,560 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.85 vs. limit=15.0 2024-09-26 00:52:21,408 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=5.60 vs. limit=15.0 2024-09-26 00:52:35,872 INFO [train.py:1198] (3/4) Epoch 48, batch 3150, loss[loss=0.2041, ctc_loss=0.1326, cr_loss=0.3578, over 17071.00 frames. ], tot_loss[loss=0.1851, ctc_loss=0.118, cr_loss=0.3353, over 3359848.22 frames. ], batch size: 43, lr: 2.46e-03, grad_scale: 16.0 2024-09-26 00:52:37,709 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=869227.3333333334, ans=0.125 2024-09-26 00:53:21,794 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=869320.6666666666, ans=0.0 2024-09-26 00:53:26,182 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.144e+02 1.286e+02 1.377e+02 1.489e+02 2.573e+02, threshold=2.753e+02, percent-clipped=0.0 2024-09-26 00:53:29,434 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=869367.3333333334, ans=0.0 2024-09-26 00:53:34,302 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=869367.3333333334, ans=0.2 2024-09-26 00:53:53,084 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=869414.0, ans=0.125 2024-09-26 00:53:55,916 INFO [train.py:1198] (3/4) Epoch 48, batch 3200, loss[loss=0.198, ctc_loss=0.1263, cr_loss=0.3585, over 17349.00 frames. ], tot_loss[loss=0.1848, ctc_loss=0.1177, cr_loss=0.3352, over 3367086.97 frames. ], batch size: 48, lr: 2.46e-03, grad_scale: 32.0 2024-09-26 00:54:26,027 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=869554.0, ans=0.125 2024-09-26 00:54:38,801 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=869554.0, ans=0.125 2024-09-26 00:54:39,184 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=14.05 vs. limit=15.0 2024-09-26 00:54:43,348 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=869600.6666666666, ans=0.125 2024-09-26 00:55:14,479 INFO [train.py:1198] (3/4) Epoch 48, batch 3250, loss[loss=0.1609, ctc_loss=0.1009, cr_loss=0.3, over 17202.00 frames. ], tot_loss[loss=0.1856, ctc_loss=0.1185, cr_loss=0.3356, over 3343589.05 frames. 
], batch size: 41, lr: 2.46e-03, grad_scale: 32.0 2024-09-26 00:56:03,196 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.191e+02 1.301e+02 1.439e+02 1.548e+02 2.033e+02, threshold=2.879e+02, percent-clipped=0.0 2024-09-26 00:56:33,154 INFO [train.py:1198] (3/4) Epoch 48, batch 3300, loss[loss=0.1803, ctc_loss=0.1136, cr_loss=0.3336, over 17066.00 frames. ], tot_loss[loss=0.1861, ctc_loss=0.119, cr_loss=0.3359, over 3341560.29 frames. ], batch size: 46, lr: 2.46e-03, grad_scale: 32.0 2024-09-26 00:56:37,369 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=4.42 vs. limit=15.0 2024-09-26 00:56:46,201 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=869927.3333333334, ans=0.0 2024-09-26 00:57:01,029 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-26 00:57:05,784 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=870020.6666666666, ans=0.95 2024-09-26 00:57:54,016 INFO [train.py:1198] (3/4) Epoch 48, batch 3350, loss[loss=0.148, ctc_loss=0.0915, cr_loss=0.2824, over 17095.00 frames. ], tot_loss[loss=0.1872, ctc_loss=0.1198, cr_loss=0.3371, over 3334612.23 frames. ], batch size: 40, lr: 2.46e-03, grad_scale: 32.0 2024-09-26 00:58:18,668 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=9.36 vs. limit=22.5 2024-09-26 00:58:21,918 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=5.36 vs. limit=15.0 2024-09-26 00:58:33,334 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=870254.0, ans=0.0 2024-09-26 00:58:34,252 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=5.37 vs. limit=15.0 2024-09-26 00:58:42,523 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.149e+02 1.313e+02 1.389e+02 1.485e+02 2.642e+02, threshold=2.779e+02, percent-clipped=0.0 2024-09-26 00:58:58,271 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=870347.3333333334, ans=0.125 2024-09-26 00:59:10,896 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=870394.0, ans=0.125 2024-09-26 00:59:12,294 INFO [train.py:1198] (3/4) Epoch 48, batch 3400, loss[loss=0.1489, ctc_loss=0.09171, cr_loss=0.2857, over 16365.00 frames. ], tot_loss[loss=0.1864, ctc_loss=0.1192, cr_loss=0.3362, over 3347662.37 frames. ], batch size: 36, lr: 2.46e-03, grad_scale: 32.0 2024-09-26 00:59:28,226 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=870440.6666666666, ans=0.125 2024-09-26 00:59:28,516 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.12 vs. 
limit=6.0 2024-09-26 00:59:32,812 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=870440.6666666666, ans=0.1 2024-09-26 00:59:39,200 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=870440.6666666666, ans=0.125 2024-09-26 01:00:32,485 INFO [train.py:1198] (3/4) Epoch 48, batch 3450, loss[loss=0.185, ctc_loss=0.1177, cr_loss=0.3361, over 17128.00 frames. ], tot_loss[loss=0.1868, ctc_loss=0.1194, cr_loss=0.3367, over 3342832.04 frames. ], batch size: 48, lr: 2.46e-03, grad_scale: 32.0 2024-09-26 01:00:39,175 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-26 01:00:40,646 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=870627.3333333334, ans=0.0 2024-09-26 01:00:47,904 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.74 vs. limit=15.0 2024-09-26 01:01:03,091 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=870674.0, ans=0.0 2024-09-26 01:01:10,735 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=870720.6666666666, ans=0.125 2024-09-26 01:01:13,998 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer_ff3.min_abs, batch_count=870720.6666666666, ans=0.2 2024-09-26 01:01:14,672 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=4.18 vs. limit=15.0 2024-09-26 01:01:23,246 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=870767.3333333334, ans=0.125 2024-09-26 01:01:24,344 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.161e+02 1.301e+02 1.395e+02 1.467e+02 2.192e+02, threshold=2.789e+02, percent-clipped=0.0 2024-09-26 01:01:52,536 INFO [train.py:1198] (3/4) Epoch 48, batch 3500, loss[loss=0.1799, ctc_loss=0.1125, cr_loss=0.3371, over 17196.00 frames. ], tot_loss[loss=0.1871, ctc_loss=0.1197, cr_loss=0.3368, over 3348972.30 frames. ], batch size: 45, lr: 2.46e-03, grad_scale: 16.0 2024-09-26 01:01:58,145 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=12.27 vs. 
limit=22.5 2024-09-26 01:02:10,279 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=870907.3333333334, ans=0.0 2024-09-26 01:02:21,527 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=870907.3333333334, ans=0.1 2024-09-26 01:02:29,325 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=870954.0, ans=0.1 2024-09-26 01:02:35,418 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer_ff2.min_abs, batch_count=870954.0, ans=0.1 2024-09-26 01:03:00,758 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=871047.3333333334, ans=0.125 2024-09-26 01:03:01,206 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.95 vs. limit=6.0 2024-09-26 01:03:06,977 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=871047.3333333334, ans=0.125 2024-09-26 01:03:10,028 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=871047.3333333334, ans=0.0 2024-09-26 01:03:12,763 INFO [train.py:1198] (3/4) Epoch 48, batch 3550, loss[loss=0.1907, ctc_loss=0.1209, cr_loss=0.349, over 17007.00 frames. ], tot_loss[loss=0.187, ctc_loss=0.1197, cr_loss=0.3365, over 3339110.55 frames. ], batch size: 52, lr: 2.46e-03, grad_scale: 16.0 2024-09-26 01:03:16,178 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=871094.0, ans=0.1 2024-09-26 01:03:22,328 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=871094.0, ans=0.0 2024-09-26 01:03:25,858 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=11.31 vs. limit=12.0 2024-09-26 01:03:32,005 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=871140.6666666666, ans=0.0 2024-09-26 01:03:50,436 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=871187.3333333334, ans=0.125 2024-09-26 01:03:55,084 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=871187.3333333334, ans=0.0 2024-09-26 01:04:02,352 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.148e+02 1.323e+02 1.392e+02 1.491e+02 3.535e+02, threshold=2.785e+02, percent-clipped=1.0 2024-09-26 01:04:16,615 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.min_abs, batch_count=871280.6666666666, ans=0.5 2024-09-26 01:04:27,483 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=871280.6666666666, ans=0.125 2024-09-26 01:04:30,424 INFO [train.py:1198] (3/4) Epoch 48, batch 3600, loss[loss=0.182, ctc_loss=0.118, cr_loss=0.3204, over 16906.00 frames. ], tot_loss[loss=0.1873, ctc_loss=0.1199, cr_loss=0.3369, over 3329933.24 frames. 
], batch size: 58, lr: 2.46e-03, grad_scale: 32.0 2024-09-26 01:04:30,646 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=871327.3333333334, ans=0.2 2024-09-26 01:04:35,768 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=4.96 vs. limit=12.0 2024-09-26 01:04:47,984 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=871374.0, ans=0.0 2024-09-26 01:04:52,564 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=871374.0, ans=0.0 2024-09-26 01:05:03,632 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=871420.6666666666, ans=0.025 2024-09-26 01:05:48,462 INFO [train.py:1198] (3/4) Epoch 48, batch 3650, loss[loss=0.2438, ctc_loss=0.1646, cr_loss=0.3961, over 15249.00 frames. ], tot_loss[loss=0.1878, ctc_loss=0.1202, cr_loss=0.3379, over 3337811.44 frames. ], batch size: 89, lr: 2.46e-03, grad_scale: 32.0 2024-09-26 01:06:01,705 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-26 01:06:12,673 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=871607.3333333334, ans=0.2 2024-09-26 01:06:14,277 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=871607.3333333334, ans=0.09899494936611666 2024-09-26 01:06:18,797 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=871607.3333333334, ans=0.125 2024-09-26 01:06:21,765 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=871654.0, ans=0.1 2024-09-26 01:06:28,065 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=871654.0, ans=0.125 2024-09-26 01:06:32,026 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.86 vs. limit=6.0 2024-09-26 01:06:32,744 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=871654.0, ans=0.05 2024-09-26 01:06:40,148 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.133e+02 1.328e+02 1.397e+02 1.530e+02 1.851e+02, threshold=2.795e+02, percent-clipped=0.0 2024-09-26 01:06:43,666 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.min_positive, batch_count=871700.6666666666, ans=0.05 2024-09-26 01:06:47,407 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=871700.6666666666, ans=0.0 2024-09-26 01:07:09,264 INFO [train.py:1198] (3/4) Epoch 48, batch 3700, loss[loss=0.2013, ctc_loss=0.1299, cr_loss=0.3571, over 17351.00 frames. ], tot_loss[loss=0.1878, ctc_loss=0.1201, cr_loss=0.3383, over 3336534.18 frames. 
], batch size: 48, lr: 2.46e-03, grad_scale: 32.0 2024-09-26 01:07:26,859 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=871840.6666666666, ans=0.125 2024-09-26 01:07:30,037 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.const_attention_rate, batch_count=871840.6666666666, ans=0.025 2024-09-26 01:07:42,257 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=871887.3333333334, ans=0.125 2024-09-26 01:08:00,170 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=871934.0, ans=0.125 2024-09-26 01:08:19,830 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=7.66 vs. limit=15.0 2024-09-26 01:08:20,800 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-26 01:08:22,580 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=871980.6666666666, ans=0.125 2024-09-26 01:08:28,631 INFO [train.py:1198] (3/4) Epoch 48, batch 3750, loss[loss=0.197, ctc_loss=0.126, cr_loss=0.3552, over 17284.00 frames. ], tot_loss[loss=0.1875, ctc_loss=0.1198, cr_loss=0.3381, over 3343723.30 frames. ], batch size: 51, lr: 2.46e-03, grad_scale: 16.0 2024-09-26 01:09:04,003 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=872120.6666666666, ans=0.1 2024-09-26 01:09:20,720 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.132e+02 1.319e+02 1.400e+02 1.543e+02 5.735e+02, threshold=2.801e+02, percent-clipped=1.0 2024-09-26 01:09:31,135 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=872214.0, ans=0.0 2024-09-26 01:09:32,561 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=872214.0, ans=0.0 2024-09-26 01:09:40,953 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.49 vs. limit=10.0 2024-09-26 01:09:48,872 INFO [train.py:1198] (3/4) Epoch 48, batch 3800, loss[loss=0.2182, ctc_loss=0.1408, cr_loss=0.3874, over 17009.00 frames. ], tot_loss[loss=0.1875, ctc_loss=0.1198, cr_loss=0.3381, over 3326827.56 frames. ], batch size: 53, lr: 2.46e-03, grad_scale: 16.0 2024-09-26 01:10:08,596 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=4.67 vs. limit=15.0 2024-09-26 01:10:45,198 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=872400.6666666666, ans=0.0 2024-09-26 01:10:56,103 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=872447.3333333334, ans=0.125 2024-09-26 01:11:06,799 INFO [train.py:1198] (3/4) Epoch 48, batch 3850, loss[loss=0.1747, ctc_loss=0.1096, cr_loss=0.3255, over 17148.00 frames. ], tot_loss[loss=0.1889, ctc_loss=0.121, cr_loss=0.3397, over 3301453.94 frames. 
], batch size: 45, lr: 2.46e-03, grad_scale: 8.0 2024-09-26 01:11:08,701 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-26 01:11:13,285 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=872494.0, ans=0.125 2024-09-26 01:11:29,582 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=13.15 vs. limit=15.0 2024-09-26 01:11:38,418 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=872587.3333333334, ans=0.125 2024-09-26 01:11:59,145 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.219e+02 1.389e+02 1.529e+02 1.705e+02 2.274e+02, threshold=3.058e+02, percent-clipped=0.0 2024-09-26 01:12:10,998 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.08 vs. limit=15.0 2024-09-26 01:13:04,119 INFO [train.py:1198] (3/4) Epoch 49, batch 0, loss[loss=0.1683, ctc_loss=0.1038, cr_loss=0.3225, over 17203.00 frames. ], tot_loss[loss=0.1683, ctc_loss=0.1038, cr_loss=0.3225, over 17203.00 frames. ], batch size: 47, lr: 2.43e-03, grad_scale: 16.0 2024-09-26 01:13:04,120 INFO [train.py:1221] (3/4) Computing validation loss 2024-09-26 01:13:19,510 INFO [train.py:1230] (3/4) Epoch 49, validation: loss=0.03487, ctc_loss=0.03487, cr_loss=1.087e-14, over 944034.00 frames. 2024-09-26 01:13:19,510 INFO [train.py:1231] (3/4) Maximum memory allocated so far is 21238MB 2024-09-26 01:13:39,628 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=872755.3333333334, ans=0.125 2024-09-26 01:14:03,458 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=872802.0, ans=0.2 2024-09-26 01:14:44,164 INFO [train.py:1198] (3/4) Epoch 49, batch 50, loss[loss=0.2248, ctc_loss=0.1471, cr_loss=0.3886, over 15071.00 frames. ], tot_loss[loss=0.1884, ctc_loss=0.1204, cr_loss=0.3402, over 754853.64 frames. ], batch size: 89, lr: 2.43e-03, grad_scale: 16.0 2024-09-26 01:14:55,105 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=872942.0, ans=0.125 2024-09-26 01:14:56,567 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=872942.0, ans=0.125 2024-09-26 01:15:39,768 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=873082.0, ans=0.0 2024-09-26 01:15:39,787 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=873082.0, ans=0.0 2024-09-26 01:15:47,636 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.099e+02 1.321e+02 1.394e+02 1.552e+02 2.223e+02, threshold=2.788e+02, percent-clipped=0.0 2024-09-26 01:15:51,234 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=873128.6666666666, ans=0.09899494936611666 2024-09-26 01:16:07,027 INFO [train.py:1198] (3/4) Epoch 49, batch 100, loss[loss=0.1432, ctc_loss=0.0898, cr_loss=0.2668, over 17168.00 frames. ], tot_loss[loss=0.1847, ctc_loss=0.1179, cr_loss=0.334, over 1326563.08 frames. 
], batch size: 41, lr: 2.43e-03, grad_scale: 16.0 2024-09-26 01:16:40,684 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=873268.6666666666, ans=0.2 2024-09-26 01:16:45,582 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=873268.6666666666, ans=0.0 2024-09-26 01:16:58,565 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=873315.3333333334, ans=0.0 2024-09-26 01:16:58,729 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=10.05 vs. limit=15.0 2024-09-26 01:17:29,144 INFO [scaling.py:1024] (3/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.63 vs. limit=5.0 2024-09-26 01:17:29,483 INFO [train.py:1198] (3/4) Epoch 49, batch 150, loss[loss=0.2106, ctc_loss=0.1361, cr_loss=0.3723, over 15939.00 frames. ], tot_loss[loss=0.1854, ctc_loss=0.1183, cr_loss=0.3354, over 1766931.02 frames. ], batch size: 74, lr: 2.43e-03, grad_scale: 16.0 2024-09-26 01:17:51,868 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=873455.3333333334, ans=0.035 2024-09-26 01:18:32,853 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.108e+02 1.298e+02 1.390e+02 1.504e+02 2.457e+02, threshold=2.779e+02, percent-clipped=0.0 2024-09-26 01:18:51,171 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=873642.0, ans=0.2 2024-09-26 01:18:52,323 INFO [train.py:1198] (3/4) Epoch 49, batch 200, loss[loss=0.1559, ctc_loss=0.09785, cr_loss=0.2901, over 17280.00 frames. ], tot_loss[loss=0.1863, ctc_loss=0.1189, cr_loss=0.3368, over 2124051.92 frames. ], batch size: 42, lr: 2.43e-03, grad_scale: 16.0 2024-09-26 01:19:32,601 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=873735.3333333334, ans=0.125 2024-09-26 01:20:02,162 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=873828.6666666666, ans=0.125 2024-09-26 01:20:17,865 INFO [train.py:1198] (3/4) Epoch 49, batch 250, loss[loss=0.1548, ctc_loss=0.09536, cr_loss=0.2971, over 17101.00 frames. ], tot_loss[loss=0.1858, ctc_loss=0.1185, cr_loss=0.3361, over 2404121.89 frames. ], batch size: 40, lr: 2.43e-03, grad_scale: 16.0 2024-09-26 01:20:32,939 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=873922.0, ans=0.125 2024-09-26 01:21:09,186 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=874015.3333333334, ans=0.0 2024-09-26 01:21:18,234 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.091e+02 1.257e+02 1.339e+02 1.417e+02 1.603e+02, threshold=2.678e+02, percent-clipped=0.0 2024-09-26 01:21:34,917 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=874062.0, ans=0.0 2024-09-26 01:21:37,798 INFO [train.py:1198] (3/4) Epoch 49, batch 300, loss[loss=0.1615, ctc_loss=0.1011, cr_loss=0.3025, over 17323.00 frames. ], tot_loss[loss=0.1865, ctc_loss=0.1191, cr_loss=0.3373, over 2619521.26 frames. 
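[annotation] The scaling.py:1024 "Whitening" records compare a per-module metric against a limit (e.g. metric=10.05 vs. limit=15.0 above); a penalty applies only when the metric exceeds the limit. A plausible reading, under the assumption that the metric measures how far the feature covariance is from white: the ratio mean(eigenvalues^2) / mean(eigenvalues)^2 equals 1.0 for identity covariance and grows with eigenvalue spread. A sketch under that assumption; the exact scaling.py formula may differ:

```python
import torch

def whitening_metric(x: torch.Tensor, num_groups: int = 1) -> torch.Tensor:
    """Assumed anisotropy score of the per-group feature covariance:
    1.0 for perfectly white features, larger as the spectrum spreads."""
    num_frames, num_channels = x.shape
    x = x.reshape(num_frames, num_groups, num_channels // num_groups)
    x = x.transpose(0, 1)                      # (groups, frames, chans/group)
    x = x - x.mean(dim=1, keepdim=True)
    cov = x.transpose(1, 2) @ x / num_frames   # per-group covariance
    eigs = torch.linalg.eigvalsh(cov)
    return (eigs ** 2).mean() / eigs.mean() ** 2

x = torch.randn(1000, 64)    # roughly white features
print(whitening_metric(x))   # close to 1.0; penalised only above `limit`
```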
], batch size: 51, lr: 2.43e-03, grad_scale: 16.0 2024-09-26 01:22:15,028 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=874202.0, ans=0.0 2024-09-26 01:22:56,458 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=874295.3333333334, ans=0.0 2024-09-26 01:23:00,852 INFO [train.py:1198] (3/4) Epoch 49, batch 350, loss[loss=0.2077, ctc_loss=0.1345, cr_loss=0.3657, over 17028.00 frames. ], tot_loss[loss=0.185, ctc_loss=0.1179, cr_loss=0.3354, over 2790760.38 frames. ], batch size: 52, lr: 2.43e-03, grad_scale: 16.0 2024-09-26 01:23:13,994 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=874342.0, ans=0.09899494936611666 2024-09-26 01:23:44,276 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=14.60 vs. limit=15.0 2024-09-26 01:23:45,157 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=874435.3333333334, ans=0.125 2024-09-26 01:23:53,252 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=874482.0, ans=0.125 2024-09-26 01:24:04,153 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.165e+02 1.315e+02 1.403e+02 1.517e+02 7.992e+02, threshold=2.807e+02, percent-clipped=1.0 2024-09-26 01:24:23,276 INFO [train.py:1198] (3/4) Epoch 49, batch 400, loss[loss=0.1855, ctc_loss=0.1187, cr_loss=0.3337, over 17293.00 frames. ], tot_loss[loss=0.1841, ctc_loss=0.1174, cr_loss=0.3337, over 2905988.16 frames. ], batch size: 46, lr: 2.43e-03, grad_scale: 32.0 2024-09-26 01:25:07,117 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=874668.6666666666, ans=0.125 2024-09-26 01:25:26,481 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=874715.3333333334, ans=0.1 2024-09-26 01:25:32,782 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=874762.0, ans=0.125 2024-09-26 01:25:48,235 INFO [train.py:1198] (3/4) Epoch 49, batch 450, loss[loss=0.1865, ctc_loss=0.1195, cr_loss=0.335, over 17095.00 frames. ], tot_loss[loss=0.1851, ctc_loss=0.1181, cr_loss=0.3348, over 3012138.09 frames. ], batch size: 49, lr: 2.43e-03, grad_scale: 32.0 2024-09-26 01:25:53,285 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=874808.6666666666, ans=0.1 2024-09-26 01:26:25,356 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=874902.0, ans=0.0 2024-09-26 01:26:33,417 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=874902.0, ans=0.0 2024-09-26 01:26:37,185 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=14.84 vs. 
limit=22.5 2024-09-26 01:26:49,065 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.141e+02 1.319e+02 1.417e+02 1.516e+02 2.635e+02, threshold=2.833e+02, percent-clipped=0.0 2024-09-26 01:27:08,163 INFO [train.py:1198] (3/4) Epoch 49, batch 500, loss[loss=0.2066, ctc_loss=0.1376, cr_loss=0.3453, over 16573.00 frames. ], tot_loss[loss=0.1868, ctc_loss=0.1195, cr_loss=0.3366, over 3074642.59 frames. ], batch size: 66, lr: 2.43e-03, grad_scale: 32.0 2024-09-26 01:27:08,413 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=875042.0, ans=0.0 2024-09-26 01:27:25,474 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=875088.6666666666, ans=0.0 2024-09-26 01:27:39,792 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.const_attention_rate, batch_count=875088.6666666666, ans=0.025 2024-09-26 01:27:52,660 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=875135.3333333334, ans=0.125 2024-09-26 01:27:58,784 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=875182.0, ans=0.0 2024-09-26 01:28:32,677 INFO [train.py:1198] (3/4) Epoch 49, batch 550, loss[loss=0.1591, ctc_loss=0.1019, cr_loss=0.2861, over 17263.00 frames. ], tot_loss[loss=0.1863, ctc_loss=0.1191, cr_loss=0.336, over 3138559.13 frames. ], batch size: 44, lr: 2.43e-03, grad_scale: 32.0 2024-09-26 01:29:00,071 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=875322.0, ans=0.125 2024-09-26 01:29:33,610 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.148e+02 1.295e+02 1.361e+02 1.476e+02 2.484e+02, threshold=2.722e+02, percent-clipped=0.0 2024-09-26 01:29:40,649 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=875462.0, ans=0.125 2024-09-26 01:29:40,719 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=875462.0, ans=0.2 2024-09-26 01:29:52,183 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=9.11 vs. limit=15.0 2024-09-26 01:29:58,455 INFO [train.py:1198] (3/4) Epoch 49, batch 600, loss[loss=0.1677, ctc_loss=0.107, cr_loss=0.304, over 17270.00 frames. ], tot_loss[loss=0.1856, ctc_loss=0.1187, cr_loss=0.3346, over 3184846.81 frames. ], batch size: 42, lr: 2.43e-03, grad_scale: 32.0 2024-09-26 01:30:12,000 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=9.03 vs. limit=15.0 2024-09-26 01:30:17,736 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=875555.3333333334, ans=0.125 2024-09-26 01:30:24,094 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=875555.3333333334, ans=0.125 2024-09-26 01:30:37,293 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=5.03 vs. 
limit=15.0 2024-09-26 01:30:57,697 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=875648.6666666666, ans=0.2 2024-09-26 01:31:18,111 INFO [train.py:1198] (3/4) Epoch 49, batch 650, loss[loss=0.2221, ctc_loss=0.1393, cr_loss=0.4137, over 17025.00 frames. ], tot_loss[loss=0.1856, ctc_loss=0.1185, cr_loss=0.3353, over 3232651.48 frames. ], batch size: 53, lr: 2.43e-03, grad_scale: 32.0 2024-09-26 01:32:04,880 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=875882.0, ans=0.1 2024-09-26 01:32:21,520 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.113e+02 1.285e+02 1.382e+02 1.503e+02 2.355e+02, threshold=2.765e+02, percent-clipped=0.0 2024-09-26 01:32:29,031 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=4.71 vs. limit=10.0 2024-09-26 01:32:40,442 INFO [train.py:1198] (3/4) Epoch 49, batch 700, loss[loss=0.1832, ctc_loss=0.1157, cr_loss=0.3374, over 17027.00 frames. ], tot_loss[loss=0.1852, ctc_loss=0.1183, cr_loss=0.3348, over 3259090.30 frames. ], batch size: 51, lr: 2.42e-03, grad_scale: 32.0 2024-09-26 01:33:21,673 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=876068.6666666666, ans=0.0 2024-09-26 01:33:34,666 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=876115.3333333334, ans=0.0 2024-09-26 01:33:36,455 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=876115.3333333334, ans=0.0 2024-09-26 01:33:54,056 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=876162.0, ans=0.0 2024-09-26 01:34:03,503 INFO [train.py:1198] (3/4) Epoch 49, batch 750, loss[loss=0.2171, ctc_loss=0.1425, cr_loss=0.3729, over 16902.00 frames. ], tot_loss[loss=0.1835, ctc_loss=0.117, cr_loss=0.3325, over 3286943.25 frames. ], batch size: 58, lr: 2.42e-03, grad_scale: 32.0 2024-09-26 01:34:09,091 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=9.30 vs. limit=12.0 2024-09-26 01:34:27,823 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=876255.3333333334, ans=0.125 2024-09-26 01:34:34,228 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=876302.0, ans=0.125 2024-09-26 01:34:49,815 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=876302.0, ans=0.125 2024-09-26 01:34:49,861 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=876302.0, ans=0.125 2024-09-26 01:35:09,767 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.132e+02 1.321e+02 1.400e+02 1.492e+02 1.805e+02, threshold=2.801e+02, percent-clipped=0.0 2024-09-26 01:35:16,586 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=876395.3333333334, ans=0.125 2024-09-26 01:35:28,996 INFO [train.py:1198] (3/4) Epoch 49, batch 800, loss[loss=0.1861, ctc_loss=0.118, cr_loss=0.3407, over 17168.00 frames. 
], tot_loss[loss=0.1833, ctc_loss=0.1169, cr_loss=0.3324, over 3304565.85 frames. ], batch size: 45, lr: 2.42e-03, grad_scale: 32.0 2024-09-26 01:36:01,378 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=876535.3333333334, ans=0.125 2024-09-26 01:36:14,010 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=876535.3333333334, ans=0.2 2024-09-26 01:36:20,457 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=876582.0, ans=0.125 2024-09-26 01:36:49,026 INFO [train.py:1198] (3/4) Epoch 49, batch 850, loss[loss=0.18, ctc_loss=0.1154, cr_loss=0.3228, over 17011.00 frames. ], tot_loss[loss=0.183, ctc_loss=0.1166, cr_loss=0.3319, over 3326498.58 frames. ], batch size: 51, lr: 2.42e-03, grad_scale: 32.0 2024-09-26 01:37:02,014 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=876675.3333333334, ans=0.1 2024-09-26 01:37:49,024 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=876815.3333333334, ans=0.125 2024-09-26 01:37:51,966 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.174e+02 1.324e+02 1.412e+02 1.511e+02 2.737e+02, threshold=2.823e+02, percent-clipped=0.0 2024-09-26 01:38:03,587 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=876862.0, ans=0.125 2024-09-26 01:38:08,341 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.const_attention_rate, batch_count=876862.0, ans=0.025 2024-09-26 01:38:11,393 INFO [train.py:1198] (3/4) Epoch 49, batch 900, loss[loss=0.1783, ctc_loss=0.1116, cr_loss=0.3333, over 17019.00 frames. ], tot_loss[loss=0.184, ctc_loss=0.1173, cr_loss=0.3334, over 3337411.50 frames. ], batch size: 39, lr: 2.42e-03, grad_scale: 32.0 2024-09-26 01:38:24,473 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=6.34 vs. limit=15.0 2024-09-26 01:38:51,382 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=5.76 vs. limit=12.0 2024-09-26 01:38:51,474 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=4.44 vs. limit=12.0 2024-09-26 01:38:58,894 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=877002.0, ans=0.125 2024-09-26 01:39:19,732 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=877095.3333333334, ans=0.0 2024-09-26 01:39:24,540 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=877095.3333333334, ans=0.0 2024-09-26 01:39:29,456 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=877095.3333333334, ans=0.0 2024-09-26 01:39:33,925 INFO [train.py:1198] (3/4) Epoch 49, batch 950, loss[loss=0.1849, ctc_loss=0.1179, cr_loss=0.3346, over 17300.00 frames. ], tot_loss[loss=0.1838, ctc_loss=0.1172, cr_loss=0.3331, over 3346470.91 frames. 
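[annotation] The grad_scale field in the batch records is the dynamic loss scale of fp16 mixed-precision training, which is why it only takes power-of-two values (8.0, 16.0, 32.0 in this section): the scaler halves the scale whenever a step produces inf/nan gradients and slowly grows it back when training is stable, e.g. the drop from 32.0 at batch 900 to 16.0 at batch 950 below. A minimal sketch using PyTorch's standard GradScaler; model, batch and compute_loss are hypothetical stand-ins for the script's objects:

```python
import torch

scaler = torch.cuda.amp.GradScaler()  # its current scale is the logged grad_scale

def train_step(model, optimizer, batch, compute_loss):
    optimizer.zero_grad()
    with torch.cuda.amp.autocast():          # fp16 forward, as in this run
        loss = compute_loss(model, batch)
    scaler.scale(loss).backward()            # backward on the scaled loss
    scaler.step(optimizer)                   # skipped if grads are inf/nan
    scaler.update()                          # halve on overflow, grow when stable
    return loss.detach(), scaler.get_scale()
```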
], batch size: 49, lr: 2.42e-03, grad_scale: 16.0 2024-09-26 01:40:13,097 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=877235.3333333334, ans=0.0 2024-09-26 01:40:41,474 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.100e+02 1.300e+02 1.395e+02 1.483e+02 3.362e+02, threshold=2.790e+02, percent-clipped=1.0 2024-09-26 01:40:52,141 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=877328.6666666666, ans=0.125 2024-09-26 01:41:00,179 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=877375.3333333334, ans=0.0 2024-09-26 01:41:01,592 INFO [train.py:1198] (3/4) Epoch 49, batch 1000, loss[loss=0.2059, ctc_loss=0.1326, cr_loss=0.3664, over 16204.00 frames. ], tot_loss[loss=0.1846, ctc_loss=0.1178, cr_loss=0.3339, over 3339273.93 frames. ], batch size: 74, lr: 2.42e-03, grad_scale: 16.0 2024-09-26 01:41:14,510 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=877375.3333333334, ans=0.125 2024-09-26 01:41:21,470 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=3.74 vs. limit=15.0 2024-09-26 01:41:43,377 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=877468.6666666666, ans=0.1 2024-09-26 01:41:59,643 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=877515.3333333334, ans=0.1 2024-09-26 01:42:20,986 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=877562.0, ans=0.0 2024-09-26 01:42:23,946 INFO [train.py:1198] (3/4) Epoch 49, batch 1050, loss[loss=0.1495, ctc_loss=0.09347, cr_loss=0.2802, over 17018.00 frames. ], tot_loss[loss=0.1846, ctc_loss=0.1178, cr_loss=0.3336, over 3345067.82 frames. ], batch size: 39, lr: 2.42e-03, grad_scale: 16.0 2024-09-26 01:42:41,858 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=877655.3333333334, ans=0.09899494936611666 2024-09-26 01:43:28,469 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.167e+02 1.276e+02 1.380e+02 1.490e+02 2.320e+02, threshold=2.759e+02, percent-clipped=0.0 2024-09-26 01:43:46,130 INFO [train.py:1198] (3/4) Epoch 49, batch 1100, loss[loss=0.198, ctc_loss=0.1252, cr_loss=0.3641, over 17211.00 frames. ], tot_loss[loss=0.1841, ctc_loss=0.1176, cr_loss=0.3326, over 3356126.01 frames. 
], batch size: 47, lr: 2.42e-03, grad_scale: 16.0 2024-09-26 01:43:46,531 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=877842.0, ans=0.2 2024-09-26 01:43:54,292 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=877842.0, ans=0.0 2024-09-26 01:44:13,399 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=877888.6666666666, ans=0.125 2024-09-26 01:44:32,566 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=877982.0, ans=0.0 2024-09-26 01:44:37,477 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=877982.0, ans=10.0 2024-09-26 01:44:53,612 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=878028.6666666666, ans=0.125 2024-09-26 01:45:04,593 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=878028.6666666666, ans=0.2 2024-09-26 01:45:10,652 INFO [train.py:1198] (3/4) Epoch 49, batch 1150, loss[loss=0.1734, ctc_loss=0.1115, cr_loss=0.3091, over 17029.00 frames. ], tot_loss[loss=0.1843, ctc_loss=0.1178, cr_loss=0.3326, over 3350623.36 frames. ], batch size: 51, lr: 2.42e-03, grad_scale: 16.0 2024-09-26 01:45:15,644 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=878075.3333333334, ans=0.1 2024-09-26 01:45:19,103 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=878075.3333333334, ans=0.1 2024-09-26 01:45:39,922 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=878122.0, ans=0.125 2024-09-26 01:45:52,730 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=878168.6666666666, ans=0.0 2024-09-26 01:46:03,906 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=878215.3333333334, ans=0.125 2024-09-26 01:46:13,008 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.137e+02 1.291e+02 1.352e+02 1.450e+02 2.079e+02, threshold=2.704e+02, percent-clipped=0.0 2024-09-26 01:46:18,469 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=878262.0, ans=0.0 2024-09-26 01:46:24,721 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=878262.0, ans=0.125 2024-09-26 01:46:30,813 INFO [train.py:1198] (3/4) Epoch 49, batch 1200, loss[loss=0.1557, ctc_loss=0.09524, cr_loss=0.3023, over 17044.00 frames. ], tot_loss[loss=0.1837, ctc_loss=0.1173, cr_loss=0.332, over 3361408.36 frames. 
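[annotation] The scaling.py:214 "ScheduledFloat" records show module hyperparameters (skip rates, dropout probabilities, balancer bounds) that are functions of batch_count rather than constants. The fractional batch_count values stepping by multiples of 46.67 are consistent with the counter advancing by each batch's duration relative to a reference duration rather than by 1 per batch, though that is an inference from the log, not a quoted implementation. A minimal sketch of a piecewise-linear schedule with an assumed interface:

```python
class ScheduledFloat:
    """Sketch of a hyperparameter following a piecewise-linear schedule
    over batch_count, as in the scaling.py:214 records. Assumed API."""
    def __init__(self, *points):
        self.points = sorted(points)        # (batch_count, value) pairs

    def __call__(self, batch_count: float) -> float:
        pts = self.points
        if batch_count <= pts[0][0]:
            return pts[0][1]
        for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
            if batch_count <= x1:           # linear interpolation
                return y0 + (y1 - y0) * (batch_count - x0) / (x1 - x0)
        return pts[-1][1]

# e.g. a dropout probability annealed from 0.3 to 0.1 over the first
# 20000 reference batches, then held constant:
dropout_p = ScheduledFloat((0.0, 0.3), (20000.0, 0.1))
print(dropout_p(877842.0))  # -> 0.1, the long-run value at this batch_count
```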
], batch size: 39, lr: 2.42e-03, grad_scale: 32.0 2024-09-26 01:46:31,089 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.const_attention_rate, batch_count=878308.6666666666, ans=0.025 2024-09-26 01:46:38,688 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.const_attention_rate, batch_count=878308.6666666666, ans=0.025 2024-09-26 01:46:45,057 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=878355.3333333334, ans=0.125 2024-09-26 01:46:46,622 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=878355.3333333334, ans=0.125 2024-09-26 01:47:20,421 INFO [scaling.py:1024] (3/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=6.91 vs. limit=8.0 2024-09-26 01:47:24,391 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.37 vs. limit=15.0 2024-09-26 01:47:33,779 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=878448.6666666666, ans=0.0 2024-09-26 01:47:33,868 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-26 01:47:38,438 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=878495.3333333334, ans=0.07 2024-09-26 01:47:52,404 INFO [train.py:1198] (3/4) Epoch 49, batch 1250, loss[loss=0.1925, ctc_loss=0.124, cr_loss=0.3425, over 17071.00 frames. ], tot_loss[loss=0.1848, ctc_loss=0.118, cr_loss=0.3341, over 3361385.83 frames. ], batch size: 46, lr: 2.42e-03, grad_scale: 32.0 2024-09-26 01:48:05,332 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=878542.0, ans=0.125 2024-09-26 01:48:26,936 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=878635.3333333334, ans=0.1 2024-09-26 01:48:36,284 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer_na.min_abs, batch_count=878635.3333333334, ans=0.02 2024-09-26 01:48:44,101 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=878682.0, ans=0.0 2024-09-26 01:48:56,556 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.071e+02 1.287e+02 1.357e+02 1.444e+02 1.832e+02, threshold=2.714e+02, percent-clipped=0.0 2024-09-26 01:48:56,874 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=878728.6666666666, ans=0.125 2024-09-26 01:49:09,558 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=878728.6666666666, ans=0.125 2024-09-26 01:49:14,201 INFO [train.py:1198] (3/4) Epoch 49, batch 1300, loss[loss=0.2084, ctc_loss=0.1348, cr_loss=0.3678, over 17146.00 frames. ], tot_loss[loss=0.1848, ctc_loss=0.118, cr_loss=0.334, over 3354251.79 frames. 
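[annotation] The scaling.py:1120 "WithLoss" records attach an auxiliary penalty to the self-attention weights and log its sum; loss-sum=0.000e+00, as in the record above, means the penalty is currently zero or inactive. One speculative way to realise this, passing the tensor through unchanged while keeping the penalty in the autograd graph; this is an assumed mechanism, not the scaling.py source:

```python
import torch
import logging

def with_loss(x: torch.Tensor, penalty: torch.Tensor, name: str) -> torch.Tensor:
    """Hypothetical sketch: forward value of `x` is unchanged, but the
    auxiliary penalty still contributes gradients to whatever produced it.
    A zero penalty reproduces the 'loss-sum=0.000e+00' records."""
    logging.info(f"WithLoss: name={name}, loss-sum={penalty.sum().item():.3e}")
    # Adds exactly zero in value; keeps the penalty's grad path alive.
    return x + (penalty.sum() - penalty.sum().detach())
```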
], batch size: 48, lr: 2.42e-03, grad_scale: 32.0 2024-09-26 01:49:36,590 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=878822.0, ans=0.2 2024-09-26 01:49:38,124 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=878822.0, ans=0.1 2024-09-26 01:49:59,407 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=878868.6666666666, ans=0.1 2024-09-26 01:50:34,949 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.46 vs. limit=15.0 2024-09-26 01:50:39,184 INFO [train.py:1198] (3/4) Epoch 49, batch 1350, loss[loss=0.2297, ctc_loss=0.1523, cr_loss=0.3871, over 16140.00 frames. ], tot_loss[loss=0.185, ctc_loss=0.1182, cr_loss=0.3339, over 3366318.22 frames. ], batch size: 74, lr: 2.42e-03, grad_scale: 32.0 2024-09-26 01:50:44,943 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.62 vs. limit=22.5 2024-09-26 01:50:52,355 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=879008.6666666666, ans=0.125 2024-09-26 01:51:10,055 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.19 vs. limit=10.0 2024-09-26 01:51:11,458 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=879102.0, ans=0.0 2024-09-26 01:51:13,058 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=879102.0, ans=0.125 2024-09-26 01:51:14,628 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=879102.0, ans=0.0 2024-09-26 01:51:24,119 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=879102.0, ans=0.035 2024-09-26 01:51:29,151 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=879148.6666666666, ans=0.0 2024-09-26 01:51:40,534 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=879148.6666666666, ans=0.0 2024-09-26 01:51:41,923 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.182e+02 1.323e+02 1.398e+02 1.529e+02 2.878e+02, threshold=2.797e+02, percent-clipped=1.0 2024-09-26 01:51:59,738 INFO [train.py:1198] (3/4) Epoch 49, batch 1400, loss[loss=0.1741, ctc_loss=0.1112, cr_loss=0.3148, over 17096.00 frames. ], tot_loss[loss=0.1855, ctc_loss=0.1186, cr_loss=0.3347, over 3355536.72 frames. ], batch size: 43, lr: 2.42e-03, grad_scale: 16.0 2024-09-26 01:52:22,264 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=7.81 vs. 
limit=15.0 2024-09-26 01:52:31,083 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=879288.6666666666, ans=0.125 2024-09-26 01:52:37,205 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=879335.3333333334, ans=0.125 2024-09-26 01:53:18,047 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=879428.6666666666, ans=0.09899494936611666 2024-09-26 01:53:24,002 INFO [train.py:1198] (3/4) Epoch 49, batch 1450, loss[loss=0.2026, ctc_loss=0.1322, cr_loss=0.3518, over 15830.00 frames. ], tot_loss[loss=0.1857, ctc_loss=0.1188, cr_loss=0.3345, over 3364101.48 frames. ], batch size: 74, lr: 2.42e-03, grad_scale: 16.0 2024-09-26 01:53:30,773 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=879475.3333333334, ans=0.1 2024-09-26 01:53:33,967 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=879475.3333333334, ans=0.07 2024-09-26 01:54:28,641 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.139e+02 1.317e+02 1.389e+02 1.499e+02 2.448e+02, threshold=2.778e+02, percent-clipped=0.0 2024-09-26 01:54:37,025 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=879662.0, ans=0.1 2024-09-26 01:54:47,245 INFO [train.py:1198] (3/4) Epoch 49, batch 1500, loss[loss=0.2016, ctc_loss=0.1326, cr_loss=0.3451, over 16995.00 frames. ], tot_loss[loss=0.1853, ctc_loss=0.1185, cr_loss=0.3342, over 3369891.37 frames. ], batch size: 53, lr: 2.42e-03, grad_scale: 16.0 2024-09-26 01:55:43,791 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=879848.6666666666, ans=0.125 2024-09-26 01:55:46,883 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=879848.6666666666, ans=0.125 2024-09-26 01:55:50,126 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=879895.3333333334, ans=0.125 2024-09-26 01:56:07,518 INFO [train.py:1198] (3/4) Epoch 49, batch 1550, loss[loss=0.1926, ctc_loss=0.1223, cr_loss=0.3517, over 15840.00 frames. ], tot_loss[loss=0.1863, ctc_loss=0.1191, cr_loss=0.3358, over 3369745.50 frames. ], batch size: 74, lr: 2.42e-03, grad_scale: 16.0 2024-09-26 01:56:37,368 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=4.30 vs. limit=15.0 2024-09-26 01:57:13,894 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.158e+02 1.314e+02 1.386e+02 1.460e+02 1.935e+02, threshold=2.771e+02, percent-clipped=0.0 2024-09-26 01:57:18,122 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.91 vs. limit=6.0 2024-09-26 01:57:30,179 INFO [train.py:1198] (3/4) Epoch 49, batch 1600, loss[loss=0.1826, ctc_loss=0.1195, cr_loss=0.3153, over 17220.00 frames. ], tot_loss[loss=0.1864, ctc_loss=0.1192, cr_loss=0.3361, over 3373634.87 frames. 
], batch size: 50, lr: 2.42e-03, grad_scale: 32.0 2024-09-26 01:57:33,721 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=880175.3333333334, ans=0.0 2024-09-26 01:57:44,693 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=880222.0, ans=0.035 2024-09-26 01:58:00,714 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=880268.6666666666, ans=0.1 2024-09-26 01:58:21,707 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=5.27 vs. limit=15.0 2024-09-26 01:58:25,987 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=880315.3333333334, ans=0.0 2024-09-26 01:58:52,673 INFO [train.py:1198] (3/4) Epoch 49, batch 1650, loss[loss=0.1919, ctc_loss=0.1217, cr_loss=0.3508, over 17008.00 frames. ], tot_loss[loss=0.1864, ctc_loss=0.1191, cr_loss=0.3367, over 3370023.60 frames. ], batch size: 51, lr: 2.42e-03, grad_scale: 32.0 2024-09-26 01:59:12,138 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=880455.3333333334, ans=0.125 2024-09-26 01:59:18,531 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=880455.3333333334, ans=0.07 2024-09-26 01:59:30,172 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=6.80 vs. limit=15.0 2024-09-26 01:59:47,927 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=880548.6666666666, ans=0.125 2024-09-26 02:00:01,887 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.163e+02 1.295e+02 1.382e+02 1.560e+02 2.405e+02, threshold=2.764e+02, percent-clipped=0.0 2024-09-26 02:00:03,861 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=880595.3333333334, ans=0.125 2024-09-26 02:00:17,671 INFO [train.py:1198] (3/4) Epoch 49, batch 1700, loss[loss=0.2033, ctc_loss=0.1288, cr_loss=0.3723, over 17301.00 frames. ], tot_loss[loss=0.1864, ctc_loss=0.119, cr_loss=0.3372, over 3372630.48 frames. ], batch size: 49, lr: 2.42e-03, grad_scale: 32.0 2024-09-26 02:00:24,279 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=880642.0, ans=0.0 2024-09-26 02:01:01,185 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=880735.3333333334, ans=0.125 2024-09-26 02:01:20,234 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=880828.6666666666, ans=0.0 2024-09-26 02:01:29,905 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=880828.6666666666, ans=0.125 2024-09-26 02:01:37,519 INFO [train.py:1198] (3/4) Epoch 49, batch 1750, loss[loss=0.18, ctc_loss=0.1125, cr_loss=0.3372, over 17251.00 frames. ], tot_loss[loss=0.1857, ctc_loss=0.1184, cr_loss=0.3362, over 3369416.95 frames. 
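[annotation] Each record carries two losses: loss[...] over the current batch's frames, and tot_loss[...] over a large, slowly drifting frame total (around 3.37M frames here). The fractional frame counts suggest a decayed frame-weighted running sum rather than a plain cumulative total, which would grow without bound. A sketch under that assumption, with a hypothetical decay constant:

```python
class MetricsTracker:
    """Assumed frame-weighted running average with exponential decay,
    matching the flavour of the 'tot_loss[... over N frames]' records."""
    def __init__(self, decay: float = 0.999):
        self.decay = decay
        self.loss_sum = 0.0   # decayed sum of (loss * frames)
        self.frames = 0.0     # decayed sum of frames

    def update(self, loss: float, num_frames: float) -> float:
        self.loss_sum = self.decay * self.loss_sum + loss * num_frames
        self.frames = self.decay * self.frames + num_frames
        return self.loss_sum / self.frames   # the reported tot_loss

tracker = MetricsTracker()
print(tracker.update(0.18, 17251.0))  # each batch folds into the average
```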
], batch size: 44, lr: 2.42e-03, grad_scale: 16.0 2024-09-26 02:01:44,567 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=8.69 vs. limit=15.0 2024-09-26 02:01:49,478 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.94 vs. limit=6.0 2024-09-26 02:01:57,763 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.98 vs. limit=22.5 2024-09-26 02:02:24,985 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=880968.6666666666, ans=0.0 2024-09-26 02:02:28,159 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=881015.3333333334, ans=0.05 2024-09-26 02:02:32,933 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=881015.3333333334, ans=0.0 2024-09-26 02:02:36,086 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=881015.3333333334, ans=0.0 2024-09-26 02:02:45,570 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.131e+02 1.296e+02 1.405e+02 1.536e+02 1.900e+02, threshold=2.809e+02, percent-clipped=0.0 2024-09-26 02:02:59,570 INFO [train.py:1198] (3/4) Epoch 49, batch 1800, loss[loss=0.164, ctc_loss=0.1034, cr_loss=0.3034, over 16292.00 frames. ], tot_loss[loss=0.1861, ctc_loss=0.1187, cr_loss=0.3371, over 3378751.26 frames. ], batch size: 36, lr: 2.42e-03, grad_scale: 16.0 2024-09-26 02:03:17,140 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=881155.3333333334, ans=0.2 2024-09-26 02:03:21,949 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=881155.3333333334, ans=0.125 2024-09-26 02:03:38,384 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=6.51 vs. limit=15.0 2024-09-26 02:04:06,231 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=881295.3333333334, ans=0.1 2024-09-26 02:04:09,384 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=881295.3333333334, ans=0.125 2024-09-26 02:04:21,863 INFO [train.py:1198] (3/4) Epoch 49, batch 1850, loss[loss=0.174, ctc_loss=0.1114, cr_loss=0.3132, over 16985.00 frames. ], tot_loss[loss=0.1853, ctc_loss=0.1181, cr_loss=0.3361, over 3381296.18 frames. ], batch size: 56, lr: 2.42e-03, grad_scale: 16.0 2024-09-26 02:04:32,153 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.44 vs. 
limit=15.0 2024-09-26 02:05:04,254 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=881435.3333333334, ans=0.2 2024-09-26 02:05:04,417 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=881435.3333333334, ans=0.125 2024-09-26 02:05:12,955 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=5.40 vs. limit=15.0 2024-09-26 02:05:21,067 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=4.89 vs. limit=15.0 2024-09-26 02:05:23,591 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=881482.0, ans=0.125 2024-09-26 02:05:23,742 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=881482.0, ans=0.125 2024-09-26 02:05:33,168 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.195e+02 1.348e+02 1.416e+02 1.489e+02 2.449e+02, threshold=2.831e+02, percent-clipped=0.0 2024-09-26 02:05:41,567 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=881528.6666666666, ans=0.1 2024-09-26 02:05:47,596 INFO [train.py:1198] (3/4) Epoch 49, batch 1900, loss[loss=0.2488, ctc_loss=0.1665, cr_loss=0.4114, over 11867.00 frames. ], tot_loss[loss=0.1852, ctc_loss=0.1181, cr_loss=0.3353, over 3368393.73 frames. ], batch size: 123, lr: 2.42e-03, grad_scale: 16.0 2024-09-26 02:06:00,813 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=881575.3333333334, ans=0.125 2024-09-26 02:06:45,464 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=881715.3333333334, ans=0.04949747468305833 2024-09-26 02:07:10,387 INFO [train.py:1198] (3/4) Epoch 49, batch 1950, loss[loss=0.1883, ctc_loss=0.1196, cr_loss=0.3438, over 17021.00 frames. ], tot_loss[loss=0.1846, ctc_loss=0.1177, cr_loss=0.3344, over 3374200.23 frames. ], batch size: 51, lr: 2.42e-03, grad_scale: 16.0 2024-09-26 02:07:10,739 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=881808.6666666666, ans=0.125 2024-09-26 02:07:51,989 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=881902.0, ans=0.0 2024-09-26 02:08:12,761 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=881995.3333333334, ans=0.1 2024-09-26 02:08:17,646 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=5.38 vs. 
limit=15.0 2024-09-26 02:08:18,104 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.175e+02 1.309e+02 1.411e+02 1.494e+02 2.442e+02, threshold=2.822e+02, percent-clipped=0.0 2024-09-26 02:08:18,509 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=881995.3333333334, ans=0.0 2024-09-26 02:08:29,618 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=881995.3333333334, ans=0.1 2024-09-26 02:08:32,519 INFO [train.py:1198] (3/4) Epoch 49, batch 2000, loss[loss=0.1816, ctc_loss=0.1128, cr_loss=0.3438, over 16346.00 frames. ], tot_loss[loss=0.1853, ctc_loss=0.1181, cr_loss=0.3358, over 3378821.53 frames. ], batch size: 36, lr: 2.42e-03, grad_scale: 32.0 2024-09-26 02:08:56,953 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=11.73 vs. limit=22.5 2024-09-26 02:09:17,379 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=882135.3333333334, ans=0.125 2024-09-26 02:09:24,039 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=3.77 vs. limit=10.0 2024-09-26 02:09:57,473 INFO [train.py:1198] (3/4) Epoch 49, batch 2050, loss[loss=0.2152, ctc_loss=0.1382, cr_loss=0.385, over 17012.00 frames. ], tot_loss[loss=0.1849, ctc_loss=0.1178, cr_loss=0.3354, over 3381285.61 frames. ], batch size: 53, lr: 2.42e-03, grad_scale: 32.0 2024-09-26 02:10:04,167 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=882275.3333333334, ans=0.125 2024-09-26 02:11:04,656 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.134e+02 1.334e+02 1.409e+02 1.527e+02 2.352e+02, threshold=2.819e+02, percent-clipped=0.0 2024-09-26 02:11:17,570 INFO [train.py:1198] (3/4) Epoch 49, batch 2100, loss[loss=0.206, ctc_loss=0.1313, cr_loss=0.3736, over 17169.00 frames. ], tot_loss[loss=0.1851, ctc_loss=0.118, cr_loss=0.3357, over 3380984.44 frames. ], batch size: 45, lr: 2.42e-03, grad_scale: 16.0 2024-09-26 02:11:19,662 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=882508.6666666666, ans=0.1 2024-09-26 02:11:35,758 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=882555.3333333334, ans=0.125 2024-09-26 02:11:53,277 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=882602.0, ans=0.125 2024-09-26 02:12:11,829 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=882648.6666666666, ans=0.2 2024-09-26 02:12:11,952 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=882648.6666666666, ans=0.0 2024-09-26 02:12:39,853 INFO [train.py:1198] (3/4) Epoch 49, batch 2150, loss[loss=0.2237, ctc_loss=0.1487, cr_loss=0.3748, over 12053.00 frames. ], tot_loss[loss=0.1852, ctc_loss=0.1181, cr_loss=0.3357, over 3362407.29 frames. 
], batch size: 124, lr: 2.42e-03, grad_scale: 16.0 2024-09-26 02:12:48,168 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=882742.0, ans=0.125 2024-09-26 02:13:25,974 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=882835.3333333334, ans=0.2 2024-09-26 02:13:49,690 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.094e+02 1.355e+02 1.416e+02 1.495e+02 3.199e+02, threshold=2.832e+02, percent-clipped=1.0 2024-09-26 02:13:50,487 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=11.59 vs. limit=22.5 2024-09-26 02:14:02,464 INFO [train.py:1198] (3/4) Epoch 49, batch 2200, loss[loss=0.1797, ctc_loss=0.1142, cr_loss=0.328, over 17018.00 frames. ], tot_loss[loss=0.1859, ctc_loss=0.1186, cr_loss=0.3367, over 3355720.62 frames. ], batch size: 51, lr: 2.42e-03, grad_scale: 16.0 2024-09-26 02:14:12,188 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=882975.3333333334, ans=0.125 2024-09-26 02:14:17,242 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=883022.0, ans=0.0 2024-09-26 02:14:47,876 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=883068.6666666666, ans=0.2 2024-09-26 02:15:10,399 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=883162.0, ans=0.125 2024-09-26 02:15:27,835 INFO [train.py:1198] (3/4) Epoch 49, batch 2250, loss[loss=0.1831, ctc_loss=0.1153, cr_loss=0.3389, over 16984.00 frames. ], tot_loss[loss=0.1866, ctc_loss=0.119, cr_loss=0.3381, over 3342388.64 frames. ], batch size: 42, lr: 2.41e-03, grad_scale: 16.0 2024-09-26 02:15:31,400 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=883208.6666666666, ans=0.0 2024-09-26 02:15:48,924 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=883255.3333333334, ans=0.125 2024-09-26 02:15:55,292 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=883255.3333333334, ans=0.125 2024-09-26 02:16:34,924 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.163e+02 1.298e+02 1.376e+02 1.462e+02 1.948e+02, threshold=2.752e+02, percent-clipped=0.0 2024-09-26 02:16:47,760 INFO [train.py:1198] (3/4) Epoch 49, batch 2300, loss[loss=0.2082, ctc_loss=0.1338, cr_loss=0.3723, over 17304.00 frames. ], tot_loss[loss=0.186, ctc_loss=0.1186, cr_loss=0.337, over 3341876.38 frames. 
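[annotation] Batch sizes in this section swing from 39 to 125 utterances while the per-batch frame totals stay bounded; the batch-2150 record covers 12053 frames with 124 utterances (short cuts), whereas 44-utterance batches span roughly 17000 frames (long cuts). That inverse relationship is what duration-capped batching produces. A toy sketch of the idea; real samplers such as lhotse's DynamicBucketingSampler also shuffle within duration buckets:

```python
def duration_batches(cuts, max_duration: float = 700.0):
    """Toy duration-capped batching: greedily fill each batch up to
    `max_duration` seconds, so batches of short utterances hold many
    cuts and batches of long ones hold few. `cuts` is a list of
    (cut_id, duration) pairs; a hypothetical simplification."""
    batch, total = [], 0.0
    for cut_id, dur in sorted(cuts, key=lambda c: c[1]):
        if batch and total + dur > max_duration:
            yield batch
            batch, total = [], 0.0
        batch.append(cut_id)
        total += dur
    if batch:
        yield batch
```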
], batch size: 49, lr: 2.41e-03, grad_scale: 16.0 2024-09-26 02:17:00,499 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=883442.0, ans=0.125 2024-09-26 02:17:02,260 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.const_attention_rate, batch_count=883442.0, ans=0.025 2024-09-26 02:17:13,523 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=883488.6666666666, ans=0.125 2024-09-26 02:17:18,719 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.33 vs. limit=15.0 2024-09-26 02:17:40,740 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=883582.0, ans=0.2 2024-09-26 02:17:48,743 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=883582.0, ans=0.125 2024-09-26 02:18:07,180 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.92 vs. limit=15.0 2024-09-26 02:18:10,191 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=883628.6666666666, ans=0.1 2024-09-26 02:18:13,002 INFO [train.py:1198] (3/4) Epoch 49, batch 2350, loss[loss=0.1444, ctc_loss=0.09118, cr_loss=0.2661, over 17100.00 frames. ], tot_loss[loss=0.1858, ctc_loss=0.1185, cr_loss=0.3365, over 3340601.55 frames. ], batch size: 43, lr: 2.41e-03, grad_scale: 16.0 2024-09-26 02:18:34,380 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=883722.0, ans=0.125 2024-09-26 02:18:34,525 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=883722.0, ans=0.125 2024-09-26 02:18:34,985 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=7.36 vs. limit=15.0 2024-09-26 02:18:39,364 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=883722.0, ans=0.125 2024-09-26 02:18:41,087 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.49 vs. limit=15.0 2024-09-26 02:19:20,302 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.133e+02 1.282e+02 1.377e+02 1.471e+02 1.818e+02, threshold=2.754e+02, percent-clipped=0.0 2024-09-26 02:19:35,629 INFO [train.py:1198] (3/4) Epoch 49, batch 2400, loss[loss=0.1767, ctc_loss=0.1144, cr_loss=0.3115, over 17233.00 frames. ], tot_loss[loss=0.1854, ctc_loss=0.1181, cr_loss=0.3364, over 3353613.48 frames. 
], batch size: 47, lr: 2.41e-03, grad_scale: 32.0 2024-09-26 02:19:52,945 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=883955.3333333334, ans=0.0 2024-09-26 02:20:07,292 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=883955.3333333334, ans=0.1 2024-09-26 02:20:10,461 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=884002.0, ans=0.125 2024-09-26 02:20:56,923 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=884142.0, ans=0.125 2024-09-26 02:20:58,124 INFO [train.py:1198] (3/4) Epoch 49, batch 2450, loss[loss=0.1898, ctc_loss=0.1205, cr_loss=0.3469, over 17219.00 frames. ], tot_loss[loss=0.1856, ctc_loss=0.1182, cr_loss=0.3369, over 3365232.73 frames. ], batch size: 50, lr: 2.41e-03, grad_scale: 16.0 2024-09-26 02:21:21,050 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=884188.6666666666, ans=0.125 2024-09-26 02:21:21,428 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=6.14 vs. limit=15.0 2024-09-26 02:21:30,448 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=884235.3333333334, ans=0.125 2024-09-26 02:21:32,968 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=12.38 vs. limit=22.5 2024-09-26 02:22:09,463 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.145e+02 1.309e+02 1.374e+02 1.463e+02 2.759e+02, threshold=2.748e+02, percent-clipped=1.0 2024-09-26 02:22:20,071 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=7.22 vs. limit=15.0 2024-09-26 02:22:20,971 INFO [train.py:1198] (3/4) Epoch 49, batch 2500, loss[loss=0.2075, ctc_loss=0.1381, cr_loss=0.3471, over 16597.00 frames. ], tot_loss[loss=0.1857, ctc_loss=0.1184, cr_loss=0.3366, over 3362763.19 frames. ], batch size: 66, lr: 2.41e-03, grad_scale: 16.0 2024-09-26 02:22:50,446 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=884422.0, ans=0.125 2024-09-26 02:23:14,074 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=884515.3333333334, ans=0.125 2024-09-26 02:23:17,297 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=884515.3333333334, ans=0.0 2024-09-26 02:23:27,204 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=3.27 vs. limit=15.0 2024-09-26 02:23:30,043 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=884562.0, ans=0.1 2024-09-26 02:23:44,124 INFO [train.py:1198] (3/4) Epoch 49, batch 2550, loss[loss=0.1829, ctc_loss=0.1165, cr_loss=0.3318, over 17073.00 frames. ], tot_loss[loss=0.1854, ctc_loss=0.1182, cr_loss=0.336, over 3363387.57 frames. 
], batch size: 46, lr: 2.41e-03, grad_scale: 16.0 2024-09-26 02:23:50,783 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=884608.6666666666, ans=0.0 2024-09-26 02:24:06,841 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=884655.3333333334, ans=0.125 2024-09-26 02:24:19,761 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=884702.0, ans=0.125 2024-09-26 02:24:35,028 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=884748.6666666666, ans=0.0 2024-09-26 02:24:42,261 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=884748.6666666666, ans=0.2 2024-09-26 02:24:58,347 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.136e+02 1.313e+02 1.367e+02 1.472e+02 2.257e+02, threshold=2.734e+02, percent-clipped=0.0 2024-09-26 02:25:09,550 INFO [train.py:1198] (3/4) Epoch 49, batch 2600, loss[loss=0.196, ctc_loss=0.127, cr_loss=0.3447, over 17363.00 frames. ], tot_loss[loss=0.1857, ctc_loss=0.1185, cr_loss=0.3357, over 3364330.87 frames. ], batch size: 48, lr: 2.41e-03, grad_scale: 16.0 2024-09-26 02:25:33,997 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=884888.6666666666, ans=0.125 2024-09-26 02:25:52,028 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=6.58 vs. limit=10.0 2024-09-26 02:26:27,407 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.53 vs. limit=12.0 2024-09-26 02:26:29,482 INFO [train.py:1198] (3/4) Epoch 49, batch 2650, loss[loss=0.219, ctc_loss=0.1448, cr_loss=0.3708, over 15169.00 frames. ], tot_loss[loss=0.1867, ctc_loss=0.1193, cr_loss=0.3368, over 3345526.61 frames. ], batch size: 89, lr: 2.41e-03, grad_scale: 16.0 2024-09-26 02:26:34,641 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=885075.3333333334, ans=0.2 2024-09-26 02:26:41,092 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=885075.3333333334, ans=0.125 2024-09-26 02:26:45,781 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=885122.0, ans=0.0 2024-09-26 02:27:19,720 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=9.10 vs. limit=10.0 2024-09-26 02:27:21,052 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=6.10 vs. limit=15.0 2024-09-26 02:27:25,439 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=11.43 vs. limit=15.0 2024-09-26 02:27:38,385 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.84 vs. 
2024-09-26 02:27:40,642 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.183e+02 1.322e+02 1.427e+02 1.515e+02 2.814e+02, threshold=2.853e+02, percent-clipped=1.0
2024-09-26 02:27:51,959 INFO [train.py:1198] (3/4) Epoch 49, batch 2700, loss[loss=0.1752, ctc_loss=0.1137, cr_loss=0.3076, over 17305.00 frames. ], tot_loss[loss=0.1862, ctc_loss=0.119, cr_loss=0.3357, over 3350828.47 frames. ], batch size: 49, lr: 2.41e-03, grad_scale: 16.0
2024-09-26 02:27:53,837 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=885308.6666666666, ans=0.125
2024-09-26 02:28:05,662 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=885308.6666666666, ans=0.125
2024-09-26 02:28:05,911 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=885308.6666666666, ans=0.025
2024-09-26 02:28:29,684 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=885402.0, ans=0.125
2024-09-26 02:29:08,301 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=885495.3333333334, ans=0.1
2024-09-26 02:29:14,281 INFO [train.py:1198] (3/4) Epoch 49, batch 2750, loss[loss=0.229, ctc_loss=0.1546, cr_loss=0.3719, over 12192.00 frames. ], tot_loss[loss=0.1854, ctc_loss=0.1185, cr_loss=0.3346, over 3352169.81 frames. ], batch size: 125, lr: 2.41e-03, grad_scale: 16.0
2024-09-26 02:29:33,944 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=4.36 vs. limit=15.0
2024-09-26 02:30:01,143 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=885635.3333333334, ans=0.125
2024-09-26 02:30:05,848 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=885682.0, ans=0.1
2024-09-26 02:30:27,045 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=885728.6666666666, ans=0.0
2024-09-26 02:30:28,234 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.217e+02 1.341e+02 1.441e+02 1.583e+02 2.930e+02, threshold=2.882e+02, percent-clipped=1.0
2024-09-26 02:30:39,199 INFO [train.py:1198] (3/4) Epoch 49, batch 2800, loss[loss=0.2284, ctc_loss=0.1515, cr_loss=0.3847, over 15114.00 frames. ], tot_loss[loss=0.1856, ctc_loss=0.1186, cr_loss=0.3347, over 3353266.51 frames. ], batch size: 89, lr: 2.41e-03, grad_scale: 32.0
2024-09-26 02:31:21,280 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=885868.6666666666, ans=0.0
2024-09-26 02:31:43,488 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=885962.0, ans=0.0
2024-09-26 02:32:01,791 INFO [train.py:1198] (3/4) Epoch 49, batch 2850, loss[loss=0.172, ctc_loss=0.1072, cr_loss=0.3244, over 17271.00 frames. ], tot_loss[loss=0.1856, ctc_loss=0.1186, cr_loss=0.3346, over 3354665.93 frames. ], batch size: 44, lr: 2.41e-03, grad_scale: 32.0
2024-09-26 02:32:23,175 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=886055.3333333334, ans=0.2
2024-09-26 02:32:37,746 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=886102.0, ans=0.2
2024-09-26 02:32:59,999 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=886148.6666666666, ans=0.125
2024-09-26 02:33:01,461 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=886148.6666666666, ans=0.125
2024-09-26 02:33:13,930 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.143e+02 1.301e+02 1.414e+02 1.534e+02 2.528e+02, threshold=2.827e+02, percent-clipped=0.0
2024-09-26 02:33:15,768 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.const_attention_rate, batch_count=886195.3333333334, ans=0.025
2024-09-26 02:33:25,146 INFO [train.py:1198] (3/4) Epoch 49, batch 2900, loss[loss=0.1619, ctc_loss=0.1013, cr_loss=0.303, over 17065.00 frames. ], tot_loss[loss=0.1865, ctc_loss=0.1194, cr_loss=0.3358, over 3349901.26 frames. ], batch size: 43, lr: 2.41e-03, grad_scale: 32.0
2024-09-26 02:33:42,737 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=886288.6666666666, ans=0.125
2024-09-26 02:33:45,956 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=886288.6666666666, ans=0.1
2024-09-26 02:33:47,990 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=4.44 vs. limit=12.0
2024-09-26 02:34:49,795 INFO [train.py:1198] (3/4) Epoch 49, batch 2950, loss[loss=0.2426, ctc_loss=0.1602, cr_loss=0.412, over 12283.00 frames. ], tot_loss[loss=0.1857, ctc_loss=0.1187, cr_loss=0.3349, over 3350532.59 frames. ], batch size: 123, lr: 2.41e-03, grad_scale: 32.0
2024-09-26 02:35:25,645 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=4.30 vs. limit=15.0
2024-09-26 02:35:30,118 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=886568.6666666666, ans=0.125
2024-09-26 02:35:38,087 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=886615.3333333334, ans=0.0
2024-09-26 02:35:58,783 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.170e+02 1.303e+02 1.409e+02 1.487e+02 2.405e+02, threshold=2.817e+02, percent-clipped=0.0
2024-09-26 02:36:09,979 INFO [train.py:1198] (3/4) Epoch 49, batch 3000, loss[loss=0.1978, ctc_loss=0.1266, cr_loss=0.3561, over 17223.00 frames. ], tot_loss[loss=0.186, ctc_loss=0.1189, cr_loss=0.3356, over 3352424.73 frames. ], batch size: 50, lr: 2.41e-03, grad_scale: 32.0
2024-09-26 02:36:09,979 INFO [train.py:1221] (3/4) Computing validation loss
2024-09-26 02:36:25,647 INFO [train.py:1230] (3/4) Epoch 49, validation: loss=0.03501, ctc_loss=0.03501, cr_loss=1.043e-14, over 944034.00 frames.
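
In the validation line above, cr_loss comes out at numerical zero (~1e-14) while ctc_loss carries the entire value. That is expected for CR-CTC: the consistency term compares the model's outputs on two differently time-masked copies of each utterance, and validation runs without that masking, so the two views coincide. A rough sketch of such a consistency term under those assumptions (illustrative only, not the icefall code):

    import torch
    import torch.nn.functional as F

    # Hedged sketch of a consistency-regularization term between the CTC
    # posteriors of two augmented views of the same batch. Inputs are
    # (N, T, V) log-probabilities; each direction's target is detached.
    def cr_term(log_p_a, log_p_b):
        kl_ab = F.kl_div(log_p_a, log_p_b.detach(), log_target=True, reduction="batchmean")
        kl_ba = F.kl_div(log_p_b, log_p_a.detach(), log_target=True, reduction="batchmean")
        return 0.5 * (kl_ab + kl_ba)

    # With identical views (no masking, as at validation) the term vanishes
    # up to float rounding, matching the logged cr_loss=1.043e-14.
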
2024-09-26 02:36:25,648 INFO [train.py:1231] (3/4) Maximum memory allocated so far is 21238MB
2024-09-26 02:36:26,291 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=4.82 vs. limit=12.0
2024-09-26 02:36:28,982 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=886708.6666666666, ans=0.125
2024-09-26 02:36:32,109 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=886708.6666666666, ans=0.0
2024-09-26 02:36:33,829 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=886708.6666666666, ans=0.1
2024-09-26 02:36:42,080 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten.whitening_limit, batch_count=886755.3333333334, ans=15.0
2024-09-26 02:36:43,318 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2024-09-26 02:36:55,546 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=11.00 vs. limit=12.0
2024-09-26 02:37:36,637 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=886895.3333333334, ans=0.125
2024-09-26 02:37:47,490 INFO [train.py:1198] (3/4) Epoch 49, batch 3050, loss[loss=0.1722, ctc_loss=0.1069, cr_loss=0.3266, over 15833.00 frames. ], tot_loss[loss=0.1855, ctc_loss=0.1185, cr_loss=0.3348, over 3352870.65 frames. ], batch size: 35, lr: 2.41e-03, grad_scale: 32.0
2024-09-26 02:37:55,356 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=886942.0, ans=0.125
2024-09-26 02:38:01,745 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=886988.6666666666, ans=0.2
2024-09-26 02:38:01,852 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=886988.6666666666, ans=0.125
2024-09-26 02:38:15,913 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=886988.6666666666, ans=0.2
2024-09-26 02:38:38,780 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=6.96 vs. limit=15.0
2024-09-26 02:38:47,249 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=887082.0, ans=0.2
2024-09-26 02:38:47,840 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=8.25 vs. limit=15.0
2024-09-26 02:38:54,748 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.193e+02 1.323e+02 1.385e+02 1.464e+02 2.781e+02, threshold=2.770e+02, percent-clipped=0.0
2024-09-26 02:39:05,814 INFO [train.py:1198] (3/4) Epoch 49, batch 3100, loss[loss=0.1569, ctc_loss=0.09827, cr_loss=0.293, over 17281.00 frames. ], tot_loss[loss=0.1858, ctc_loss=0.1187, cr_loss=0.3355, over 3351557.24 frames. ], batch size: 46, lr: 2.41e-03, grad_scale: 32.0
2024-09-26 02:39:06,158 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.const_attention_rate, batch_count=887175.3333333334, ans=0.025
2024-09-26 02:39:21,728 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=887222.0, ans=0.0
2024-09-26 02:40:00,860 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-09-26 02:40:08,729 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=887315.3333333334, ans=0.0
2024-09-26 02:40:26,987 INFO [train.py:1198] (3/4) Epoch 49, batch 3150, loss[loss=0.1519, ctc_loss=0.09379, cr_loss=0.2906, over 16356.00 frames. ], tot_loss[loss=0.1857, ctc_loss=0.1187, cr_loss=0.3352, over 3353789.05 frames. ], batch size: 36, lr: 2.41e-03, grad_scale: 32.0
2024-09-26 02:40:49,108 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=887455.3333333334, ans=0.0
2024-09-26 02:41:19,033 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=3.28 vs. limit=15.0
2024-09-26 02:41:20,818 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=5.20 vs. limit=15.0
2024-09-26 02:41:23,515 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2024-09-26 02:41:34,622 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=887595.3333333334, ans=0.0
2024-09-26 02:41:35,931 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.172e+02 1.299e+02 1.369e+02 1.479e+02 2.318e+02, threshold=2.737e+02, percent-clipped=0.0
2024-09-26 02:41:36,598 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.49 vs. limit=15.0
2024-09-26 02:41:45,299 INFO [train.py:1198] (3/4) Epoch 49, batch 3200, loss[loss=0.1872, ctc_loss=0.1198, cr_loss=0.3371, over 17183.00 frames. ], tot_loss[loss=0.1852, ctc_loss=0.1183, cr_loss=0.3346, over 3356580.50 frames. ], batch size: 45, lr: 2.41e-03, grad_scale: 32.0
2024-09-26 02:42:38,851 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=887782.0, ans=0.1
2024-09-26 02:43:05,712 INFO [train.py:1198] (3/4) Epoch 49, batch 3250, loss[loss=0.1601, ctc_loss=0.09677, cr_loss=0.3168, over 16957.00 frames. ], tot_loss[loss=0.1853, ctc_loss=0.1183, cr_loss=0.3348, over 3346831.68 frames. ], batch size: 42, lr: 2.41e-03, grad_scale: 32.0
2024-09-26 02:43:17,678 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=887875.3333333334, ans=0.0
2024-09-26 02:43:27,253 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=887922.0, ans=0.04949747468305833
2024-09-26 02:44:18,280 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.165e+02 1.305e+02 1.378e+02 1.462e+02 1.990e+02, threshold=2.757e+02, percent-clipped=0.0
2024-09-26 02:44:26,133 INFO [train.py:1198] (3/4) Epoch 49, batch 3300, loss[loss=0.2057, ctc_loss=0.1325, cr_loss=0.3662, over 16917.00 frames. ], tot_loss[loss=0.1856, ctc_loss=0.1185, cr_loss=0.3354, over 3351416.84 frames. ], batch size: 58, lr: 2.41e-03, grad_scale: 16.0
2024-09-26 02:44:56,060 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=888202.0, ans=0.1
2024-09-26 02:45:29,086 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=888295.3333333334, ans=0.125
2024-09-26 02:45:33,762 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=888295.3333333334, ans=0.07
2024-09-26 02:45:44,481 INFO [train.py:1198] (3/4) Epoch 49, batch 3350, loss[loss=0.2274, ctc_loss=0.1479, cr_loss=0.3976, over 15178.00 frames. ], tot_loss[loss=0.1866, ctc_loss=0.1192, cr_loss=0.3371, over 3356967.65 frames. ], batch size: 89, lr: 2.41e-03, grad_scale: 16.0
2024-09-26 02:45:44,810 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=888342.0, ans=0.2
2024-09-26 02:45:49,365 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=888342.0, ans=0.0
2024-09-26 02:45:55,525 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=888342.0, ans=0.125
2024-09-26 02:46:09,832 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=888388.6666666666, ans=0.07
2024-09-26 02:46:23,838 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=888435.3333333334, ans=0.0
2024-09-26 02:46:35,838 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=5.79 vs. limit=15.0
2024-09-26 02:46:41,202 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=888482.0, ans=0.125
2024-09-26 02:46:54,935 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.181e+02 1.320e+02 1.389e+02 1.476e+02 1.760e+02, threshold=2.779e+02, percent-clipped=0.0
2024-09-26 02:47:01,557 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=888575.3333333334, ans=0.125
2024-09-26 02:47:02,748 INFO [train.py:1198] (3/4) Epoch 49, batch 3400, loss[loss=0.2188, ctc_loss=0.1395, cr_loss=0.3963, over 17231.00 frames. ], tot_loss[loss=0.1867, ctc_loss=0.1192, cr_loss=0.3374, over 3363660.81 frames. ], batch size: 55, lr: 2.41e-03, grad_scale: 16.0
2024-09-26 02:47:03,106 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=888575.3333333334, ans=0.0
2024-09-26 02:47:07,815 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-09-26 02:47:10,904 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=888575.3333333334, ans=0.1
2024-09-26 02:47:42,789 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.min_abs, batch_count=888668.6666666666, ans=0.5
2024-09-26 02:48:23,487 INFO [train.py:1198] (3/4) Epoch 49, batch 3450, loss[loss=0.1805, ctc_loss=0.1137, cr_loss=0.334, over 17180.00 frames. ], tot_loss[loss=0.1867, ctc_loss=0.1193, cr_loss=0.3371, over 3367033.53 frames. ], batch size: 41, lr: 2.41e-03, grad_scale: 16.0
2024-09-26 02:48:43,842 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=888855.3333333334, ans=0.0
2024-09-26 02:49:20,566 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=888948.6666666666, ans=0.125
2024-09-26 02:49:34,585 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.147e+02 1.304e+02 1.377e+02 1.518e+02 2.125e+02, threshold=2.754e+02, percent-clipped=0.0
2024-09-26 02:49:40,821 INFO [train.py:1198] (3/4) Epoch 49, batch 3500, loss[loss=0.1418, ctc_loss=0.08732, cr_loss=0.2722, over 16950.00 frames. ], tot_loss[loss=0.1859, ctc_loss=0.1187, cr_loss=0.3359, over 3369872.64 frames. ], batch size: 42, lr: 2.41e-03, grad_scale: 8.0
2024-09-26 02:49:43,308 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=889042.0, ans=0.125
2024-09-26 02:49:52,451 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=889042.0, ans=0.125
2024-09-26 02:49:54,136 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=889042.0, ans=0.1
2024-09-26 02:50:11,956 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=4.42 vs. limit=10.0
2024-09-26 02:50:13,088 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=889135.3333333334, ans=0.125
2024-09-26 02:50:16,650 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=4.30 vs. limit=12.0
2024-09-26 02:50:19,286 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=889135.3333333334, ans=0.125
2024-09-26 02:50:33,387 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=889182.0, ans=0.0
2024-09-26 02:50:37,302 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=4.49 vs. limit=15.0
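
The grad_scale value at the end of each loss line (32.0 earlier, stepping down to 16.0 and then 8.0 around batches 3300-3500, then climbing back up) is the AMP loss scale for this float16 run: the loss is multiplied by it before backprop so that small gradients survive half precision. The usual dynamic rule, sketched below under that assumption (not the exact icefall logic), halves the scale when non-finite gradients appear and doubles it after a stretch of stable steps:

    # Hedged sketch of dynamic loss scaling for float16 training (assumed
    # behaviour; the run enables AMP with dtype=torch.float16 in its config).
    class DynamicGradScaler:
        def __init__(self, scale=32.0, growth_interval=2000):
            self.scale = scale
            self.growth_interval = growth_interval
            self.good_steps = 0

        def update(self, found_inf):
            if found_inf:
                self.scale /= 2.0       # overflow: back off and skip the step
                self.good_steps = 0
            else:
                self.good_steps += 1
                if self.good_steps % self.growth_interval == 0:
                    self.scale *= 2.0   # long stable run: try a larger scale
            return self.scale
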
2024-09-26 02:50:52,057 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=889228.6666666666, ans=0.125
2024-09-26 02:51:01,307 INFO [train.py:1198] (3/4) Epoch 49, batch 3550, loss[loss=0.1612, ctc_loss=0.1007, cr_loss=0.3025, over 17228.00 frames. ], tot_loss[loss=0.1856, ctc_loss=0.1184, cr_loss=0.3357, over 3365513.53 frames. ], batch size: 47, lr: 2.41e-03, grad_scale: 8.0
2024-09-26 02:52:06,605 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=7.34 vs. limit=15.0
2024-09-26 02:52:12,334 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=889462.0, ans=0.125
2024-09-26 02:52:13,579 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.119e+02 1.292e+02 1.382e+02 1.482e+02 3.513e+02, threshold=2.765e+02, percent-clipped=1.0
2024-09-26 02:52:19,986 INFO [train.py:1198] (3/4) Epoch 49, batch 3600, loss[loss=0.2235, ctc_loss=0.1429, cr_loss=0.4031, over 15306.00 frames. ], tot_loss[loss=0.1861, ctc_loss=0.1188, cr_loss=0.3364, over 3362079.68 frames. ], batch size: 89, lr: 2.41e-03, grad_scale: 16.0
2024-09-26 02:52:53,592 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-09-26 02:53:14,084 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=889648.6666666666, ans=0.1
2024-09-26 02:53:42,739 INFO [train.py:1198] (3/4) Epoch 49, batch 3650, loss[loss=0.1421, ctc_loss=0.0881, cr_loss=0.2702, over 17202.00 frames. ], tot_loss[loss=0.1859, ctc_loss=0.1186, cr_loss=0.3362, over 3354955.93 frames. ], batch size: 41, lr: 2.41e-03, grad_scale: 8.0
2024-09-26 02:53:54,510 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.29 vs. limit=15.0
2024-09-26 02:53:57,280 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=889788.6666666666, ans=0.0
2024-09-26 02:54:01,694 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=889788.6666666666, ans=0.125
2024-09-26 02:54:08,217 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=889788.6666666666, ans=0.125
2024-09-26 02:54:15,883 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=889835.3333333334, ans=0.125
2024-09-26 02:54:20,692 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.const_attention_rate, batch_count=889835.3333333334, ans=0.025
2024-09-26 02:54:35,527 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.28 vs. limit=22.5
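
Each scaling.py:214 line above prints a ScheduledFloat: a hyperparameter (dropout probability, skip rate, balancer bound, and so on) whose current value "ans" is a function of batch_count rather than a constant. The logged behaviour is consistent with piecewise-linear interpolation between (batch_count, value) breakpoints, held at the final value once past the last breakpoint; a sketch under that assumption, with made-up breakpoints:

    # Hedged sketch of a ScheduledFloat-style schedule: linear interpolation
    # over (batch_count, value) breakpoints. The breakpoints below are
    # hypothetical; only the interpolation behaviour is being illustrated.
    def scheduled_float(batch_count, points):
        if batch_count <= points[0][0]:
            return points[0][1]
        for (x0, y0), (x1, y1) in zip(points, points[1:]):
            if batch_count <= x1:
                t = (batch_count - x0) / (x1 - x0)
                return y0 + t * (y1 - y0)
        return points[-1][1]

    # e.g. a dropout decaying from 0.3 to 0.1 over the first 20k batches and
    # flat afterwards -- by batch_count ~884k it would long since read 0.1:
    assert scheduled_float(884002.0, [(0.0, 0.3), (20000.0, 0.1)]) == 0.1
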
2024-09-26 02:54:36,389 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=889882.0, ans=0.125
2024-09-26 02:54:56,316 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.170e+02 1.314e+02 1.382e+02 1.463e+02 2.217e+02, threshold=2.764e+02, percent-clipped=0.0
2024-09-26 02:55:01,084 INFO [train.py:1198] (3/4) Epoch 49, batch 3700, loss[loss=0.1406, ctc_loss=0.0868, cr_loss=0.2688, over 17269.00 frames. ], tot_loss[loss=0.1852, ctc_loss=0.1182, cr_loss=0.3352, over 3359939.89 frames. ], batch size: 42, lr: 2.41e-03, grad_scale: 8.0
2024-09-26 02:55:05,118 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=889975.3333333334, ans=0.125
2024-09-26 02:55:11,562 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=889975.3333333334, ans=10.0
2024-09-26 02:55:19,158 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=890022.0, ans=0.2
2024-09-26 02:55:47,371 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=890115.3333333334, ans=0.0
2024-09-26 02:55:53,847 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=890115.3333333334, ans=0.125
2024-09-26 02:55:58,832 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=890115.3333333334, ans=0.2
2024-09-26 02:56:12,743 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=890162.0, ans=0.125
2024-09-26 02:56:20,584 INFO [train.py:1198] (3/4) Epoch 49, batch 3750, loss[loss=0.1642, ctc_loss=0.1057, cr_loss=0.2925, over 17169.00 frames. ], tot_loss[loss=0.1862, ctc_loss=0.1189, cr_loss=0.3364, over 3349296.72 frames. ], batch size: 41, lr: 2.41e-03, grad_scale: 8.0
2024-09-26 02:56:31,982 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=890208.6666666666, ans=0.0
2024-09-26 02:56:44,667 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=890255.3333333334, ans=0.05
2024-09-26 02:57:15,989 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.58 vs. limit=6.0
2024-09-26 02:57:35,557 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.130e+02 1.349e+02 1.442e+02 1.558e+02 6.975e+02, threshold=2.884e+02, percent-clipped=1.0
2024-09-26 02:57:40,295 INFO [train.py:1198] (3/4) Epoch 49, batch 3800, loss[loss=0.1631, ctc_loss=0.1027, cr_loss=0.3019, over 16334.00 frames. ], tot_loss[loss=0.1877, ctc_loss=0.12, cr_loss=0.3382, over 3341032.47 frames. ], batch size: 36, lr: 2.40e-03, grad_scale: 8.0
2024-09-26 02:58:00,694 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=890488.6666666666, ans=0.1
2024-09-26 02:58:33,181 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=7.56 vs. limit=15.0
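
The scaling.py:1024 Whitening lines compare a per-module whiteness statistic against a limit, and a corrective loss applies only while metric exceeds limit, which is why each entry reads "metric=... vs. limit=...". One plausible reading of the metric, assumed here rather than taken from the source, is the ratio of the mean squared eigenvalue of the feature covariance to the squared mean eigenvalue, which equals 1 for perfectly white (isotropic) features and grows as a few directions dominate:

    import torch

    # Hedged sketch of a whiteness metric: mean(eig^2) / mean(eig)^2 of the
    # per-group feature covariance, computed via traces so no explicit
    # eigendecomposition is needed. An illustration, not the scaling.py source.
    def whitening_metric(x, num_groups=1):
        n, c = x.shape                          # (frames, channels)
        x = x.reshape(n, num_groups, c // num_groups).transpose(0, 1)
        x = x - x.mean(dim=1, keepdim=True)
        cov = x.transpose(1, 2) @ x / n         # (groups, d, d)
        d = cov.shape[-1]
        mean_eig = cov.diagonal(dim1=1, dim2=2).sum(-1) / d   # tr(C) / d
        mean_eig_sq = (cov * cov).sum(dim=(1, 2)) / d         # tr(C^2) / d
        return (mean_eig_sq / mean_eig.pow(2)).mean()
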
2024-09-26 02:58:34,055 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=890582.0, ans=0.1
2024-09-26 02:58:35,557 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=890582.0, ans=0.125
2024-09-26 02:58:54,092 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=890628.6666666666, ans=0.0
2024-09-26 02:58:58,408 INFO [train.py:1198] (3/4) Epoch 49, batch 3850, loss[loss=0.201, ctc_loss=0.1293, cr_loss=0.3584, over 15001.00 frames. ], tot_loss[loss=0.1898, ctc_loss=0.1218, cr_loss=0.34, over 3291294.13 frames. ], batch size: 89, lr: 2.40e-03, grad_scale: 8.0
2024-09-26 02:59:02,458 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=890675.3333333334, ans=0.125
2024-09-26 02:59:08,493 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=890675.3333333334, ans=0.125
2024-09-26 02:59:11,499 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=890675.3333333334, ans=0.125
2024-09-26 02:59:16,013 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=890722.0, ans=0.09899494936611666
2024-09-26 02:59:18,963 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=890722.0, ans=0.1
2024-09-26 02:59:31,088 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=890768.6666666666, ans=0.1
2024-09-26 03:00:56,135 INFO [train.py:1198] (3/4) Epoch 50, batch 0, loss[loss=0.159, ctc_loss=0.09791, cr_loss=0.3052, over 16258.00 frames. ], tot_loss[loss=0.159, ctc_loss=0.09791, cr_loss=0.3052, over 16258.00 frames. ], batch size: 36, lr: 2.38e-03, grad_scale: 16.0
2024-09-26 03:00:56,136 INFO [train.py:1221] (3/4) Computing validation loss
2024-09-26 03:01:06,580 INFO [zipformer.py:1858] (3/4) name=encoder.encoders.3.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([3.4153, 4.5877, 3.7765, 4.2319, 4.3296, 3.7457, 3.8877, 4.0105], device='cuda:3')
2024-09-26 03:01:09,004 INFO [zipformer.py:1858] (3/4) name=encoder.encoders.2.encoder.layers.2.self_attn_weights, attn_weights_entropy = tensor([3.8185, 2.6446, 3.4732, 3.3535], device='cuda:3')
2024-09-26 03:01:12,092 INFO [train.py:1230] (3/4) Epoch 50, validation: loss=0.03452, ctc_loss=0.03452, cr_loss=1.145e-14, over 944034.00 frames.
2024-09-26 03:01:12,092 INFO [train.py:1231] (3/4) Maximum memory allocated so far is 21238MB
2024-09-26 03:01:13,603 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.288e+02 1.424e+02 1.584e+02 1.731e+02 2.410e+02, threshold=3.169e+02, percent-clipped=0.0
2024-09-26 03:01:22,503 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=10.24 vs. limit=15.0
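
The zipformer.py:1858 lines printed during the epoch-50 validation pass are a diagnostic: an entropy for each attention head's weight distribution, one value per head (eight values for the stack configured with 8 heads, four for a 4-head stack, matching the two tensors above). Low entropy means a head concentrates on a few frames; values near log(T) mean nearly uniform attention. A sketch of such a diagnostic, illustrative rather than the zipformer.py code:

    import torch

    # Hedged sketch: mean entropy (in nats) of attention weights per head.
    # attn: (num_heads, T_query, T_key), with each row summing to 1.
    def attn_weights_entropy(attn, eps=1e-20):
        ent = -(attn * (attn + eps).log()).sum(dim=-1)   # (num_heads, T_query)
        return ent.mean(dim=-1)                          # one value per head

    # An entropy of ~3.4 to ~4.6 nats, as logged above, corresponds to
    # effectively spreading attention over roughly exp(3.4)~30 to exp(4.6)~100 frames.
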
2024-09-26 03:01:59,612 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=890983.3333333334, ans=0.125
2024-09-26 03:02:04,483 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=891030.0, ans=0.1
2024-09-26 03:02:06,081 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=891030.0, ans=0.125
2024-09-26 03:02:34,240 INFO [train.py:1198] (3/4) Epoch 50, batch 50, loss[loss=0.1947, ctc_loss=0.1284, cr_loss=0.3318, over 14928.00 frames. ], tot_loss[loss=0.1893, ctc_loss=0.1212, cr_loss=0.3405, over 755142.37 frames. ], batch size: 89, lr: 2.38e-03, grad_scale: 16.0
2024-09-26 03:02:36,645 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=11.83 vs. limit=22.5
2024-09-26 03:03:00,508 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=891170.0, ans=0.125
2024-09-26 03:03:33,433 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=891263.3333333334, ans=0.0
2024-09-26 03:03:57,323 INFO [train.py:1198] (3/4) Epoch 50, batch 100, loss[loss=0.1699, ctc_loss=0.1064, cr_loss=0.3174, over 17222.00 frames. ], tot_loss[loss=0.1857, ctc_loss=0.1187, cr_loss=0.3353, over 1335449.52 frames. ], batch size: 47, lr: 2.38e-03, grad_scale: 16.0
2024-09-26 03:03:58,964 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.109e+02 1.291e+02 1.362e+02 1.431e+02 2.417e+02, threshold=2.724e+02, percent-clipped=0.0
2024-09-26 03:03:59,363 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.const_attention_rate, batch_count=891356.6666666666, ans=0.025
2024-09-26 03:04:24,962 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=891403.3333333334, ans=0.125
2024-09-26 03:04:25,419 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten.whitening_limit, batch_count=891403.3333333334, ans=15.0
2024-09-26 03:04:26,460 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=891403.3333333334, ans=0.2
2024-09-26 03:05:22,552 INFO [train.py:1198] (3/4) Epoch 50, batch 150, loss[loss=0.1971, ctc_loss=0.1277, cr_loss=0.3467, over 17004.00 frames. ], tot_loss[loss=0.1841, ctc_loss=0.1176, cr_loss=0.3326, over 1784720.13 frames. ], batch size: 53, lr: 2.38e-03, grad_scale: 16.0
2024-09-26 03:05:42,482 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=891636.6666666666, ans=0.125
2024-09-26 03:06:01,771 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.const_attention_rate, batch_count=891683.3333333334, ans=0.025
2024-09-26 03:06:09,590 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=891730.0, ans=0.0
2024-09-26 03:06:20,650 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=891730.0, ans=0.0
2024-09-26 03:06:43,474 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=5.99 vs. limit=15.0
2024-09-26 03:06:45,777 INFO [train.py:1198] (3/4) Epoch 50, batch 200, loss[loss=0.2057, ctc_loss=0.1354, cr_loss=0.3515, over 17221.00 frames. ], tot_loss[loss=0.185, ctc_loss=0.118, cr_loss=0.3346, over 2138024.75 frames. ], batch size: 55, lr: 2.38e-03, grad_scale: 16.0
2024-09-26 03:06:47,297 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.198e+02 1.325e+02 1.401e+02 1.516e+02 2.050e+02, threshold=2.802e+02, percent-clipped=0.0
2024-09-26 03:06:49,143 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=891823.3333333334, ans=0.125
2024-09-26 03:07:09,734 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=891870.0, ans=0.2
2024-09-26 03:07:16,228 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=891916.6666666666, ans=0.0
2024-09-26 03:07:22,669 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=891916.6666666666, ans=0.125
2024-09-26 03:07:32,455 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=891963.3333333334, ans=0.125
2024-09-26 03:07:37,100 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=891963.3333333334, ans=0.2
2024-09-26 03:07:49,856 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=892010.0, ans=0.125
2024-09-26 03:07:56,131 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=892010.0, ans=0.1
2024-09-26 03:08:00,862 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=892010.0, ans=0.1
2024-09-26 03:08:05,419 INFO [train.py:1198] (3/4) Epoch 50, batch 250, loss[loss=0.2135, ctc_loss=0.1375, cr_loss=0.3797, over 17251.00 frames. ], tot_loss[loss=0.1861, ctc_loss=0.1188, cr_loss=0.3362, over 2407081.12 frames. ], batch size: 55, lr: 2.38e-03, grad_scale: 16.0
2024-09-26 03:08:13,329 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.const_attention_rate, batch_count=892056.6666666666, ans=0.025
2024-09-26 03:08:15,083 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=892056.6666666666, ans=0.2
2024-09-26 03:08:24,390 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=892103.3333333334, ans=0.125
2024-09-26 03:08:34,121 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=892103.3333333334, ans=0.125
2024-09-26 03:09:05,981 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=892196.6666666666, ans=0.125
2024-09-26 03:09:28,276 INFO [train.py:1198] (3/4) Epoch 50, batch 300, loss[loss=0.1897, ctc_loss=0.1212, cr_loss=0.3424, over 17347.00 frames. ], tot_loss[loss=0.1865, ctc_loss=0.119, cr_loss=0.3373, over 2618172.52 frames. ], batch size: 48, lr: 2.38e-03, grad_scale: 16.0
2024-09-26 03:09:29,766 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.113e+02 1.289e+02 1.359e+02 1.478e+02 2.731e+02, threshold=2.717e+02, percent-clipped=0.0
2024-09-26 03:09:47,622 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=892336.6666666666, ans=0.125
2024-09-26 03:09:48,290 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=3.67 vs. limit=15.0
2024-09-26 03:09:48,335 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.69 vs. limit=15.0
2024-09-26 03:09:57,152 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=892336.6666666666, ans=0.125
2024-09-26 03:10:00,534 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=892336.6666666666, ans=0.2
2024-09-26 03:10:13,426 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=892383.3333333334, ans=0.0
2024-09-26 03:10:19,828 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=892430.0, ans=0.125
2024-09-26 03:10:55,356 INFO [train.py:1198] (3/4) Epoch 50, batch 350, loss[loss=0.185, ctc_loss=0.1168, cr_loss=0.3411, over 17251.00 frames. ], tot_loss[loss=0.1851, ctc_loss=0.118, cr_loss=0.3356, over 2774509.32 frames. ], batch size: 42, lr: 2.38e-03, grad_scale: 16.0
2024-09-26 03:11:43,208 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=892616.6666666666, ans=0.125
2024-09-26 03:11:48,073 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=892663.3333333334, ans=0.2
2024-09-26 03:11:51,194 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=892663.3333333334, ans=0.125
2024-09-26 03:11:54,576 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.35 vs. limit=15.0
2024-09-26 03:12:03,717 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=892710.0, ans=0.2
2024-09-26 03:12:17,701 INFO [train.py:1198] (3/4) Epoch 50, batch 400, loss[loss=0.1846, ctc_loss=0.118, cr_loss=0.3327, over 17303.00 frames. ], tot_loss[loss=0.1856, ctc_loss=0.1185, cr_loss=0.3355, over 2896554.53 frames. ], batch size: 51, lr: 2.38e-03, grad_scale: 32.0
2024-09-26 03:12:19,212 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.189e+02 1.302e+02 1.364e+02 1.437e+02 1.796e+02, threshold=2.729e+02, percent-clipped=0.0
2024-09-26 03:12:35,413 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=892803.3333333334, ans=0.95
2024-09-26 03:12:38,910 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.01 vs. limit=6.0
2024-09-26 03:12:51,270 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=892850.0, ans=0.1
2024-09-26 03:13:39,999 INFO [train.py:1198] (3/4) Epoch 50, batch 450, loss[loss=0.169, ctc_loss=0.1058, cr_loss=0.316, over 17016.00 frames. ], tot_loss[loss=0.1862, ctc_loss=0.1189, cr_loss=0.3368, over 3000178.65 frames. ], batch size: 44, lr: 2.38e-03, grad_scale: 16.0
2024-09-26 03:13:55,105 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=4.20 vs. limit=15.0
2024-09-26 03:14:10,201 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=893083.3333333334, ans=0.125
2024-09-26 03:14:13,917 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.65 vs. limit=10.0
2024-09-26 03:14:41,720 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=893130.0, ans=0.0
2024-09-26 03:14:59,057 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=4.01 vs. limit=12.0
2024-09-26 03:15:02,681 INFO [train.py:1198] (3/4) Epoch 50, batch 500, loss[loss=0.1648, ctc_loss=0.1033, cr_loss=0.3075, over 16306.00 frames. ], tot_loss[loss=0.1871, ctc_loss=0.1196, cr_loss=0.3377, over 3072759.08 frames. ], batch size: 36, lr: 2.38e-03, grad_scale: 16.0
2024-09-26 03:15:05,800 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.155e+02 1.293e+02 1.377e+02 1.474e+02 1.981e+02, threshold=2.754e+02, percent-clipped=0.0
2024-09-26 03:15:35,273 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=893270.0, ans=0.0
2024-09-26 03:15:49,809 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=893316.6666666666, ans=0.125
2024-09-26 03:16:24,286 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.95 vs. limit=22.5
2024-09-26 03:16:26,696 INFO [train.py:1198] (3/4) Epoch 50, batch 550, loss[loss=0.1526, ctc_loss=0.09585, cr_loss=0.2838, over 17219.00 frames. ], tot_loss[loss=0.185, ctc_loss=0.118, cr_loss=0.335, over 3137593.42 frames. ], batch size: 47, lr: 2.38e-03, grad_scale: 16.0
2024-09-26 03:17:04,973 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=893550.0, ans=0.2
2024-09-26 03:17:06,538 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=893550.0, ans=0.0
2024-09-26 03:17:08,020 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=893550.0, ans=0.0
2024-09-26 03:17:35,948 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=3.37 vs. limit=15.0
2024-09-26 03:17:40,645 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.83 vs. limit=6.0
2024-09-26 03:17:49,350 INFO [train.py:1198] (3/4) Epoch 50, batch 600, loss[loss=0.2017, ctc_loss=0.1289, cr_loss=0.3643, over 16505.00 frames. ], tot_loss[loss=0.1857, ctc_loss=0.1183, cr_loss=0.3366, over 3195769.69 frames. ], batch size: 66, lr: 2.38e-03, grad_scale: 16.0
2024-09-26 03:17:52,582 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.148e+02 1.327e+02 1.372e+02 1.501e+02 3.825e+02, threshold=2.745e+02, percent-clipped=1.0
2024-09-26 03:18:10,206 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=893736.6666666666, ans=0.125
2024-09-26 03:18:29,311 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=893783.3333333334, ans=0.0
2024-09-26 03:18:43,865 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=893830.0, ans=0.125
2024-09-26 03:18:51,529 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=893830.0, ans=0.125
2024-09-26 03:19:12,061 INFO [train.py:1198] (3/4) Epoch 50, batch 650, loss[loss=0.1999, ctc_loss=0.1267, cr_loss=0.3663, over 17310.00 frames. ], tot_loss[loss=0.1852, ctc_loss=0.118, cr_loss=0.3358, over 3238823.78 frames. ], batch size: 51, lr: 2.38e-03, grad_scale: 16.0
2024-09-26 03:20:03,432 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=894063.3333333334, ans=0.125
2024-09-26 03:20:06,407 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=894063.3333333334, ans=0.2
2024-09-26 03:20:37,745 INFO [train.py:1198] (3/4) Epoch 50, batch 700, loss[loss=0.191, ctc_loss=0.1222, cr_loss=0.344, over 17090.00 frames. ], tot_loss[loss=0.1844, ctc_loss=0.1175, cr_loss=0.3344, over 3265679.20 frames. ], batch size: 49, lr: 2.38e-03, grad_scale: 16.0
2024-09-26 03:20:40,921 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.133e+02 1.326e+02 1.434e+02 1.550e+02 1.872e+02, threshold=2.869e+02, percent-clipped=0.0
2024-09-26 03:20:42,869 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=894156.6666666666, ans=0.125
2024-09-26 03:20:46,033 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=894156.6666666666, ans=0.0
2024-09-26 03:21:44,207 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=894343.3333333334, ans=0.07
2024-09-26 03:21:47,503 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=894343.3333333334, ans=0.125
2024-09-26 03:22:00,148 INFO [train.py:1198] (3/4) Epoch 50, batch 750, loss[loss=0.199, ctc_loss=0.1278, cr_loss=0.3562, over 17106.00 frames. ], tot_loss[loss=0.1842, ctc_loss=0.1175, cr_loss=0.3339, over 3291058.91 frames. ], batch size: 49, lr: 2.38e-03, grad_scale: 16.0
2024-09-26 03:22:38,465 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=894483.3333333334, ans=0.0
2024-09-26 03:22:47,036 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=6.77 vs. limit=15.0
2024-09-26 03:22:48,181 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=894530.0, ans=0.125
2024-09-26 03:23:04,250 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=12.34 vs. limit=22.5
2024-09-26 03:23:17,640 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=894576.6666666666, ans=0.0
2024-09-26 03:23:22,024 INFO [train.py:1198] (3/4) Epoch 50, batch 800, loss[loss=0.1708, ctc_loss=0.111, cr_loss=0.2992, over 17257.00 frames. ], tot_loss[loss=0.1845, ctc_loss=0.1177, cr_loss=0.334, over 3314567.28 frames. ], batch size: 42, lr: 2.37e-03, grad_scale: 32.0
2024-09-26 03:23:25,275 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.157e+02 1.307e+02 1.380e+02 1.476e+02 1.772e+02, threshold=2.760e+02, percent-clipped=0.0
2024-09-26 03:23:27,195 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=894623.3333333334, ans=0.125
2024-09-26 03:23:33,063 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=9.23 vs. limit=10.0
2024-09-26 03:23:59,193 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=894716.6666666666, ans=0.2
2024-09-26 03:24:07,316 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00
2024-09-26 03:24:21,940 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=894763.3333333334, ans=0.2
2024-09-26 03:24:35,024 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=5.52 vs. limit=15.0
2024-09-26 03:24:42,318 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=894810.0, ans=0.125
2024-09-26 03:24:45,394 INFO [train.py:1198] (3/4) Epoch 50, batch 850, loss[loss=0.2043, ctc_loss=0.1315, cr_loss=0.3641, over 16708.00 frames. ], tot_loss[loss=0.1858, ctc_loss=0.1187, cr_loss=0.3356, over 3316743.19 frames. ], batch size: 61, lr: 2.37e-03, grad_scale: 32.0
2024-09-26 03:25:20,842 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.const_attention_rate, batch_count=894950.0, ans=0.025
2024-09-26 03:25:49,601 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=894996.6666666666, ans=0.0
2024-09-26 03:26:02,910 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=895043.3333333334, ans=0.125
2024-09-26 03:26:06,536 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=8.96 vs. limit=15.0
2024-09-26 03:26:08,944 INFO [train.py:1198] (3/4) Epoch 50, batch 900, loss[loss=0.1905, ctc_loss=0.1217, cr_loss=0.3442, over 17353.00 frames. ], tot_loss[loss=0.1857, ctc_loss=0.1186, cr_loss=0.3357, over 3329323.79 frames. ], batch size: 48, lr: 2.37e-03, grad_scale: 16.0
2024-09-26 03:26:10,905 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=895090.0, ans=0.125
2024-09-26 03:26:13,764 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.191e+02 1.310e+02 1.400e+02 1.511e+02 3.836e+02, threshold=2.800e+02, percent-clipped=1.0
2024-09-26 03:26:15,765 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=895090.0, ans=0.125
2024-09-26 03:26:59,760 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=895230.0, ans=0.0
2024-09-26 03:27:01,414 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=895230.0, ans=0.0
2024-09-26 03:27:22,485 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=895276.6666666666, ans=0.125
2024-09-26 03:27:31,736 INFO [train.py:1198] (3/4) Epoch 50, batch 950, loss[loss=0.197, ctc_loss=0.1254, cr_loss=0.358, over 17350.00 frames. ], tot_loss[loss=0.185, ctc_loss=0.118, cr_loss=0.335, over 3336294.87 frames. ], batch size: 48, lr: 2.37e-03, grad_scale: 16.0
2024-09-26 03:27:35,294 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=895323.3333333334, ans=0.125
2024-09-26 03:27:40,411 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.18 vs. limit=15.0
2024-09-26 03:27:43,618 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=5.17 vs. limit=10.0
2024-09-26 03:27:44,668 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=895323.3333333334, ans=0.125
2024-09-26 03:28:12,674 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=895416.6666666666, ans=0.2
2024-09-26 03:28:43,143 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=895510.0, ans=0.1
2024-09-26 03:28:43,169 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=895510.0, ans=0.025
2024-09-26 03:28:54,019 INFO [train.py:1198] (3/4) Epoch 50, batch 1000, loss[loss=0.2108, ctc_loss=0.1359, cr_loss=0.3745, over 17008.00 frames. ], tot_loss[loss=0.185, ctc_loss=0.1181, cr_loss=0.3343, over 3325954.45 frames. ], batch size: 56, lr: 2.37e-03, grad_scale: 16.0
2024-09-26 03:28:58,809 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.214e+02 1.309e+02 1.391e+02 1.502e+02 1.865e+02, threshold=2.782e+02, percent-clipped=0.0
2024-09-26 03:29:00,870 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=895556.6666666666, ans=0.1
2024-09-26 03:29:08,919 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=895603.3333333334, ans=0.0
2024-09-26 03:29:13,637 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=895603.3333333334, ans=0.125
2024-09-26 03:29:14,162 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.37 vs. limit=6.0
2024-09-26 03:29:36,695 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=895650.0, ans=0.0
2024-09-26 03:29:51,850 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=6.89 vs. limit=15.0
2024-09-26 03:29:57,813 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=895696.6666666666, ans=0.125
2024-09-26 03:30:00,892 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=895743.3333333334, ans=0.125
2024-09-26 03:30:05,774 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=895743.3333333334, ans=0.07
2024-09-26 03:30:10,632 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=895743.3333333334, ans=0.0
2024-09-26 03:30:12,123 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=895743.3333333334, ans=0.125
2024-09-26 03:30:19,367 INFO [train.py:1198] (3/4) Epoch 50, batch 1050, loss[loss=0.2124, ctc_loss=0.137, cr_loss=0.3768, over 14868.00 frames. ], tot_loss[loss=0.1842, ctc_loss=0.1176, cr_loss=0.333, over 3328637.87 frames. ], batch size: 88, lr: 2.37e-03, grad_scale: 16.0
2024-09-26 03:30:26,948 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=9.07 vs. limit=22.5
2024-09-26 03:30:45,577 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=895836.6666666666, ans=0.1
2024-09-26 03:30:47,056 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=895836.6666666666, ans=0.125
2024-09-26 03:31:04,610 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=895883.3333333334, ans=0.125
2024-09-26 03:31:08,291 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=10.40 vs. limit=22.5
2024-09-26 03:31:12,640 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=895930.0, ans=0.0
2024-09-26 03:31:24,067 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=895976.6666666666, ans=0.0
2024-09-26 03:31:29,620 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=895976.6666666666, ans=0.125
2024-09-26 03:31:43,426 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten.whitening_limit, batch_count=896023.3333333334, ans=15.0
2024-09-26 03:31:44,418 INFO [train.py:1198] (3/4) Epoch 50, batch 1100, loss[loss=0.203, ctc_loss=0.1307, cr_loss=0.3615, over 17067.00 frames. ], tot_loss[loss=0.1834, ctc_loss=0.117, cr_loss=0.3319, over 3335289.40 frames. ], batch size: 56, lr: 2.37e-03, grad_scale: 16.0
], batch size: 56, lr: 2.37e-03, grad_scale: 16.0 2024-09-26 03:31:48,066 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=896023.3333333334, ans=0.125 2024-09-26 03:31:49,230 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.142e+02 1.312e+02 1.377e+02 1.461e+02 2.179e+02, threshold=2.754e+02, percent-clipped=0.0 2024-09-26 03:32:29,623 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=896116.6666666666, ans=0.07 2024-09-26 03:33:06,913 INFO [train.py:1198] (3/4) Epoch 50, batch 1150, loss[loss=0.201, ctc_loss=0.1292, cr_loss=0.3591, over 16721.00 frames. ], tot_loss[loss=0.1841, ctc_loss=0.1175, cr_loss=0.3328, over 3339731.19 frames. ], batch size: 61, lr: 2.37e-03, grad_scale: 8.0 2024-09-26 03:33:10,492 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=896256.6666666666, ans=0.1 2024-09-26 03:33:18,312 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=896256.6666666666, ans=0.1 2024-09-26 03:33:32,777 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=896303.3333333334, ans=0.0 2024-09-26 03:33:55,489 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=896396.6666666666, ans=0.0 2024-09-26 03:34:08,511 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=896396.6666666666, ans=0.125 2024-09-26 03:34:16,531 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=896443.3333333334, ans=0.0 2024-09-26 03:34:29,244 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=4.52 vs. limit=15.0 2024-09-26 03:34:29,985 INFO [train.py:1198] (3/4) Epoch 50, batch 1200, loss[loss=0.2021, ctc_loss=0.1305, cr_loss=0.3583, over 16909.00 frames. ], tot_loss[loss=0.1837, ctc_loss=0.1172, cr_loss=0.3326, over 3350851.33 frames. ], batch size: 58, lr: 2.37e-03, grad_scale: 16.0 2024-09-26 03:34:30,403 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=896490.0, ans=0.125 2024-09-26 03:34:33,338 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=896490.0, ans=0.125 2024-09-26 03:34:36,175 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.154e+02 1.309e+02 1.371e+02 1.493e+02 2.008e+02, threshold=2.741e+02, percent-clipped=0.0 2024-09-26 03:35:35,547 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.max_abs, batch_count=896676.6666666666, ans=10.0 2024-09-26 03:35:52,908 INFO [train.py:1198] (3/4) Epoch 50, batch 1250, loss[loss=0.156, ctc_loss=0.09661, cr_loss=0.2972, over 16237.00 frames. ], tot_loss[loss=0.1838, ctc_loss=0.1173, cr_loss=0.3327, over 3348797.30 frames. 
], batch size: 36, lr: 2.37e-03, grad_scale: 16.0 2024-09-26 03:36:12,514 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=896770.0, ans=0.0 2024-09-26 03:36:26,048 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=896816.6666666666, ans=0.125 2024-09-26 03:36:53,138 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=896863.3333333334, ans=0.0 2024-09-26 03:36:59,490 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=896910.0, ans=0.125 2024-09-26 03:37:08,455 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.79 vs. limit=6.0 2024-09-26 03:37:15,378 INFO [train.py:1198] (3/4) Epoch 50, batch 1300, loss[loss=0.1898, ctc_loss=0.121, cr_loss=0.3443, over 17234.00 frames. ], tot_loss[loss=0.1838, ctc_loss=0.1172, cr_loss=0.3327, over 3345885.40 frames. ], batch size: 47, lr: 2.37e-03, grad_scale: 16.0 2024-09-26 03:37:20,607 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-26 03:37:21,700 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.165e+02 1.310e+02 1.392e+02 1.499e+02 2.433e+02, threshold=2.784e+02, percent-clipped=0.0 2024-09-26 03:37:40,125 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=11.74 vs. limit=22.5 2024-09-26 03:37:55,968 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=897050.0, ans=0.0 2024-09-26 03:38:38,300 INFO [train.py:1198] (3/4) Epoch 50, batch 1350, loss[loss=0.214, ctc_loss=0.1427, cr_loss=0.3564, over 11673.00 frames. ], tot_loss[loss=0.1832, ctc_loss=0.1169, cr_loss=0.3315, over 3349546.00 frames. ], batch size: 123, lr: 2.37e-03, grad_scale: 16.0 2024-09-26 03:38:40,391 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=897190.0, ans=0.0 2024-09-26 03:39:04,616 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=6.47 vs. limit=15.0 2024-09-26 03:39:10,752 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=897283.3333333334, ans=0.0 2024-09-26 03:39:10,867 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-09-26 03:39:32,669 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=897330.0, ans=0.125 2024-09-26 03:39:50,663 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=4.90 vs. limit=12.0 2024-09-26 03:40:01,285 INFO [train.py:1198] (3/4) Epoch 50, batch 1400, loss[loss=0.1419, ctc_loss=0.08845, cr_loss=0.2674, over 16758.00 frames. ], tot_loss[loss=0.1837, ctc_loss=0.1173, cr_loss=0.332, over 3339567.17 frames. 
], batch size: 37, lr: 2.37e-03, grad_scale: 16.0 2024-09-26 03:40:07,584 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.145e+02 1.312e+02 1.402e+02 1.516e+02 2.766e+02, threshold=2.805e+02, percent-clipped=0.0 2024-09-26 03:40:32,175 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.81 vs. limit=15.0 2024-09-26 03:40:33,211 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=897470.0, ans=0.125 2024-09-26 03:40:34,704 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=897516.6666666666, ans=0.0 2024-09-26 03:40:36,907 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=7.66 vs. limit=15.0 2024-09-26 03:41:18,664 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.20 vs. limit=6.0 2024-09-26 03:41:19,638 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=897610.0, ans=0.125 2024-09-26 03:41:24,327 INFO [train.py:1198] (3/4) Epoch 50, batch 1450, loss[loss=0.1788, ctc_loss=0.1172, cr_loss=0.3077, over 16742.00 frames. ], tot_loss[loss=0.1831, ctc_loss=0.1167, cr_loss=0.3321, over 3352662.32 frames. ], batch size: 61, lr: 2.37e-03, grad_scale: 16.0 2024-09-26 03:41:38,192 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=897656.6666666666, ans=0.1 2024-09-26 03:41:54,447 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=897703.3333333334, ans=0.125 2024-09-26 03:42:03,892 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=897750.0, ans=0.0 2024-09-26 03:42:13,506 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=897796.6666666666, ans=0.125 2024-09-26 03:42:31,713 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=8.47 vs. limit=15.0 2024-09-26 03:42:37,529 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=897843.3333333334, ans=0.0 2024-09-26 03:42:41,641 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=6.27 vs. limit=10.0 2024-09-26 03:42:42,749 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-09-26 03:42:46,930 INFO [train.py:1198] (3/4) Epoch 50, batch 1500, loss[loss=0.2162, ctc_loss=0.1383, cr_loss=0.3895, over 17027.00 frames. ], tot_loss[loss=0.1821, ctc_loss=0.1159, cr_loss=0.3307, over 3349636.68 frames. ], batch size: 44, lr: 2.37e-03, grad_scale: 16.0 2024-09-26 03:42:51,337 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.72 vs. 
limit=6.0 2024-09-26 03:42:53,350 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.088e+02 1.299e+02 1.380e+02 1.479e+02 2.541e+02, threshold=2.761e+02, percent-clipped=0.0 2024-09-26 03:42:53,601 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=897890.0, ans=0.0 2024-09-26 03:43:05,543 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=897936.6666666666, ans=0.2 2024-09-26 03:43:25,481 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=9.52 vs. limit=15.0 2024-09-26 03:43:48,312 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.50 vs. limit=6.0 2024-09-26 03:43:49,869 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.56 vs. limit=6.0 2024-09-26 03:43:57,626 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=10.94 vs. limit=15.0 2024-09-26 03:43:58,990 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=898076.6666666666, ans=0.125 2024-09-26 03:44:09,795 INFO [train.py:1198] (3/4) Epoch 50, batch 1550, loss[loss=0.2283, ctc_loss=0.1468, cr_loss=0.4076, over 17294.00 frames. ], tot_loss[loss=0.1826, ctc_loss=0.1163, cr_loss=0.3314, over 3354257.02 frames. ], batch size: 49, lr: 2.37e-03, grad_scale: 16.0 2024-09-26 03:44:35,035 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=898170.0, ans=0.0 2024-09-26 03:44:39,005 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.16 vs. limit=15.0 2024-09-26 03:45:35,381 INFO [train.py:1198] (3/4) Epoch 50, batch 1600, loss[loss=0.1563, ctc_loss=0.09823, cr_loss=0.2903, over 16680.00 frames. ], tot_loss[loss=0.1837, ctc_loss=0.1172, cr_loss=0.3328, over 3354741.06 frames. ], batch size: 37, lr: 2.37e-03, grad_scale: 32.0 2024-09-26 03:45:41,792 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.137e+02 1.345e+02 1.418e+02 1.518e+02 2.144e+02, threshold=2.837e+02, percent-clipped=0.0 2024-09-26 03:46:04,769 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=11.35 vs. 
limit=22.5 2024-09-26 03:46:10,860 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=898450.0, ans=0.1 2024-09-26 03:46:18,760 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=898450.0, ans=0.125 2024-09-26 03:46:26,270 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.const_attention_rate, batch_count=898496.6666666666, ans=0.025 2024-09-26 03:46:26,273 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=898496.6666666666, ans=0.125 2024-09-26 03:46:57,842 INFO [train.py:1198] (3/4) Epoch 50, batch 1650, loss[loss=0.1901, ctc_loss=0.1198, cr_loss=0.3517, over 17045.00 frames. ], tot_loss[loss=0.1842, ctc_loss=0.1176, cr_loss=0.3331, over 3339174.65 frames. ], batch size: 52, lr: 2.37e-03, grad_scale: 32.0 2024-09-26 03:47:07,769 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=898590.0, ans=0.07 2024-09-26 03:47:35,275 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=898683.3333333334, ans=0.2 2024-09-26 03:47:36,694 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=898683.3333333334, ans=0.025 2024-09-26 03:47:39,886 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=898683.3333333334, ans=0.125 2024-09-26 03:47:50,103 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=3.46 vs. limit=12.0 2024-09-26 03:48:03,641 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=11.10 vs. limit=22.5 2024-09-26 03:48:06,549 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=898776.6666666666, ans=0.07 2024-09-26 03:48:20,713 INFO [train.py:1198] (3/4) Epoch 50, batch 1700, loss[loss=0.2131, ctc_loss=0.1338, cr_loss=0.3966, over 16809.00 frames. ], tot_loss[loss=0.1853, ctc_loss=0.1183, cr_loss=0.3352, over 3342124.22 frames. 
], batch size: 61, lr: 2.37e-03, grad_scale: 32.0 2024-09-26 03:48:27,052 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.137e+02 1.297e+02 1.382e+02 1.469e+02 2.239e+02, threshold=2.764e+02, percent-clipped=0.0 2024-09-26 03:48:32,061 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=898823.3333333334, ans=0.95 2024-09-26 03:48:46,728 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=898870.0, ans=0.125 2024-09-26 03:48:46,743 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=898870.0, ans=0.0 2024-09-26 03:48:56,008 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=898916.6666666666, ans=0.2 2024-09-26 03:49:07,255 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=898963.3333333334, ans=0.0 2024-09-26 03:49:27,511 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=899010.0, ans=0.2 2024-09-26 03:49:36,705 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=899010.0, ans=0.125 2024-09-26 03:49:38,227 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=899010.0, ans=0.0 2024-09-26 03:49:42,837 INFO [train.py:1198] (3/4) Epoch 50, batch 1750, loss[loss=0.1904, ctc_loss=0.1239, cr_loss=0.3328, over 17296.00 frames. ], tot_loss[loss=0.1858, ctc_loss=0.1186, cr_loss=0.3358, over 3347439.89 frames. ], batch size: 49, lr: 2.37e-03, grad_scale: 16.0 2024-09-26 03:49:54,356 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=899056.6666666666, ans=0.0 2024-09-26 03:49:55,850 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=899056.6666666666, ans=0.125 2024-09-26 03:50:07,018 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=899103.3333333334, ans=0.1 2024-09-26 03:50:41,788 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=899196.6666666666, ans=0.0 2024-09-26 03:50:51,505 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=899243.3333333334, ans=0.0 2024-09-26 03:50:56,075 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=899243.3333333334, ans=0.125 2024-09-26 03:51:05,455 INFO [train.py:1198] (3/4) Epoch 50, batch 1800, loss[loss=0.2162, ctc_loss=0.1439, cr_loss=0.3616, over 14745.00 frames. ], tot_loss[loss=0.1857, ctc_loss=0.1186, cr_loss=0.3356, over 3344665.91 frames. 
], batch size: 88, lr: 2.37e-03, grad_scale: 16.0 2024-09-26 03:51:13,299 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.136e+02 1.324e+02 1.397e+02 1.479e+02 2.541e+02, threshold=2.793e+02, percent-clipped=0.0 2024-09-26 03:51:20,225 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=899336.6666666666, ans=0.0 2024-09-26 03:51:38,991 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=899383.3333333334, ans=0.0 2024-09-26 03:52:25,447 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=899476.6666666666, ans=0.0 2024-09-26 03:52:28,334 INFO [train.py:1198] (3/4) Epoch 50, batch 1850, loss[loss=0.2168, ctc_loss=0.1439, cr_loss=0.3646, over 16940.00 frames. ], tot_loss[loss=0.1857, ctc_loss=0.1186, cr_loss=0.3357, over 3339514.85 frames. ], batch size: 58, lr: 2.37e-03, grad_scale: 16.0 2024-09-26 03:52:33,602 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=899523.3333333334, ans=0.05 2024-09-26 03:52:48,143 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=899570.0, ans=0.0 2024-09-26 03:53:45,374 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=899710.0, ans=0.1 2024-09-26 03:53:51,424 INFO [train.py:1198] (3/4) Epoch 50, batch 1900, loss[loss=0.1796, ctc_loss=0.1139, cr_loss=0.3286, over 17256.00 frames. ], tot_loss[loss=0.1852, ctc_loss=0.1182, cr_loss=0.3353, over 3352230.47 frames. ], batch size: 44, lr: 2.37e-03, grad_scale: 16.0 2024-09-26 03:53:59,441 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.136e+02 1.288e+02 1.384e+02 1.484e+02 2.274e+02, threshold=2.768e+02, percent-clipped=0.0 2024-09-26 03:54:40,985 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.max_positive, batch_count=899896.6666666666, ans=0.95 2024-09-26 03:54:51,952 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=899896.6666666666, ans=0.1 2024-09-26 03:55:03,426 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=899943.3333333334, ans=0.1 2024-09-26 03:55:14,171 INFO [train.py:1198] (3/4) Epoch 50, batch 1950, loss[loss=0.1889, ctc_loss=0.1205, cr_loss=0.3419, over 17023.00 frames. ], tot_loss[loss=0.1854, ctc_loss=0.1183, cr_loss=0.3355, over 3343400.83 frames. ], batch size: 52, lr: 2.37e-03, grad_scale: 16.0 2024-09-26 03:55:16,067 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=899990.0, ans=0.125 2024-09-26 03:55:17,473 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=899990.0, ans=0.0 2024-09-26 03:55:37,067 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.29 vs. 
limit=15.0 2024-09-26 03:55:37,917 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=900036.6666666666, ans=0.2 2024-09-26 03:55:52,372 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=900083.3333333334, ans=0.1 2024-09-26 03:56:03,565 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=900130.0, ans=0.0 2024-09-26 03:56:13,024 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.min_abs, batch_count=900130.0, ans=0.5 2024-09-26 03:56:29,055 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=4.37 vs. limit=10.0 2024-09-26 03:56:33,180 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=900176.6666666666, ans=0.125 2024-09-26 03:56:38,079 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=900223.3333333334, ans=0.07 2024-09-26 03:56:39,351 INFO [train.py:1198] (3/4) Epoch 50, batch 2000, loss[loss=0.1969, ctc_loss=0.1265, cr_loss=0.3517, over 17202.00 frames. ], tot_loss[loss=0.1863, ctc_loss=0.119, cr_loss=0.3365, over 3348687.64 frames. ], batch size: 55, lr: 2.37e-03, grad_scale: 16.0 2024-09-26 03:56:48,774 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.198e+02 1.333e+02 1.399e+02 1.506e+02 2.393e+02, threshold=2.799e+02, percent-clipped=0.0 2024-09-26 03:56:50,706 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.const_attention_rate, batch_count=900223.3333333334, ans=0.025 2024-09-26 03:57:01,916 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=900270.0, ans=0.0 2024-09-26 03:57:09,921 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=900316.6666666666, ans=0.125 2024-09-26 03:57:27,856 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.68 vs. limit=6.0 2024-09-26 03:57:34,144 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=7.00 vs. limit=15.0 2024-09-26 03:57:39,997 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=900363.3333333334, ans=0.125 2024-09-26 03:57:56,960 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=900410.0, ans=0.1 2024-09-26 03:58:01,484 INFO [train.py:1198] (3/4) Epoch 50, batch 2050, loss[loss=0.1928, ctc_loss=0.1244, cr_loss=0.3421, over 17353.00 frames. ], tot_loss[loss=0.1865, ctc_loss=0.1191, cr_loss=0.337, over 3359854.79 frames. 
], batch size: 48, lr: 2.37e-03, grad_scale: 16.0 2024-09-26 03:58:03,453 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=900456.6666666666, ans=0.2 2024-09-26 03:58:04,829 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=900456.6666666666, ans=0.2 2024-09-26 03:58:04,999 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=900456.6666666666, ans=0.125 2024-09-26 03:58:46,214 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=900550.0, ans=0.07 2024-09-26 03:58:55,714 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=900596.6666666666, ans=0.0 2024-09-26 03:59:16,307 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=900643.3333333334, ans=0.0 2024-09-26 03:59:24,073 INFO [train.py:1198] (3/4) Epoch 50, batch 2100, loss[loss=0.2013, ctc_loss=0.1298, cr_loss=0.3575, over 15914.00 frames. ], tot_loss[loss=0.1857, ctc_loss=0.1184, cr_loss=0.3362, over 3361452.16 frames. ], batch size: 74, lr: 2.37e-03, grad_scale: 16.0 2024-09-26 03:59:33,715 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.174e+02 1.304e+02 1.389e+02 1.531e+02 2.569e+02, threshold=2.777e+02, percent-clipped=0.0 2024-09-26 03:59:42,265 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.const_attention_rate, batch_count=900736.6666666666, ans=0.025 2024-09-26 04:00:04,637 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=900783.3333333334, ans=0.125 2024-09-26 04:00:12,719 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=900830.0, ans=0.125 2024-09-26 04:00:28,435 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.14 vs. limit=6.0 2024-09-26 04:00:34,459 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=900876.6666666666, ans=0.125 2024-09-26 04:00:39,316 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=900876.6666666666, ans=0.0 2024-09-26 04:00:42,480 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=900876.6666666666, ans=0.09899494936611666 2024-09-26 04:00:46,960 INFO [train.py:1198] (3/4) Epoch 50, batch 2150, loss[loss=0.1921, ctc_loss=0.1231, cr_loss=0.3452, over 17350.00 frames. ], tot_loss[loss=0.1862, ctc_loss=0.1189, cr_loss=0.3367, over 3362052.69 frames. ], batch size: 48, lr: 2.37e-03, grad_scale: 16.0 2024-09-26 04:01:00,496 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.23 vs. limit=15.0 2024-09-26 04:01:03,962 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=4.55 vs. 
limit=15.0 2024-09-26 04:01:11,300 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=900970.0, ans=0.125 2024-09-26 04:01:42,890 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=901063.3333333334, ans=0.125 2024-09-26 04:02:10,066 INFO [train.py:1198] (3/4) Epoch 50, batch 2200, loss[loss=0.174, ctc_loss=0.1101, cr_loss=0.3194, over 17110.00 frames. ], tot_loss[loss=0.186, ctc_loss=0.1188, cr_loss=0.3363, over 3354186.40 frames. ], batch size: 40, lr: 2.37e-03, grad_scale: 16.0 2024-09-26 04:02:15,029 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=901156.6666666666, ans=0.1 2024-09-26 04:02:19,445 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.179e+02 1.317e+02 1.377e+02 1.488e+02 2.594e+02, threshold=2.754e+02, percent-clipped=0.0 2024-09-26 04:02:22,786 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=901156.6666666666, ans=0.125 2024-09-26 04:02:40,137 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.const_attention_rate, batch_count=901250.0, ans=0.025 2024-09-26 04:02:40,206 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=901250.0, ans=0.2 2024-09-26 04:02:53,878 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=901250.0, ans=0.0 2024-09-26 04:03:32,033 INFO [train.py:1198] (3/4) Epoch 50, batch 2250, loss[loss=0.1851, ctc_loss=0.1166, cr_loss=0.3428, over 17055.00 frames. ], tot_loss[loss=0.1843, ctc_loss=0.1175, cr_loss=0.3339, over 3361126.51 frames. ], batch size: 56, lr: 2.37e-03, grad_scale: 16.0 2024-09-26 04:03:59,467 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=901436.6666666666, ans=0.125 2024-09-26 04:04:09,089 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=901483.3333333334, ans=0.125 2024-09-26 04:04:22,881 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=901530.0, ans=0.1 2024-09-26 04:04:26,116 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=901530.0, ans=0.125 2024-09-26 04:04:30,977 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=901530.0, ans=0.1 2024-09-26 04:04:32,538 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=901530.0, ans=0.2 2024-09-26 04:04:44,589 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=4.84 vs. limit=12.0 2024-09-26 04:04:54,694 INFO [train.py:1198] (3/4) Epoch 50, batch 2300, loss[loss=0.2073, ctc_loss=0.1334, cr_loss=0.3693, over 17011.00 frames. ], tot_loss[loss=0.1847, ctc_loss=0.1178, cr_loss=0.3346, over 3364267.82 frames. 
], batch size: 51, lr: 2.37e-03, grad_scale: 16.0 2024-09-26 04:04:59,843 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=901623.3333333334, ans=0.125 2024-09-26 04:05:01,432 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=901623.3333333334, ans=0.0 2024-09-26 04:05:04,463 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.141e+02 1.310e+02 1.388e+02 1.508e+02 2.945e+02, threshold=2.775e+02, percent-clipped=1.0 2024-09-26 04:05:06,160 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=901623.3333333334, ans=0.125 2024-09-26 04:05:12,869 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=901670.0, ans=0.125 2024-09-26 04:05:24,716 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=901670.0, ans=0.125 2024-09-26 04:05:27,875 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=901716.6666666666, ans=0.2 2024-09-26 04:05:48,768 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.const_attention_rate, batch_count=901763.3333333334, ans=0.025 2024-09-26 04:06:03,140 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=901810.0, ans=0.125 2024-09-26 04:06:19,857 INFO [train.py:1198] (3/4) Epoch 50, batch 2350, loss[loss=0.179, ctc_loss=0.1131, cr_loss=0.3292, over 17287.00 frames. ], tot_loss[loss=0.1852, ctc_loss=0.1181, cr_loss=0.3353, over 3345614.37 frames. ], batch size: 46, lr: 2.37e-03, grad_scale: 16.0 2024-09-26 04:06:34,467 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=901903.3333333334, ans=0.1 2024-09-26 04:07:12,682 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=901996.6666666666, ans=0.125 2024-09-26 04:07:25,878 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=3.54 vs. limit=12.0 2024-09-26 04:07:30,195 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=902043.3333333334, ans=0.1 2024-09-26 04:07:39,159 INFO [train.py:1198] (3/4) Epoch 50, batch 2400, loss[loss=0.1802, ctc_loss=0.1145, cr_loss=0.3286, over 15884.00 frames. ], tot_loss[loss=0.1858, ctc_loss=0.1185, cr_loss=0.3363, over 3347258.80 frames. ], batch size: 35, lr: 2.37e-03, grad_scale: 32.0 2024-09-26 04:07:41,853 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=5.36 vs. 
limit=15.0 2024-09-26 04:07:51,373 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.173e+02 1.310e+02 1.369e+02 1.448e+02 2.051e+02, threshold=2.738e+02, percent-clipped=0.0 2024-09-26 04:07:55,090 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=902090.0, ans=0.125 2024-09-26 04:08:33,794 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=902230.0, ans=0.125 2024-09-26 04:08:35,529 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=902230.0, ans=0.125 2024-09-26 04:09:02,440 INFO [train.py:1198] (3/4) Epoch 50, batch 2450, loss[loss=0.1838, ctc_loss=0.1171, cr_loss=0.3337, over 17315.00 frames. ], tot_loss[loss=0.1857, ctc_loss=0.1184, cr_loss=0.3363, over 3354504.36 frames. ], batch size: 49, lr: 2.36e-03, grad_scale: 16.0 2024-09-26 04:09:18,565 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=4.36 vs. limit=10.0 2024-09-26 04:10:02,562 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=902463.3333333334, ans=0.125 2024-09-26 04:10:04,290 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=902463.3333333334, ans=0.125 2024-09-26 04:10:07,404 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.const_attention_rate, batch_count=902510.0, ans=0.025 2024-09-26 04:10:27,377 INFO [train.py:1198] (3/4) Epoch 50, batch 2500, loss[loss=0.1929, ctc_loss=0.1216, cr_loss=0.3567, over 16731.00 frames. ], tot_loss[loss=0.1859, ctc_loss=0.1187, cr_loss=0.336, over 3352374.45 frames. ], batch size: 61, lr: 2.36e-03, grad_scale: 16.0 2024-09-26 04:10:30,794 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=902556.6666666666, ans=0.125 2024-09-26 04:10:37,066 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=902556.6666666666, ans=0.125 2024-09-26 04:10:38,430 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.191e+02 1.330e+02 1.408e+02 1.529e+02 2.285e+02, threshold=2.816e+02, percent-clipped=0.0 2024-09-26 04:11:06,055 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=8.39 vs. limit=15.0 2024-09-26 04:11:11,287 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=902650.0, ans=0.125 2024-09-26 04:11:25,609 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=902696.6666666666, ans=0.1 2024-09-26 04:11:43,384 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=902743.3333333334, ans=0.125 2024-09-26 04:11:43,494 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-26 04:11:49,380 INFO [train.py:1198] (3/4) Epoch 50, batch 2550, loss[loss=0.1873, ctc_loss=0.1213, cr_loss=0.3299, over 17290.00 frames. 
], tot_loss[loss=0.1856, ctc_loss=0.1185, cr_loss=0.3354, over 3356456.15 frames. ], batch size: 51, lr: 2.36e-03, grad_scale: 8.0 2024-09-26 04:11:52,949 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=902790.0, ans=0.125 2024-09-26 04:11:59,277 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=902790.0, ans=0.125 2024-09-26 04:12:22,075 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=902883.3333333334, ans=0.1 2024-09-26 04:12:44,422 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=11.72 vs. limit=22.5 2024-09-26 04:12:47,356 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=902930.0, ans=0.0 2024-09-26 04:12:48,791 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=902930.0, ans=0.0 2024-09-26 04:12:52,106 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=902930.0, ans=0.1 2024-09-26 04:13:04,613 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=902976.6666666666, ans=0.1 2024-09-26 04:13:12,205 INFO [train.py:1198] (3/4) Epoch 50, batch 2600, loss[loss=0.1603, ctc_loss=0.09987, cr_loss=0.3022, over 17182.00 frames. ], tot_loss[loss=0.185, ctc_loss=0.1181, cr_loss=0.3345, over 3349648.86 frames. ], batch size: 45, lr: 2.36e-03, grad_scale: 8.0 2024-09-26 04:13:17,193 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=903023.3333333334, ans=0.125 2024-09-26 04:13:25,160 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.157e+02 1.312e+02 1.402e+02 1.467e+02 1.641e+02, threshold=2.804e+02, percent-clipped=0.0 2024-09-26 04:13:26,101 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=5.54 vs. limit=12.0 2024-09-26 04:13:27,708 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=7.97 vs. limit=15.0 2024-09-26 04:13:30,297 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=903070.0, ans=0.0 2024-09-26 04:13:33,463 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=903070.0, ans=0.125 2024-09-26 04:13:44,697 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=903116.6666666666, ans=0.035 2024-09-26 04:13:54,515 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=903116.6666666666, ans=0.125 2024-09-26 04:14:27,327 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=903210.0, ans=0.125 2024-09-26 04:14:34,858 INFO [train.py:1198] (3/4) Epoch 50, batch 2650, loss[loss=0.1971, ctc_loss=0.1258, cr_loss=0.3569, over 17060.00 frames. 
], tot_loss[loss=0.1858, ctc_loss=0.1187, cr_loss=0.3355, over 3339153.24 frames. ], batch size: 52, lr: 2.36e-03, grad_scale: 8.0 2024-09-26 04:14:46,267 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=903256.6666666666, ans=0.125 2024-09-26 04:14:55,825 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=903303.3333333334, ans=0.1 2024-09-26 04:15:02,076 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=903303.3333333334, ans=0.2 2024-09-26 04:15:15,058 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=3.70 vs. limit=15.0 2024-09-26 04:15:51,403 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=903443.3333333334, ans=0.07 2024-09-26 04:15:57,316 INFO [train.py:1198] (3/4) Epoch 50, batch 2700, loss[loss=0.1497, ctc_loss=0.09366, cr_loss=0.2802, over 17105.00 frames. ], tot_loss[loss=0.1857, ctc_loss=0.1186, cr_loss=0.3357, over 3343400.29 frames. ], batch size: 40, lr: 2.36e-03, grad_scale: 8.0 2024-09-26 04:16:10,001 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.54 vs. limit=15.0 2024-09-26 04:16:12,404 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.141e+02 1.330e+02 1.437e+02 1.530e+02 2.387e+02, threshold=2.875e+02, percent-clipped=0.0 2024-09-26 04:16:14,356 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=903536.6666666666, ans=0.1 2024-09-26 04:16:14,405 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=903536.6666666666, ans=0.125 2024-09-26 04:16:32,634 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=3.91 vs. limit=12.0 2024-09-26 04:16:41,674 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=903583.3333333334, ans=0.125 2024-09-26 04:16:43,408 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.const_attention_rate, batch_count=903583.3333333334, ans=0.025 2024-09-26 04:17:14,072 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=903676.6666666666, ans=0.0 2024-09-26 04:17:14,184 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=903676.6666666666, ans=0.1 2024-09-26 04:17:20,228 INFO [train.py:1198] (3/4) Epoch 50, batch 2750, loss[loss=0.1725, ctc_loss=0.1097, cr_loss=0.3142, over 17303.00 frames. ], tot_loss[loss=0.1853, ctc_loss=0.1183, cr_loss=0.3352, over 3348018.34 frames. 
], batch size: 46, lr: 2.36e-03, grad_scale: 8.0 2024-09-26 04:17:20,659 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=903723.3333333334, ans=0.0 2024-09-26 04:18:23,987 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=903863.3333333334, ans=0.125 2024-09-26 04:18:32,691 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=8.72 vs. limit=22.5 2024-09-26 04:18:42,703 INFO [train.py:1198] (3/4) Epoch 50, batch 2800, loss[loss=0.2123, ctc_loss=0.1363, cr_loss=0.3799, over 16484.00 frames. ], tot_loss[loss=0.1865, ctc_loss=0.1191, cr_loss=0.3366, over 3338122.59 frames. ], batch size: 66, lr: 2.36e-03, grad_scale: 16.0 2024-09-26 04:18:56,694 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=903956.6666666666, ans=0.1 2024-09-26 04:18:57,718 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.167e+02 1.318e+02 1.422e+02 1.529e+02 1.768e+02, threshold=2.845e+02, percent-clipped=0.0 2024-09-26 04:19:20,772 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-26 04:19:32,103 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.23 vs. limit=15.0 2024-09-26 04:19:33,780 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=7.89 vs. limit=15.0 2024-09-26 04:19:36,455 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=904096.6666666666, ans=0.95 2024-09-26 04:20:04,787 INFO [train.py:1198] (3/4) Epoch 50, batch 2850, loss[loss=0.1927, ctc_loss=0.1208, cr_loss=0.3594, over 17309.00 frames. ], tot_loss[loss=0.1861, ctc_loss=0.1188, cr_loss=0.3365, over 3345322.68 frames. ], batch size: 49, lr: 2.36e-03, grad_scale: 16.0 2024-09-26 04:20:24,971 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=904236.6666666666, ans=0.0 2024-09-26 04:20:48,990 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=904283.3333333334, ans=0.1 2024-09-26 04:21:07,140 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=904330.0, ans=0.2 2024-09-26 04:21:07,244 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=904330.0, ans=0.125 2024-09-26 04:21:23,291 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=904376.6666666666, ans=0.125 2024-09-26 04:21:26,543 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-26 04:21:29,404 INFO [train.py:1198] (3/4) Epoch 50, batch 2900, loss[loss=0.1879, ctc_loss=0.12, cr_loss=0.3399, over 16875.00 frames. ], tot_loss[loss=0.1852, ctc_loss=0.1183, cr_loss=0.3347, over 3346095.48 frames. 
], batch size: 58, lr: 2.36e-03, grad_scale: 16.0 2024-09-26 04:21:41,742 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=6.31 vs. limit=15.0 2024-09-26 04:21:42,285 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.130e+02 1.318e+02 1.386e+02 1.483e+02 2.505e+02, threshold=2.771e+02, percent-clipped=0.0 2024-09-26 04:21:47,239 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=904470.0, ans=0.125 2024-09-26 04:22:19,247 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=904563.3333333334, ans=0.1 2024-09-26 04:22:52,048 INFO [train.py:1198] (3/4) Epoch 50, batch 2950, loss[loss=0.1768, ctc_loss=0.1102, cr_loss=0.333, over 16967.00 frames. ], tot_loss[loss=0.1854, ctc_loss=0.1183, cr_loss=0.3354, over 3349283.25 frames. ], batch size: 42, lr: 2.36e-03, grad_scale: 16.0 2024-09-26 04:23:30,946 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=904750.0, ans=0.125 2024-09-26 04:23:35,123 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=8.99 vs. limit=15.0 2024-09-26 04:23:35,144 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=4.92 vs. limit=10.0 2024-09-26 04:23:39,052 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.min_abs, batch_count=904796.6666666666, ans=0.5 2024-09-26 04:23:46,349 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.72 vs. limit=6.0 2024-09-26 04:23:54,862 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=904843.3333333334, ans=0.125 2024-09-26 04:24:14,376 INFO [train.py:1198] (3/4) Epoch 50, batch 3000, loss[loss=0.196, ctc_loss=0.1254, cr_loss=0.353, over 15898.00 frames. ], tot_loss[loss=0.185, ctc_loss=0.118, cr_loss=0.335, over 3355582.75 frames. ], batch size: 74, lr: 2.36e-03, grad_scale: 16.0 2024-09-26 04:24:14,377 INFO [train.py:1221] (3/4) Computing validation loss 2024-09-26 04:24:30,364 INFO [train.py:1230] (3/4) Epoch 50, validation: loss=0.03495, ctc_loss=0.03495, cr_loss=1.037e-14, over 944034.00 frames. 2024-09-26 04:24:30,364 INFO [train.py:1231] (3/4) Maximum memory allocated so far is 21238MB 2024-09-26 04:24:32,935 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=12.94 vs. limit=22.5 2024-09-26 04:24:35,869 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.96 vs. 
limit=15.0
2024-09-26 04:24:42,934 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.200e+02 1.332e+02 1.410e+02 1.508e+02 3.404e+02, threshold=2.821e+02, percent-clipped=1.0
2024-09-26 04:24:46,276 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=904936.6666666666, ans=0.0
2024-09-26 04:24:49,512 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.const_attention_rate, batch_count=904936.6666666666, ans=0.025
2024-09-26 04:25:24,126 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=905030.0, ans=0.125
2024-09-26 04:25:38,196 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=905076.6666666666, ans=0.125
2024-09-26 04:25:47,556 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten.whitening_limit, batch_count=905123.3333333334, ans=22.5
2024-09-26 04:25:48,680 INFO [train.py:1198] (3/4) Epoch 50, batch 3050, loss[loss=0.19, ctc_loss=0.1207, cr_loss=0.3463, over 16992.00 frames. ], tot_loss[loss=0.1852, ctc_loss=0.1181, cr_loss=0.3352, over 3363094.25 frames. ], batch size: 44, lr: 2.36e-03, grad_scale: 16.0
2024-09-26 04:26:19,870 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=12.44 vs. limit=22.5
2024-09-26 04:26:25,933 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=905216.6666666666, ans=0.125
2024-09-26 04:26:30,499 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=905216.6666666666, ans=0.2
2024-09-26 04:26:42,864 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-26 04:26:50,558 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=905263.3333333334, ans=0.125
2024-09-26 04:26:53,759 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=905310.0, ans=0.0
2024-09-26 04:26:55,617 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=12.46 vs. limit=22.5
2024-09-26 04:27:09,368 INFO [train.py:1198] (3/4) Epoch 50, batch 3100, loss[loss=0.1989, ctc_loss=0.126, cr_loss=0.3641, over 16469.00 frames. ], tot_loss[loss=0.1857, ctc_loss=0.1186, cr_loss=0.3357, over 3367194.02 frames. ], batch size: 66, lr: 2.36e-03, grad_scale: 16.0
2024-09-26 04:27:12,715 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=905356.6666666666, ans=0.2
2024-09-26 04:27:14,782 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.42 vs. limit=15.0
2024-09-26 04:27:18,934 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=905356.6666666666, ans=0.2
2024-09-26 04:27:21,792 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.114e+02 1.333e+02 1.400e+02 1.486e+02 1.966e+02, threshold=2.799e+02, percent-clipped=0.0
2024-09-26 04:27:22,159 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.max_positive, batch_count=905356.6666666666, ans=0.95
2024-09-26 04:27:25,158 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=905403.3333333334, ans=0.125
2024-09-26 04:27:31,395 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=905403.3333333334, ans=0.2
2024-09-26 04:28:11,385 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=905496.6666666666, ans=0.0
2024-09-26 04:28:13,588 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.39 vs. limit=15.0
2024-09-26 04:28:26,215 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=4.41 vs. limit=10.0
2024-09-26 04:28:29,965 INFO [train.py:1198] (3/4) Epoch 50, batch 3150, loss[loss=0.1701, ctc_loss=0.1087, cr_loss=0.3066, over 17029.00 frames. ], tot_loss[loss=0.185, ctc_loss=0.1182, cr_loss=0.3344, over 3369248.30 frames. ], batch size: 56, lr: 2.36e-03, grad_scale: 16.0
2024-09-26 04:28:41,336 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=905590.0, ans=0.125
2024-09-26 04:29:00,151 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=905683.3333333334, ans=0.0
2024-09-26 04:29:14,352 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=905683.3333333334, ans=0.0
2024-09-26 04:29:29,973 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=905730.0, ans=0.125
2024-09-26 04:29:39,032 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.80 vs. limit=6.0
2024-09-26 04:29:41,964 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten.whitening_limit, batch_count=905776.6666666666, ans=15.0
2024-09-26 04:29:48,816 INFO [train.py:1198] (3/4) Epoch 50, batch 3200, loss[loss=0.199, ctc_loss=0.1276, cr_loss=0.3566, over 17228.00 frames. ], tot_loss[loss=0.1851, ctc_loss=0.1181, cr_loss=0.3347, over 3371614.61 frames. ], batch size: 47, lr: 2.36e-03, grad_scale: 32.0
2024-09-26 04:30:01,215 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.176e+02 1.309e+02 1.397e+02 1.487e+02 2.037e+02, threshold=2.793e+02, percent-clipped=0.0
2024-09-26 04:30:04,782 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=905870.0, ans=0.0
2024-09-26 04:30:11,212 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=905870.0, ans=0.125
2024-09-26 04:30:31,747 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=905916.6666666666, ans=0.125
2024-09-26 04:30:38,016 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=905963.3333333334, ans=0.2
2024-09-26 04:30:55,632 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten.whitening_limit, batch_count=906010.0, ans=22.5
2024-09-26 04:31:02,950 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=906010.0, ans=0.0
2024-09-26 04:31:07,478 INFO [train.py:1198] (3/4) Epoch 50, batch 3250, loss[loss=0.1804, ctc_loss=0.1173, cr_loss=0.3154, over 17113.00 frames. ], tot_loss[loss=0.1854, ctc_loss=0.1183, cr_loss=0.3358, over 3376089.42 frames. ], batch size: 49, lr: 2.36e-03, grad_scale: 32.0
2024-09-26 04:31:56,535 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=906196.6666666666, ans=0.125
2024-09-26 04:32:11,975 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=906243.3333333334, ans=0.125
2024-09-26 04:32:15,153 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=906243.3333333334, ans=0.2
2024-09-26 04:32:20,038 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=906243.3333333334, ans=0.1
2024-09-26 04:32:27,561 INFO [train.py:1198] (3/4) Epoch 50, batch 3300, loss[loss=0.1867, ctc_loss=0.1188, cr_loss=0.3394, over 17015.00 frames. ], tot_loss[loss=0.1852, ctc_loss=0.1181, cr_loss=0.3357, over 3372200.78 frames. ], batch size: 44, lr: 2.36e-03, grad_scale: 16.0
2024-09-26 04:32:41,662 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.177e+02 1.281e+02 1.376e+02 1.500e+02 2.072e+02, threshold=2.752e+02, percent-clipped=0.0
2024-09-26 04:32:56,165 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=906336.6666666666, ans=0.0
2024-09-26 04:33:11,800 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=906383.3333333334, ans=0.125
2024-09-26 04:33:40,284 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=906476.6666666666, ans=0.125
2024-09-26 04:33:41,913 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.min_positive, batch_count=906476.6666666666, ans=0.025
2024-09-26 04:33:46,258 INFO [train.py:1198] (3/4) Epoch 50, batch 3350, loss[loss=0.1763, ctc_loss=0.1128, cr_loss=0.3176, over 17355.00 frames. ], tot_loss[loss=0.1856, ctc_loss=0.1184, cr_loss=0.3358, over 3363198.51 frames. ], batch size: 48, lr: 2.36e-03, grad_scale: 16.0
2024-09-26 04:33:48,141 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=906523.3333333334, ans=0.1
2024-09-26 04:34:05,378 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=906570.0, ans=0.2
2024-09-26 04:34:34,341 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=906663.3333333334, ans=10.0
2024-09-26 04:34:48,287 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=906663.3333333334, ans=0.1
2024-09-26 04:34:49,848 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=906710.0, ans=0.0
2024-09-26 04:35:06,878 INFO [train.py:1198] (3/4) Epoch 50, batch 3400, loss[loss=0.2055, ctc_loss=0.1317, cr_loss=0.3688, over 17020.00 frames. ], tot_loss[loss=0.185, ctc_loss=0.118, cr_loss=0.3351, over 3368262.23 frames. ], batch size: 56, lr: 2.36e-03, grad_scale: 16.0
2024-09-26 04:35:17,865 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=906756.6666666666, ans=0.125
2024-09-26 04:35:20,638 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.200e+02 1.315e+02 1.394e+02 1.507e+02 2.409e+02, threshold=2.789e+02, percent-clipped=0.0
2024-09-26 04:35:33,639 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.const_attention_rate, batch_count=906803.3333333334, ans=0.025
2024-09-26 04:35:49,127 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=906850.0, ans=0.0
2024-09-26 04:36:15,710 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=906943.3333333334, ans=0.125
2024-09-26 04:36:24,728 INFO [train.py:1198] (3/4) Epoch 50, batch 3450, loss[loss=0.1958, ctc_loss=0.1229, cr_loss=0.3644, over 17106.00 frames. ], tot_loss[loss=0.1848, ctc_loss=0.1178, cr_loss=0.335, over 3370843.36 frames. ], batch size: 49, lr: 2.36e-03, grad_scale: 16.0
2024-09-26 04:36:25,105 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=906990.0, ans=0.125
2024-09-26 04:36:32,524 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=906990.0, ans=0.125
2024-09-26 04:36:34,068 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=906990.0, ans=0.125
2024-09-26 04:36:56,497 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=907083.3333333334, ans=10.0
2024-09-26 04:37:00,108 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.97 vs. limit=15.0
2024-09-26 04:37:08,877 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=907083.3333333334, ans=0.125
2024-09-26 04:37:32,556 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.55 vs. limit=15.0
2024-09-26 04:37:38,831 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=907176.6666666666, ans=0.0
2024-09-26 04:37:38,872 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.min_positive, batch_count=907176.6666666666, ans=0.025
2024-09-26 04:37:43,600 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=907223.3333333334, ans=0.0
2024-09-26 04:37:44,969 INFO [train.py:1198] (3/4) Epoch 50, batch 3500, loss[loss=0.1952, ctc_loss=0.1287, cr_loss=0.3324, over 17223.00 frames. ], tot_loss[loss=0.184, ctc_loss=0.1172, cr_loss=0.3337, over 3373890.62 frames. ], batch size: 50, lr: 2.36e-03, grad_scale: 16.0
2024-09-26 04:37:49,760 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=907223.3333333334, ans=0.1
2024-09-26 04:37:58,718 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.132e+02 1.340e+02 1.405e+02 1.556e+02 2.203e+02, threshold=2.811e+02, percent-clipped=0.0
2024-09-26 04:38:30,737 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=907316.6666666666, ans=0.0
2024-09-26 04:38:40,659 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=4.69 vs. limit=15.0
2024-09-26 04:38:43,380 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=907363.3333333334, ans=0.0
2024-09-26 04:38:51,174 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=907410.0, ans=0.04949747468305833
2024-09-26 04:39:05,114 INFO [train.py:1198] (3/4) Epoch 50, batch 3550, loss[loss=0.1792, ctc_loss=0.1146, cr_loss=0.3229, over 17041.00 frames. ], tot_loss[loss=0.1843, ctc_loss=0.1174, cr_loss=0.3342, over 3373153.53 frames. ], batch size: 51, lr: 2.36e-03, grad_scale: 16.0
2024-09-26 04:39:16,369 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.min_positive, batch_count=907456.6666666666, ans=0.05
2024-09-26 04:39:17,679 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=907456.6666666666, ans=0.2
2024-09-26 04:39:39,360 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=907550.0, ans=0.125
2024-09-26 04:40:07,706 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=907643.3333333334, ans=0.125
2024-09-26 04:40:13,743 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=907643.3333333334, ans=0.125
2024-09-26 04:40:22,839 INFO [train.py:1198] (3/4) Epoch 50, batch 3600, loss[loss=0.2032, ctc_loss=0.1304, cr_loss=0.3638, over 17206.00 frames. ], tot_loss[loss=0.185, ctc_loss=0.118, cr_loss=0.3352, over 3372157.28 frames. ], batch size: 47, lr: 2.36e-03, grad_scale: 32.0
2024-09-26 04:40:34,002 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=907690.0, ans=0.125
2024-09-26 04:40:35,728 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=907690.0, ans=0.0
2024-09-26 04:40:36,132 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=4.21 vs. limit=12.0
2024-09-26 04:40:36,919 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.188e+02 1.290e+02 1.354e+02 1.454e+02 2.078e+02, threshold=2.707e+02, percent-clipped=0.0
2024-09-26 04:40:46,492 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=907736.6666666666, ans=10.0
2024-09-26 04:41:29,228 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=907876.6666666666, ans=0.0
2024-09-26 04:41:33,768 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=907876.6666666666, ans=0.125
2024-09-26 04:41:40,157 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=907876.6666666666, ans=0.0
2024-09-26 04:41:41,646 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=907923.3333333334, ans=0.0
2024-09-26 04:41:42,943 INFO [train.py:1198] (3/4) Epoch 50, batch 3650, loss[loss=0.1873, ctc_loss=0.1197, cr_loss=0.3376, over 17050.00 frames. ], tot_loss[loss=0.1852, ctc_loss=0.1181, cr_loss=0.3355, over 3365967.16 frames. ], batch size: 56, lr: 2.36e-03, grad_scale: 16.0
2024-09-26 04:42:10,040 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=7.72 vs. limit=15.0
2024-09-26 04:43:01,637 INFO [train.py:1198] (3/4) Epoch 50, batch 3700, loss[loss=0.1597, ctc_loss=0.0984, cr_loss=0.3063, over 16946.00 frames. ], tot_loss[loss=0.1857, ctc_loss=0.1184, cr_loss=0.3361, over 3364780.54 frames. ], batch size: 42, lr: 2.36e-03, grad_scale: 16.0
2024-09-26 04:43:11,285 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=908156.6666666666, ans=0.125
2024-09-26 04:43:12,947 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=908156.6666666666, ans=0.125
2024-09-26 04:43:17,257 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.236e+02 1.309e+02 1.373e+02 1.463e+02 1.965e+02, threshold=2.746e+02, percent-clipped=0.0
2024-09-26 04:43:52,330 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=908296.6666666666, ans=0.125
2024-09-26 04:44:07,216 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=908343.3333333334, ans=0.0
2024-09-26 04:44:21,208 INFO [train.py:1198] (3/4) Epoch 50, batch 3750, loss[loss=0.2057, ctc_loss=0.1333, cr_loss=0.3621, over 17006.00 frames. ], tot_loss[loss=0.1871, ctc_loss=0.1196, cr_loss=0.338, over 3347594.17 frames. ], batch size: 53, lr: 2.36e-03, grad_scale: 16.0
2024-09-26 04:44:24,529 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=908390.0, ans=0.125
2024-09-26 04:44:26,198 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=908390.0, ans=0.125
2024-09-26 04:44:41,719 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=908436.6666666666, ans=0.0
2024-09-26 04:44:41,758 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=908436.6666666666, ans=0.0
2024-09-26 04:45:32,085 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.const_attention_rate, batch_count=908576.6666666666, ans=0.025
2024-09-26 04:45:39,936 INFO [train.py:1198] (3/4) Epoch 50, batch 3800, loss[loss=0.2074, ctc_loss=0.1324, cr_loss=0.3752, over 16550.00 frames. ], tot_loss[loss=0.1874, ctc_loss=0.1198, cr_loss=0.3377, over 3322960.19 frames. ], batch size: 66, lr: 2.36e-03, grad_scale: 16.0
2024-09-26 04:45:55,607 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.149e+02 1.341e+02 1.416e+02 1.505e+02 2.139e+02, threshold=2.833e+02, percent-clipped=0.0
2024-09-26 04:45:59,030 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=908670.0, ans=0.125
2024-09-26 04:46:24,260 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=908716.6666666666, ans=0.0
2024-09-26 04:46:31,691 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=908763.3333333334, ans=0.125
2024-09-26 04:46:46,216 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.76 vs. limit=6.0
2024-09-26 04:46:55,444 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=908810.0, ans=0.2
2024-09-26 04:46:56,890 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=908856.6666666666, ans=0.0
2024-09-26 04:46:58,258 INFO [train.py:1198] (3/4) Epoch 50, batch 3850, loss[loss=0.1723, ctc_loss=0.1088, cr_loss=0.3175, over 17291.00 frames. ], tot_loss[loss=0.1875, ctc_loss=0.1201, cr_loss=0.3371, over 3284273.38 frames. ], batch size: 51, lr: 2.36e-03, grad_scale: 16.0
2024-09-26 04:47:38,174 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.const_attention_rate, batch_count=908950.0, ans=0.025
2024-09-26 04:47:38,178 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=908950.0, ans=0.2
2024-09-26 04:47:44,214 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00
2024-09-26 04:48:10,089 INFO [train.py:1496] (3/4) Done!